Shared memory access violation does not (always) trigger compute-sanitizer

The text I had in mind from that link is this:

Note that the maximum amount of shared memory per thread block is smaller than the maximum shared memory partition available per SM. The 1 KB of shared memory not made available to a thread block is reserved for system use.

That indicates that 1 KB of shared memory is reserved for system use. By observation, this reservation affects compute-sanitizer behavior for “small” out-of-bounds accesses. I haven’t studied your case carefully; I was just pointing out that the system makes an allocation of its own, and by observation it seems it can affect what gets reported.
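If you want to see the figures for your own GPU, here is a minimal sketch of a device query. It assumes device 0 and CUDA 11.0 or newer (the `reservedSharedMemPerBlock` field of `cudaDeviceProp` is not present in older toolkits). It only reports what the runtime exposes; it does not by itself explain the sanitizer behavior.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    // Assumption: the GPU of interest is device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("device: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    // Shared memory physically available on one SM.
    printf("shared memory per SM:               %zu bytes\n",
           prop.sharedMemPerMultiprocessor);
    // Largest amount a single block can request after opting in.
    printf("max opt-in shared memory per block: %zu bytes\n",
           prop.sharedMemPerBlockOptin);
    // Amount the driver reserves per block for its own use.
    printf("reserved shared memory per block:   %zu bytes\n",
           prop.reservedSharedMemPerBlock);
    return 0;
}
```

Build with something like `nvcc -o smem_query smem_query.cu` (the file name is just a placeholder).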

You can file a bug if you’d like to see a change in CUDA behavior, but see my notes below.

By “by observation” I mean that if I take your code and keep the out-of-bounds extent below 1024 bytes, then, as you indicate, I don’t witness any reports. If I make the out-of-bounds extent about 1024 bytes or larger, I get error reports. I don’t have any further information; it’s just an observation.
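For anyone who wants to try this kind of experiment, here is a rough sketch of a test case. It is not the code from this thread: the kernel name `oob_write`, the array size, and the command-line handling are all made up for illustration. It writes a single `int` at a chosen byte offset past the end of a statically allocated shared array, so the offset can be varied from run to run.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical size: 256 ints = 1024 bytes of in-bounds shared memory.
constexpr int kElems = 256;

__global__ void oob_write(int oob_bytes, int *out)
{
    __shared__ int smem[kElems];
    smem[threadIdx.x] = threadIdx.x;
    __syncthreads();

    if (threadIdx.x == 0) {
        // Deliberately out of bounds: one int written oob_bytes past the
        // end of smem. volatile keeps the compiler from dropping the store.
        volatile int *p = smem;
        p[kElems + oob_bytes / (int)sizeof(int)] = 42;
    }
    __syncthreads();

    // Copy the in-bounds data out so the kernel has an observable result.
    out[threadIdx.x] = smem[threadIdx.x];
}

int main(int argc, char **argv)
{
    // Byte offset past the end of the array; defaults to 512.
    int oob_bytes = (argc > 1) ? atoi(argv[1]) : 512;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, kElems * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    oob_write<<<1, kElems>>>(oob_bytes, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    printf("offset %d bytes past the end: %s\n",
           oob_bytes, cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

Build it with `nvcc -o oob oob.cu`, then run it under the tool with different offsets, for example `compute-sanitizer ./oob 512` and `compute-sanitizer ./oob 2048`. Based on what I described above, I would expect the smaller offset to be more likely to go unreported on a pre-Hopper GPU, but I have not characterized the exact threshold, and the result may differ by GPU and CUDA version.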

Here is an example of what looks to me like a similar report. As indicated there, starting with Hopper and moving forward, it seems the issue has been addressed.

What is indicated there seems to line up with your report. The L4/A10/A100 GPUs are pre-Hopper; the RTX 5080, being Blackwell generation, is post-Hopper.