splendid.
If you want to file a bug, I think you'll be better off doing that. I doubt I would spend much time on it, because although I can navigate PTX, it's much less familiar to me than CUDA C++. And given that it seems to work or not depending on the ptxas optimization level, that is certainly a possible indicator (not a guarantee) of a toolchain problem.