After some refactoring of my OptiX 7.0 program, I keep getting the error “an illegal memory access was encountered” when calling optixLaunch.
I have tried creating a pipeline with an empty raygen program, and the error persists, so I know it is not an error in my CUDA code.
I have already spent quite some time trying to debug this, but I just can't make the error go away.
Running cuda-memcheck gives the following error:
========= Host API memory access error at host access to 0x7f6885e09cf8 of size 1164 bytes
========= Invalid range on access by cudaMemcopy source.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoDAsync_v2 + 0x219) [0x293d59]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.26 [0x14b2bb]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.26 [0x14bd0d]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 [0x167998]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 [0x72c875]
========= Host Frame:godray.8447d32d/godray [0x13f00]
========= Host Frame:godray.8447d32d/godray [0xfabb]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:godray.8447d32d/godray [0x1068a]
To my understanding, since I am launching with an empty raygen program, the error must be in either the pipeline or the SBT I am passing to the optixLaunch call. But I am having a lot of trouble locating exactly what the error is.
How should I best proceed to debug the error, and what could potentially be the cause?
I ran cuda-gdb on the program, which gave the following error:
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x5555580ccd30
Thread 1 "godray" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 48, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 2, lane 0]
0x00005555580ccd60 in ??<<<(128,128,1),(64,1,1)>>> ()
It’s hard to say from the error messages, but if you have a small, complete sample I could try to run it.
The first error said “Invalid range on access by cudaMemcopy source.” so I would start by looking at the host code for that one. It might be a problem that occurs before launch but doesn’t get caught until launch time.
Be aware that older versions of cuda-memcheck don’t actually work with OptiX; they will report incorrect memory access errors because OptiX is not always ABI compatible with CUDA. I heard that OptiX support was added to cuda-memcheck recently, but this morning I’m having trouble getting an answer on which versions are expected to work. You might be able to tell by running cuda-memcheck on a working OptiX SDK sample; if it fails, then you know your cuda-memcheck doesn’t support OptiX.
cuda-gdb does have OptiX support, but you do have to make sure you’re on a very recent driver with the latest version of cuda-gdb. I don’t immediately know all the reasons you can get a Warp Illegal Address. I’d recommend first reviewing all of your module and pipeline options and comparing them to a working OptiX SDK sample. If you don’t catch it, feel free to send me a reproducer.
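Since the question narrows the problem down to the pipeline or the SBT, one cheap host-side check before optixLaunch is to sanity-test the SBT values. The following is only a sketch of that idea, not a diagnosis of this particular crash. SbtView, checkRecordArray and checkSbt are hypothetical names; SbtView just mirrors a subset of the fields of OptixShaderBindingTable (callables are left out) so the example compiles without the OptiX headers. The constants 16 and 32 are assumed to match OPTIX_SBT_RECORD_ALIGNMENT and OPTIX_SBT_RECORD_HEADER_SIZE in the OptiX 7 headers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed to match OPTIX_SBT_RECORD_ALIGNMENT and OPTIX_SBT_RECORD_HEADER_SIZE.
constexpr std::uint64_t kSbtRecordAlignment  = 16;
constexpr std::uint64_t kSbtRecordHeaderSize = 32;

// Hypothetical mirror of a subset of OptixShaderBindingTable.
// Device pointers are stored as 64-bit integers, like CUdeviceptr.
struct SbtView {
    std::uint64_t raygenRecord = 0;
    std::uint64_t missRecordBase = 0;
    std::uint32_t missRecordStrideInBytes = 0;
    std::uint32_t missRecordCount = 0;
    std::uint64_t hitgroupRecordBase = 0;
    std::uint32_t hitgroupRecordStrideInBytes = 0;
    std::uint32_t hitgroupRecordCount = 0;
};

// Checks one record array (miss or hitgroup) and appends any problems found.
inline void checkRecordArray(const std::string& name, std::uint64_t base,
                             std::uint32_t stride, std::uint32_t count,
                             std::vector<std::string>& problems) {
    if (count == 0) return;  // nothing to check for an unused array
    if (base == 0)
        problems.push_back(name + ": count > 0 but base pointer is null");
    if (base % kSbtRecordAlignment != 0)
        problems.push_back(name + ": base pointer is not 16-byte aligned");
    if (stride < kSbtRecordHeaderSize)
        problems.push_back(name + ": stride is smaller than the record header");
    if (stride % kSbtRecordAlignment != 0)
        problems.push_back(name + ": stride is not a multiple of 16");
}

// Returns a list of human-readable problems; an empty list means none were found.
inline std::vector<std::string> checkSbt(const SbtView& sbt) {
    std::vector<std::string> problems;
    if (sbt.raygenRecord == 0)
        problems.push_back("raygenRecord is null");
    if (sbt.raygenRecord % kSbtRecordAlignment != 0)
        problems.push_back("raygenRecord is not 16-byte aligned");
    checkRecordArray("miss", sbt.missRecordBase, sbt.missRecordStrideInBytes,
                     sbt.missRecordCount, problems);
    checkRecordArray("hitgroup", sbt.hitgroupRecordBase,
                     sbt.hitgroupRecordStrideInBytes, sbt.hitgroupRecordCount,
                     problems);
    return problems;
}
```

In a real program the fields would be copied from the OptixShaderBindingTable that is about to be passed to optixLaunch, and any returned messages printed before launching. A clean result does not prove the SBT is valid (the check cannot tell whether a pointer really refers to device memory), but a non-empty result points at something worth fixing first.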