Hi, I have a weird problem when loading data into shared memory from global memory. I have a two-dimensional array in shared memory and a one-dimensional array in device memory. I access the data like this:
shared[i][j] = global[k];
It crashes immediately when debugging normally, but doesn't crash in emudebug. Also, if I first copy the global element to a local variable and then copy the local variable to shared memory, it works fine:
temp = global[k];
shared[i][j] = temp;
This works fine. global, shared, and temp are all of type float3. Any help would be appreciated.
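For reference, the two-step version described above might look like this as a complete program. This is only a minimal sketch: the names loadTile and runLoadTile, the tile sizes, and the mapping from thread indices to i, j and k are assumptions made for illustration, not the original poster's code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE_Y 4   // assumed tile height
#define TILE_X 32  // assumed tile width

// Hypothetical kernel: stages float3 values in a 2D shared array via a temporary.
__global__ void loadTile(const float3 *global, float3 *out, int n)
{
    __shared__ float3 shared[TILE_Y][TILE_X];

    int i = threadIdx.y;
    int j = threadIdx.x;
    int k = (blockIdx.x * TILE_Y + i) * TILE_X + j;  // assumed index mapping

    if (k < n) {
        float3 temp = global[k];  // step 1: global memory -> local variable
        shared[i][j] = temp;      // step 2: local variable -> shared memory
    }
    __syncthreads();              // outside the branch, so every thread reaches it

    if (k < n) out[k] = shared[i][j];
}

// Runs the kernel on n > 0 elements.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int runLoadTile(int n)
{
    std::vector<float3> h_in(n), h_out(n);
    for (int k = 0; k < n; ++k)
        h_in[k] = make_float3((float)k, 2.0f * k, 3.0f * k);

    float3 *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float3);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_X, TILE_Y);
    int perBlock = TILE_X * TILE_Y;
    loadTile<<<(n + perBlock - 1) / perBlock, block>>>(d_in, d_out, n);

    cudaError_t err = cudaGetLastError();                    // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int bad = 0;
    for (int k = 0; k < n; ++k)
        if (h_out[k].x != h_in[k].x || h_out[k].y != h_in[k].y || h_out[k].z != h_in[k].z)
            ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", runLoadTile(1000));
    return 0;
}
```

Checking the error codes after the launch, as the sketch does, is one way to see which CUDA error a crashing kernel actually reports instead of only observing the crash in the debugger.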
Hi. I suspect this is a problem with CUDA. I ran into something similar (http://forums.nvidia.com/index.php?showtopic=32495). In my case I copy the data to shared memory as unsigned char, regardless of the type of the original data. For example:
// unsigned char *global, *shared;
for (int i = 0; i < N; i++) shared[i] = global[i];
If I use more complex types (for example float, float2, float3), I get errors. My method is longer, but it works. If the two-step version (temp = global[k]; shared[i][j] = temp;) works in your case, then use it. I suppose this problem will be fixed in a new version of the Toolkit.
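The byte-wise workaround could be sketched as a full kernel roughly like this. It is a minimal sketch under assumptions: the names copyBytes and runCopyBytes, the tile size, and the block layout are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 64  // assumed number of float3 elements staged per block

// Hypothetical kernel: fills a float3 shared array one byte at a time.
__global__ void copyBytes(const float3 *global, float3 *out, int n)
{
    __shared__ float3 shared[TILE];

    int t = threadIdx.x;
    int first = blockIdx.x * TILE;             // first element of this block's tile
    int valid = min(TILE, n - first);          // elements actually present in the tile
    int nbytes = valid * (int)sizeof(float3);  // bytes to copy for this tile

    // View both arrays as raw bytes, regardless of the element type.
    const unsigned char *src = reinterpret_cast<const unsigned char *>(global + first);
    unsigned char *dst = reinterpret_cast<unsigned char *>(shared);

    for (int b = t; b < nbytes; b += blockDim.x)
        dst[b] = src[b];
    __syncthreads();                           // tile is complete after this point

    if (t < valid) out[first + t] = shared[t];
}

// Runs the kernel on n > 0 elements.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int runCopyBytes(int n)
{
    std::vector<float3> h_in(n), h_out(n);
    for (int k = 0; k < n; ++k)
        h_in[k] = make_float3((float)k, k + 0.25f, k + 0.5f);

    float3 *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float3);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    copyBytes<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int bad = 0;
    for (int k = 0; k < n; ++k)
        if (h_out[k].x != h_in[k].x || h_out[k].y != h_in[k].y || h_out[k].z != h_in[k].z)
            ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", runCopyBytes(1000));
    return 0;
}
```

Each thread moves single bytes here, so this issues many more memory transactions than copying whole float3 values. It trades speed for not depending on how the compiler handles the wider type.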
I'm not 100% sure what you mean by alignment, but if I'm using the same built-in type (float3) for both of them, why should it act the way it's acting?
I mean, the kernel works perfectly in emu debug, and in normal debug when using the intermediate temp variable, but when I assign directly it crashes in normal debug and not in emu debug. It makes no sense to me at all.
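On what alignment means here: in the CUDA headers float3 is three plain floats with no extra alignment specifier, while float4 is declared with 16-byte alignment. The short check below only illustrates that difference between built-in types. It is not a diagnosis of the crash, and it assumes a compiler with C++11 support for static_assert and alignof.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// float3 is three packed floats: 12 bytes, aligned like a single float.
static_assert(sizeof(float3) == 12, "float3 should be 12 bytes");
static_assert(alignof(float3) == 4, "float3 should be 4-byte aligned");

// float4 carries an explicit 16-byte alignment.
static_assert(sizeof(float4) == 16, "float4 should be 16 bytes");
static_assert(alignof(float4) == 16, "float4 should be 16-byte aligned");

int main()
{
    printf("float3: size %zu, alignment %zu\n", sizeof(float3), alignof(float3));
    printf("float4: size %zu, alignment %zu\n", sizeof(float4), alignof(float4));
    return 0;
}
```

Because a float3 element only needs 4-byte alignment, element k of a float3 array starts at byte offset 12 * k, which is generally not a multiple of 8 or 16.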