When compiling my code I noticed that, for some reason, it uses local memory.
My register usage is well below the limit, so it is not a register-spilling problem. I am also not indexing registers dynamically, nor am I using any local arrays.
While searching for the cause I narrowed it down to the following construct:
int4 data = context.global_array[threadIdx.x];

Here context is a struct of pointers to global memory, which is passed as a parameter to the kernel (and hence resides in shared memory).
If I replace the above with the following:
int4 data;
data.x = context.global_array[threadIdx.x].x;
data.y = context.global_array[threadIdx.x].y;
data.z = context.global_array[threadIdx.x].z;
data.w = context.global_array[threadIdx.x].w;

then suddenly no local memory is used at all. What can be the reason, and how can I avoid it?
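For context, here is a minimal sketch of the setup; the struct name, kernel signature, and extra members are made up for illustration, but the access pattern matches what I described:

```cuda
#include <cuda_runtime.h>

// Hypothetical context struct: a bundle of device pointers
// passed to the kernel by value.
struct Context {
    int4 *global_array;   // allocated with cudaMalloc
    // ... other device pointers ...
};

__global__ void kernel(Context context)
{
    // Variant 1: single vector load -- this is the form that
    // ends up going through local memory in my build.
    int4 data = context.global_array[threadIdx.x];

    // Variant 2: component-wise loads -- with this form,
    // no local memory is used:
    // int4 data;
    // data.x = context.global_array[threadIdx.x].x;
    // data.y = context.global_array[threadIdx.x].y;
    // data.z = context.global_array[threadIdx.x].z;
    // data.w = context.global_array[threadIdx.x].w;

    // ... use data ...
}
```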
My suspicion is that the compiler does not know whether the pointer context.global_array is aligned, so it does not know whether it can use a 16-byte-wide load instruction, and somehow this forces the load to go through local memory instead of registers.
However, the pointer I am using here is a value returned by cudaMalloc and should be well aligned; the compiler simply does not know that when the kernel is compiled.
If that is the case, how can I inform the compiler that a 16-byte-wide load instruction is safe at this point?
If that is not the case, what am I doing wrong and how can it be avoided?
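For example, is there a supported way to express something like the following? (This is only a guess at a mechanism; I am not sure the builtin is honored in device code, or that alignment is even the issue.)

```cuda
__global__ void kernel(Context context)
{
    // Hypothetical: promise the compiler that the pointer is
    // 16-byte aligned, so the int4 access can be a single
    // 128-bit load straight into registers.
    int4 *p = (int4 *)__builtin_assume_aligned(context.global_array, 16);
    int4 data = p[threadIdx.x];

    // ... use data ...
}
```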