I’m working on a couple of fairly simple data conversion kernels. I’ve got both of them working fine for small data sets, but when I scale up, the output gets all kinds of messed up.
For example: I’m operating on buffers that are num_samples elements in size (we’ll call the data sets data1 and data2).
Each “thread” of the function is independent of any others, so I don’t really care what order they execute in, as long as the input (data1) and output (data2) stay in the same order.
I call the function like this:
my_function<<< (num_samples/256), 256, 0 >>>( data1, data2 );
If num_samples <= 8388608, everything works fine; when num_samples > 8388608 it doesn’t (admittedly I’ve only tried 16M, not 8M + 1). I don’t get errors or anything, the data is just wrong.
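In case it helps, here’s a stripped-down sketch of the shape of what I’m running. The real conversion logic is more involved; the float-to-float copy, the element type, and the main() setup are just placeholders for illustration, but the indexing and the launch configuration match what I described above:

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: one thread per element, same indexing as the real one.
// (A plain float -> float copy stands in for the actual conversion.)
__global__ void my_function( const float *data1, float *data2 )
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    data2[i] = data1[i];
}

int main()
{
    const unsigned int num_samples = 16 * 1024 * 1024;  // 16M, the failing size

    float *data1, *data2;
    cudaMalloc( (void**)&data1, num_samples * sizeof(float) );
    cudaMalloc( (void**)&data2, num_samples * sizeof(float) );

    // Same launch shape as in my real code: one 256-thread block per 256 samples.
    my_function<<< (num_samples/256), 256, 0 >>>( data1, data2 );
    printf( "launch: %s\n", cudaGetErrorString( cudaGetLastError() ) );

    cudaFree( data1 );
    cudaFree( data2 );
    return 0;
}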
I’ve been through the documentation repeatedly this week and haven’t found anything that mentions a limit I’d be running into. (The max kernel time for 16M samples should be ~30 ms at most.)
Thanks,