I’m working on a couple of fairly simple data conversion kernels. I’ve got both of them working fine for small data sets, but when I scale up, the output gets all kinds of messed up.
For example: I’m operating on buffers that are num_samples elements in size (we’ll call the data sets data1 and data2).
Each “thread” of the function is independent of any others, so I don’t really care what order they execute in, as long as the input (data1) and output (data2) stay in the same order.
I call the function like this:
my_function<<< (num_samples/256), 256, 0 >>>( data1, data2 );
If num_samples <= 8388608, everything works fine; when num_samples > 8388608 it doesn’t (admittedly I’ve only tried 16M, not 8M + 1). I don’t get errors or anything, the data is just wrong.
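In case it helps, here’s a stripped-down sketch of the shape of what I’m running. The real conversion logic is more involved; the float-to-float copy, the element type, and the main() setup are just placeholders for illustration, but the indexing and the launch configuration match what I described above:

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: one thread per element, same indexing as the real one.
// (A plain float -> float copy stands in for the actual conversion.)
__global__ void my_function( const float *data1, float *data2 )
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    data2[i] = data1[i];
}

int main()
{
    const unsigned int num_samples = 16 * 1024 * 1024;  // 16M, the failing size

    float *data1, *data2;
    cudaMalloc( (void**)&data1, num_samples * sizeof(float) );
    cudaMalloc( (void**)&data2, num_samples * sizeof(float) );

    // Same launch shape as in my real code: one 256-thread block per 256 samples.
    my_function<<< (num_samples/256), 256, 0 >>>( data1, data2 );
    printf( "launch: %s\n", cudaGetErrorString( cudaGetLastError() ) );

    cudaFree( data1 );
    cudaFree( data2 );
    return 0;
}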
I’ve been through the documentation repeatedly this week and haven’t found anything that mentions a limit I’d be running into. (The max kernel time for 16M samples should be ~30 ms at most.)
Thanks,