I am trying to get the minimum value from a collection of float values by taking advantage of the atomic operations provided by CUDA. I cannot use a reduction because of memory constraints. However, when I compile the code below with a shared variable passed as the “SharedMem” argument, I get the error message: Instruction ‘{atom,red}.shared’ requires .target sm_12 or higher.
I have a laptop with a GeForce 9400M, which has compute capability 1.1.
    __device__ static float* atomicMin(float* SharedMem, float value, float *old)
    {
        old[0] = *SharedMem;
        float assumed;
        if (old[0] <= value) {
            return old;
        }
        do {
            assumed = old[0];
            old[0] = ::atomicCAS((unsigned int*)SharedMem, __float_as_int(assumed), __float_as_int(value));
        } while (old[0] != assumed);
        return old;
    }

Take for example calling the function “getMin_Kernel” below:
    __shared__ __device__ float LowestDistance;

    __global__ void getMin_Kernel(float* AllFloats, int* NumberOfFloats)
    {
        int j = (blockDim.x * blockIdx.x + threadIdx.x);
        if (j < NumberOfFloats[0]) {
            float myFloat;
            myFloat = *(atomicMin(&LowestDistance, NumberOfFloats[0], &myFloat));
        }
    }

However, if I pass a non-shared variable, it compiles without issues but I get a runtime error. I am guessing the runtime error occurs because atomicCAS requires a global or shared variable. Can anyone please help with a way to get around the compilation error? I am currently using ManagedCUDA in Visual Studio 2010.
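For reference, the error message is consistent with compute capability 1.1 supporting 32-bit atomics such as atomicCAS on global memory only, with shared-memory atomics requiring sm_12 or higher. The block below is a minimal sketch of the global-memory route under that assumption, not a definitive implementation. The names atomicMinFloat and getMinKernel, the launch configuration, and the sample data are all hypothetical, and the host side uses the plain CUDA runtime API rather than ManagedCUDA.

```cuda
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): atomic minimum for floats,
// built on a compare-and-swap loop over the value's 32-bit pattern.
// Assumption: `addr` points to global memory, which is what sm_11 supports.
// NaN inputs are not handled specially.
__device__ float atomicMinFloat(float* addr, float value)
{
    unsigned int* addrAsUint = (unsigned int*)addr;
    unsigned int old = *addrAsUint;   // initial guess at the stored bits
    unsigned int assumed;
    do {
        assumed = old;
        // Stop if the stored value is already less than or equal to ours.
        if (__int_as_float((int)assumed) <= value) {
            break;
        }
        // Compare and swap the raw bits; returns the bits found at addr.
        old = atomicCAS(addrAsUint, assumed,
                        (unsigned int)__float_as_int(value));
    } while (old != assumed);         // retry if another thread got in first
    return __int_as_float((int)old);  // value seen at addr before our update
}

// Hypothetical kernel: each thread offers one element to the shared result.
__global__ void getMinKernel(const float* values, int count, float* result)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < count) {
        atomicMinFloat(result, values[i]);
    }
}

int main()
{
    const int count = 8;
    const float host[count] = {4.5f, 2.0f, 9.25f, -1.5f,
                               7.0f, 3.0f, 0.5f, 6.0f};
    float result = FLT_MAX;           // start above every input value

    float* dValues = NULL;
    float* dResult = NULL;
    cudaMalloc((void**)&dValues, count * sizeof(float));
    cudaMalloc((void**)&dResult, sizeof(float));
    cudaMemcpy(dValues, host, count * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &result, sizeof(float), cudaMemcpyHostToDevice);

    getMinKernel<<<1, 32>>>(dValues, count, dResult);

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(&result, dResult, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("min = %f\n", result);     // the smallest sample value is -1.5

    cudaFree(dValues);
    cudaFree(dResult);
    return 0;
}
```

The key differences from the code above are that the result lives in memory allocated with cudaMalloc instead of a __shared__ variable, and that the value returned by atomicCAS is kept as an unsigned int and only reinterpreted as a float with __int_as_float, so the loop compares bit patterns rather than converted numbers.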
Thanks.