I have been experimenting with CUDA to see whether it would be useful in a project, but I have run into an impasse: my global function is simply not being called in one of my programs. I was hoping that someone here would know what could stop a global function from being called, or how to diagnose such a problem.
Details:
I am running on a Windows Vista laptop with the 2.0 beta version of CUDA. I managed to get most of the sample programs to work.
The code that has the problem is:
    complexptr w;
    w.realptr = wr_dev;
    w.imagptr = wi_dev;  // these variables are CUDA

    //----------------------- check for errors -----------------
    //----------------------- calling the kernels ---------------------
    dim3 threadsize(block_size, block_size, block_size);
    dim3 dimGrid(
        (wsize0/(threadsize.x)) + ((!(wsize0/(threadsize.x))) ? 0 : 1),
        (wsize1/(threadsize.y)) + ((!(wsize1/(threadsize.y))) ? 0 : 1),
        (wsize2/(threadsize.z)) + ((!(wsize2/(threadsize.z))) ? 0 : 1));
    dim3 pass2gird(1, 1, 1);
    dim3 pass2threadsize(dimGrid.x*dimGrid.y*dimGrid.z/2, 1, 1);

    // to store the output of the 1st pass
    complex* outbfer;
    CUDA_SAFE_CALL_NO_SYNC(cudaMalloc((void**) &outbfer, dimGrid.x*dimGrid.y*dimGrid.z*sizeof(complex)));
    complex* finaloutput;
    CUDA_SAFE_CALL_NO_SYNC(cudaMalloc((void**) &finaloutput, 1*sizeof(complex)));
    ...
    sincreduce_3d<<<dimGrid, threadsize, threadsize.x*threadsize.y*threadsize.z*sizeof(complex)>>>(
        outbfer, w, R, Bx, By, Bz, wsize0, wsize1, wsize2, dim3(rX[i], rY[j], rZ[k]));

My global function sincreduce_3d and the structures complex and complexptr are defined as follows:
    struct complex {
        float real;
        float imag;
    };

    struct complexptr {
        float *realptr;
        float *imagptr;
    };

    // NOTE: this was taken from the NVIDIA file reduction_kernel.cu; this MUST be documented
    // WARNING
    // reduces an input complexptr to an output complex pointer, one for each box
    __global__ void sincreduce_3d(complex* out, complexptr w,
                                  const float R, const float Bx, const float By, const float Bz,
                                  const int nx, const int ny, const int nz, dim3 pointanted)
    {
        extern __shared__ complex buffers[];

However, when I step through the code in VC++ 2005 Express Edition, the global function is not called and the output variables are not changed. This is in contrast to the other programs I have written in CUDA, where the debugger has worked.
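One thing that may be worth ruling out is the launch configuration itself, since a launch that the runtime rejects returns without ever running the kernel. The sketch below is a host-only C++ approximation and not part of the program above: Dim3, ceil_div, check_launch and the kMax... constants are hypothetical names, and the limits are assumed to be those of compute capability 1.x devices (512 threads per block, block dimensions up to 512 x 512 x 64, a two-dimensional grid of up to 65535 x 65535 blocks, 16 KB of shared memory per block). Note that the grid expression in the code above evaluates to 0 whenever a wsize value is smaller than the block edge, and that a dimGrid.z greater than 1 would fall outside those limits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for CUDA's dim3, so the sketch compiles without the toolkit.
struct Dim3 { unsigned x, y, z; };

// Assumed limits for compute capability 1.x devices.
const unsigned kMaxThreadsPerBlock = 512;
const unsigned kMaxBlockX = 512, kMaxBlockY = 512, kMaxBlockZ = 64;
const unsigned kMaxGridXY = 65535;
const unsigned kMaxGridZ = 1;                    // grids are two-dimensional
const std::size_t kMaxSharedBytes = 16 * 1024;   // upper bound; kernel arguments use some of it

// Number of blocks needed to cover n elements: ceil(n / block).
unsigned ceil_div(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Returns an empty string if the configuration is within the assumed limits,
// otherwise a short description of the first limit that is violated.
std::string check_launch(Dim3 grid, Dim3 block, std::size_t shared_bytes) {
    if (grid.x == 0 || grid.y == 0 || grid.z == 0)
        return "grid has a zero dimension";
    if (block.x == 0 || block.y == 0 || block.z == 0)
        return "block has a zero dimension";
    if (grid.x > kMaxGridXY || grid.y > kMaxGridXY)
        return "grid x or y exceeds 65535";
    if (grid.z > kMaxGridZ)
        return "grid z must be 1";
    if (block.x > kMaxBlockX || block.y > kMaxBlockY || block.z > kMaxBlockZ)
        return "a block dimension is too large";
    if (block.x * block.y * block.z > kMaxThreadsPerBlock)
        return "too many threads per block";
    if (shared_bytes > kMaxSharedBytes)
        return "too much dynamic shared memory";
    return "";
}
```

In the actual program, calling cudaGetLastError() right after the <<<...>>> launch and printing cudaGetErrorString() of the result should report this kind of problem directly.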