Problems calling __device__ function in __host__ __device__ function

Hi,

I'm having problems calling a __host__ __device__ function. It seems as if the program always calls the compiled host code instead of the device code.

My code looks like:

main.cu:

    // ...
    void foo(int *a, int *b) {
        dev_sth(a, b);
    }
    // ...
    foo(a, b);
    // ...

sth.h:

    // ...
    void do_sth(int *a, int *b);
    void dev_sth(int *a, int *b);

sth.cu:

    // ...
    __host__ __device__ void do_sth(int *a, int *b) {
        a[1] = ...; // some code
    }
    // ...
    __global__ void sth(int *a, int *b) {
        do_sth(a, b);
    }
    // ...
    void dev_sth(int *a, int *b) {
        if (use_cuda) {
            sth<<<blockspergrid, threadsperblock>>>(a, b);  // GPU path
        } else {
            do_sth(a, b);                                   // CPU path
        }
    }

So the code does the following:

  1. call the function foo
  2. foo calls dev_sth
  3. dev_sth checks whether to use the CPU or the GPU code; the CPU version (do_sth(..)) runs fine, but if CUDA is used:
  4. it launches the __global__ function sth(...), which calls do_sth(...)
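For anyone trying to reproduce this, here is a minimal, self-contained sketch of the dispatch described in the steps above. All names (do_sth_example, sth_example, dev_sth_example) and the element-wise body are hypothetical stand-ins, since the real body of do_sth is elided in the post. It also assumes that a and b start out as host arrays that are copied to the device for the GPU path, and it turns use_cuda into a plain parameter instead of a global flag.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for do_sth (the real body is elided in the post):
// adds b[i] to a[i]. Compiled for both host and device.
__host__ __device__ void do_sth_example(int *a, const int *b, int i) {
    a[i] += b[i];
}

// GPU path: one thread per element, guarded so surplus threads do nothing.
__global__ void sth_example(int *a, const int *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        do_sth_example(a, b, i);
    }
}

// Hypothetical dispatcher: a and b are host arrays of length n.
// CPU path: loop over do_sth_example directly.
// GPU path: copy to the device, launch sth_example, copy a back.
// Returns false if any CUDA call fails.
bool dev_sth_example(int *a, const int *b, int n, bool use_cuda) {
    assert(n >= 0);
    if (!use_cuda) {
        for (int i = 0; i < n; ++i) {
            do_sth_example(a, b, i);
        }
        return true;
    }
    if (n == 0) {
        return true;  // nothing to launch
    }
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_a = nullptr;
    int *d_b = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threadsPerBlock = 256;
        const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        sth_example<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, n);
        // cudaMemcpy on the default stream waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(a, d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);  // cudaFree(nullptr) is a no-op
    cudaFree(d_b);
    return ok;
}
```

The point of the pattern is that the same __host__ __device__ function is the single source of truth for both paths; only the dispatcher knows whether the data lives on the host or the device.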

Now it appears as if the do_sth(…) call inside the __global__ function calls the host version of do_sth instead of the device version.
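One way to check which compiled copy actually runs is the __CUDA_ARCH__ macro, which nvcc defines only while compiling device code. In the sketch below (the names which_side, record_side and device_markers are hypothetical), the host copy returns 0 and the device copy returns 1, so if every marker copied back from the kernel is 1, the kernel is calling the device version.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __CUDA_ARCH__ is defined only while nvcc compiles device code, so the host
// copy and the device copy of this function return different markers.
__host__ __device__ int which_side() {
#ifdef __CUDA_ARCH__
    return 1;  // device copy
#else
    return 0;  // host copy
#endif
}

// Each thread records which copy of which_side() it called.
__global__ void record_side(int *out) {
    out[threadIdx.x] = which_side();
}

// Launches one block of n threads and copies the markers into host_out.
// Returns false if any CUDA call fails (for example, when no GPU is present).
bool device_markers(int *host_out, int n) {
    assert(n > 0 && n <= 1024);  // single block; 1024 threads is the per-block limit
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_out = nullptr;
    bool ok = cudaMalloc(&d_out, bytes) == cudaSuccess;
    if (ok) {
        record_side<<<1, n>>>(d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);
    return ok;
}
```

Calling which_side() from ordinary host code returns 0, which gives a quick baseline to compare against.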

Am I doing something totally wrong? I would be really thankful for any hints or a solution.

Thanks in advance.

What makes you think so?

If I comment out all of the host functions, everything works fine. Furthermore, when I debug with Nsight, oddly only some of the threads hit the following error:

Status: Exception
Exception: OutOfRangeLoad
Exception Details: MemorySpace=Global Size=4
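An OutOfRangeLoad with MemorySpace=Global and Size=4 indicates that a thread loaded 4 bytes (for example, one int) from a global-memory address that was out of range. The thread does not show how a and b are indexed, so the following is only an assumption about one common cause: the grid is rounded up to whole blocks, and only the surplus threads in the last block read past the end of the array unless the kernel checks its index. All names below are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements (integer ceiling division).
__host__ __device__ int blocks_for(int n, int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads in the rounded-up grid that have no element to work on.
// Without a bounds check, each of these reads past the end of the array.
__host__ __device__ int surplus_threads(int n, int threadsPerBlock) {
    return blocks_for(n, threadsPerBlock) * threadsPerBlock - n;
}

// Unsafe: threads with i >= n load a[i] out of range (a 4-byte global load).
__global__ void scale_unguarded(int *a, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= factor;
}

// Safe: surplus threads simply do nothing.
__global__ void scale_guarded(int *a, int n, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] *= factor;
    }
}
```

For example, with n = 1000 and 256 threads per block, 4 blocks are launched and 24 threads have no element, which would fit the observation that only some threads fault.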

I have another update. The compiler warns “Cannot tell what pointer points to, assuming global memory”, pointing to the line with “do_sth(a,b)” inside the __global__ function, so I think I’m on the right track.

Now the question: how do I tell the compiler that these pointers point to device memory?

Hi,

In case anyone runs into the same issue: I was able to solve the problem. I don’t know why, but I had to force-inline the device code with __forceinline__. The working code now looks like this:

sth.cu:

    // ...
    __host__ __device__ __forceinline__ void do_sth(int *a, int *b) {
        a[1] = ...; // some code
    }
    // ...
    __global__ void sth(int *a, int *b) {
        do_sth(a, b);
    }
    // ...
    void dev_sth(int *a, int *b) {
        if (use_cuda) {
            sth<<<blockspergrid, threadsperblock>>>(a, b);  // GPU path
        } else {
            do_sth(a, b);                                   // CPU path
        }
    }

Thanks anyway.
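To confirm that a force-inlined __host__ __device__ function gives the same result on both sides, a parity check like the following sketch can help. The names step, step_kernel, step_cpu and gpu_matches_cpu are hypothetical, and the arithmetic in step is invented because the real body of do_sth is elided; the only part taken from the thread is marking the shared function __host__ __device__ __forceinline__.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical body (the real do_sth is elided). __forceinline__ asks nvcc to
// inline the function into every caller, host and device alike.
__host__ __device__ __forceinline__ int step(int x, int y) {
    return 2 * x + y;
}

__global__ void step_kernel(const int *x, const int *y, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = step(x[i], y[i]);
    }
}

// CPU reference built from the same inlined function.
void step_cpu(const int *x, const int *y, int *out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = step(x[i], y[i]);
    }
}

// Runs step_kernel on the GPU and compares every element with step_cpu.
// Returns false if a CUDA call fails or any element differs.
bool gpu_matches_cpu(const std::vector<int> &x, const std::vector<int> &y) {
    assert(!x.empty() && x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const size_t bytes = x.size() * sizeof(int);
    std::vector<int> cpu(x.size()), gpu(x.size());
    step_cpu(x.data(), y.data(), cpu.data(), n);

    int *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threadsPerBlock = 256;
        const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        step_kernel<<<blocksPerGrid, threadsPerBlock>>>(d_x, d_y, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(gpu.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return ok && gpu == cpu;
}
```

Because both paths call the same step, any mismatch points at the data (pointers, sizes, copies) rather than at the shared logic.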