Greetings,
Currently I'm trying to implement a genetic algorithm using CUDA. I use the code below to evaluate each individual with a CUDA kernel.
```cuda
__global__ void evaluate(int * population, int * distance, int * cost, int nTowns, int * d_index)
{
    int sum = 0;
    int t0, t1, idx;
    idx = threadIdx.x + blockIdx.x * blockDim.x;   // one thread per individual

    // Sum the distances between consecutive towns in this individual's tour
    for (size_t i = 1; i < nTowns; i++) {
        t0 = idx * nTowns + (i - 1);
        t1 = idx * nTowns + i;
        sum = sum + distance[population[t0] * nTowns + population[t1]];
    }

    // Close the tour: distance from the last town back to the first
    t0 = idx * nTowns + nTowns - 1;
    t1 = idx * nTowns;
    cost[idx] = sum + distance[population[t0] * nTowns + population[t1]];
    d_index[idx] = threadIdx.x;
}
```

This code occasionally fails, roughly 2-3 times out of 100 runs. I then ran it under cuda-memcheck and got this output:
```
GPUassert: an illegal memory access was encountered ga_tes_3a.cu 469
========= CUDA-MEMCHECK
========= Program hit cudaErrorIllegalAddress (error 77) due to "an illegal memory access was encountered" on CUDA API call to cudaDeviceSynchronize.
...
GPUassert: unspecified launch failure ga_tes_3a.cu 469
========= CUDA-MEMCHECK
========= Invalid __global__ read of size 4
=========     at 0x000000e0 in evaluate(int*, int*, int*, int, int*)
=========     by thread (327,0,0) in block (7,0,0)
=========     Address 0x3c45c467c is out of bounds
...
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
```

How can I track down this error? Any idea why this happens?
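One possible way to narrow this down, offered only as a sketch: because the failure shows up in just a few runs, the out-of-bounds read may depend on the population data rather than on the launch configuration, for example a town index outside 0..nTowns-1 produced by crossover or mutation (an assumption, since that code is not shown). Rebuilding with `nvcc -lineinfo` lets cuda-memcheck report the source line of the faulting access instead of only the offset 0x000000e0. The host-side check below scans the population before each `evaluate` launch. `findInvalidIndividual` is a hypothetical helper, and it assumes the population is available on the host as `popSize` rows of `nTowns` ints, matching the `idx * nTowns + i` indexing used in the kernel.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical debug helper (not part of the original program).
// Assumes `population` holds popSize * nTowns entries, one row per individual.
// Returns the index of the first individual that is not a valid permutation
// of 0..nTowns-1 (out-of-range or repeated town), or -1 if all are valid.
int findInvalidIndividual(const std::vector<int> &population, int popSize, int nTowns)
{
    std::vector<char> seen(nTowns);
    for (int ind = 0; ind < popSize; ind++) {
        std::fill(seen.begin(), seen.end(), 0);          // reset for each tour
        for (int i = 0; i < nTowns; i++) {
            int town = population[ind * nTowns + i];
            if (town < 0 || town >= nTowns || seen[town]) {
                std::printf("individual %d: gene %d = %d is out of range or repeated\n",
                            ind, i, town);
                return ind;
            }
            seen[town] = 1;
        }
    }
    return -1;
}
```

If it returns a non-negative index, printing that individual right after each genetic operator can show which step produced the bad tour.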
I’m sorry if my English is bad.
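A device-side variant of the same idea, again only a sketch: `evaluateChecked` below is a hypothetical debug copy of `evaluate` with two extra parameters that are assumptions, not part of the original code. `popSize` lets surplus threads in the last block return early instead of reading past the population, and `badIdx` is a device int the host sets to -1 before the launch. Instead of reading `distance[]` with an invalid index, the kernel records the first offending individual with `atomicCAS` and prints it with device-side `printf`.

```cuda
#include <cstdio>

// Hypothetical debug variant of evaluate(); popSize and badIdx are assumed
// extra parameters. *badIdx must be set to -1 by the host before launch.
__global__ void evaluateChecked(const int *population, const int *distance,
                                int *cost, int nTowns, int *d_index,
                                int popSize, int *badIdx)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= popSize) return;                  // surplus threads do nothing
    d_index[idx] = threadIdx.x;

    const int *tour = population + idx * nTowns; // this individual's tour
    int sum = 0;
    for (int i = 0; i < nTowns; i++) {
        int a = tour[i];
        int b = tour[(i + 1) % nTowns];          // last step wraps to the first town
        if (a < 0 || a >= nTowns || b < 0 || b >= nTowns) {
            // Invalid town index: report the first offending individual
            // instead of reading distance[] out of bounds.
            if (atomicCAS(badIdx, -1, idx) == -1)
                printf("individual %d: bad town at position %d (%d -> %d)\n",
                       idx, i, a, b);
            cost[idx] = -1;                      // marker for an invalid tour
            return;
        }
        sum += distance[a * nTowns + b];
    }
    cost[idx] = sum;
}
```

The `(i + 1) % nTowns` step folds the closing edge into the main loop, so it visits the same edges as the original kernel.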