I am trying to port the following code constructs to run on the GPU, and nvcc is dying with internal errors.
Here is the relevant code fragment from my hacked-up version of the x264 codec’s pixel.c:
#define DEVICE __device__

#define PIXEL_SAD_C( name, lx, ly ) \
DEVICE int name( uint8_t *pix1, int i_stride_pix1, \
                 uint8_t *pix2, int i_stride_pix2 ) \
{ \
    int i_sum = 0; \
    int x, y; \
    for( y = 0; y < ly; y++ ) \
    { \
        for( x = 0; x < lx; x++ ) \
        { \
            i_sum += abs( pix1[x] - pix2[x] ); \
        } \
        pix1 += i_stride_pix1; \
        pix2 += i_stride_pix2; \
    } \
    return i_sum; \
}

PIXEL_SAD_C( pixel_sad_16x16, 16, 16 )
PIXEL_SAD_C( pixel_sad_16x8,  16,  8 )
PIXEL_SAD_C( pixel_sad_8x16,   8, 16 )
PIXEL_SAD_C( pixel_sad_8x8,    8,  8 )
PIXEL_SAD_C( pixel_sad_8x4,    8,  4 )
PIXEL_SAD_C( pixel_sad_4x8,    4,  8 )
PIXEL_SAD_C( pixel_sad_4x4,    4,  4 )
PIXEL_SAD_C( pixel_sad_4x2,    4,  2 )
PIXEL_SAD_C( pixel_sad_2x4,    2,  4 )
PIXEL_SAD_C( pixel_sad_2x2,    2,  2 )

typedef int (*x264_pixel_cmp_t) ( uint8_t *, int, uint8_t *, int );

DEVICE x264_pixel_cmp_t pixel_sad[10] =
{
    pixel_sad_16x16, pixel_sad_16x8, pixel_sad_8x16, pixel_sad_8x8,
    pixel_sad_8x4,   pixel_sad_4x8,  pixel_sad_4x4,  pixel_sad_4x2,
    pixel_sad_2x4,   pixel_sad_2x2
};

And later on I call it like this:
results[ tid ] = pixel_sad[i_pixel]( x_pixels, FENC_STRIDE,
                                     y_pixels + __umul24( mb_y, i_stride ) + mb_x, i_stride )
               + p_cost_mvx[mb_x<<2] + p_cost_mvy[mb_y<<2];

So is this an illegal code construct? I declared the functions expanded from the macro as __device__ (#define DEVICE __device__), since in this context they are only ever called on the GPU. None of the __device__ function restrictions obviously applies here, except perhaps the one stating that the address of a __device__ function cannot be taken.
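If the pointer table is what nvcc rejects (the address-of restriction quoted above suggests it might be), one pointer-free workaround is to fold all block sizes into a single function that receives the width and height at run time. This is a minimal sketch under that assumption, not x264 code: pixel_sad_generic and pixel_sad_any are hypothetical names, and the DEVICE guard (also an assumption) lets the same file build as plain C for a quick host-side check.

```c
#include <stdint.h>
#include <stdlib.h>

/* Expands to __device__ only under nvcc, so the file also compiles as plain C. */
#ifdef __CUDACC__
#define DEVICE __device__
#else
#define DEVICE
#endif

/* Hypothetical helper: SAD of an lx-by-ly block with run-time loop bounds. */
DEVICE int pixel_sad_generic( const uint8_t *pix1, int i_stride_pix1,
                              const uint8_t *pix2, int i_stride_pix2,
                              int lx, int ly )
{
    int i_sum = 0;
    for( int y = 0; y < ly; y++ )
    {
        for( int x = 0; x < lx; x++ )
            i_sum += abs( pix1[x] - pix2[x] );
        pix1 += i_stride_pix1;
        pix2 += i_stride_pix2;
    }
    return i_sum;
}

/* Hypothetical dispatcher: looks up the block size instead of calling through
   a function pointer. Order follows the PIXEL_SAD_C invocations
   (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4, 4x2, 2x4, 2x2). */
DEVICE int pixel_sad_any( int i_pixel,
                          const uint8_t *pix1, int i_stride_pix1,
                          const uint8_t *pix2, int i_stride_pix2 )
{
    const int lx_tab[10] = { 16, 16, 8, 8, 8, 4, 4, 4, 2, 2 };
    const int ly_tab[10] = { 16, 8, 16, 8, 4, 8, 4, 2, 4, 2 };
    if( i_pixel < 0 || i_pixel >= 10 )
        return 0; /* out-of-range index: treat as an empty block */
    return pixel_sad_generic( pix1, i_stride_pix1, pix2, i_stride_pix2,
                              lx_tab[i_pixel], ly_tab[i_pixel] );
}
```

At the call site, pixel_sad[i_pixel]( ... ) would become pixel_sad_any( i_pixel, ... ) with the same remaining arguments. The trade-off is that the loop bounds are no longer compile-time constants, so the compiler cannot fully unroll each size the way it can with the macro-generated versions.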
I know I could expand the macro by hand and declare each function individually, but the macro is more maintainable, and expanding it would not get around the restriction on pointers to __device__ functions anyway (if that is the problem).
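To keep the single-definition maintainability of the macro without any function pointers, an X-macro can generate both the per-size functions and a switch-based dispatcher from one list. This is a sketch, not x264's own code: SAD_SIZES, PIXEL_SAD_DEF, SAD_CASE and pixel_sad_dispatch are hypothetical names, the DEVICE guard is an assumption, and i_pixel is assumed to use the same order as the PIXEL_SAD_C invocations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Expands to __device__ only under nvcc, so the file also compiles as plain C. */
#ifdef __CUDACC__
#define DEVICE __device__
#else
#define DEVICE
#endif

/* Hypothetical single list of (name, width, height, i_pixel index). */
#define SAD_SIZES(X)              \
    X(pixel_sad_16x16, 16, 16, 0) \
    X(pixel_sad_16x8,  16,  8, 1) \
    X(pixel_sad_8x16,   8, 16, 2) \
    X(pixel_sad_8x8,    8,  8, 3) \
    X(pixel_sad_8x4,    8,  4, 4) \
    X(pixel_sad_4x8,    4,  8, 5) \
    X(pixel_sad_4x4,    4,  4, 6) \
    X(pixel_sad_4x2,    4,  2, 7) \
    X(pixel_sad_2x4,    2,  4, 8) \
    X(pixel_sad_2x2,    2,  2, 9)

/* Same body as PIXEL_SAD_C, with constant loop bounds per generated function. */
#define PIXEL_SAD_DEF(name, lx, ly, idx)                \
    DEVICE int name( uint8_t *pix1, int i_stride_pix1,  \
                     uint8_t *pix2, int i_stride_pix2 ) \
    {                                                   \
        int i_sum = 0;                                  \
        for( int y = 0; y < (ly); y++ )                 \
        {                                               \
            for( int x = 0; x < (lx); x++ )             \
                i_sum += abs( pix1[x] - pix2[x] );      \
            pix1 += i_stride_pix1;                      \
            pix2 += i_stride_pix2;                      \
        }                                               \
        return i_sum;                                   \
    }
SAD_SIZES(PIXEL_SAD_DEF)

/* One case per list entry; width and height are unused here. */
#define SAD_CASE(name, lx, ly, idx) \
    case idx: return name( pix1, i_stride_pix1, pix2, i_stride_pix2 );

/* Replaces the function-pointer table with a direct switch. */
DEVICE int pixel_sad_dispatch( int i_pixel, uint8_t *pix1, int i_stride_pix1,
                               uint8_t *pix2, int i_stride_pix2 )
{
    switch( i_pixel )
    {
        SAD_SIZES(SAD_CASE)
    }
    return 0; /* unknown i_pixel */
}
```

Adding a block size then means adding one line to SAD_SIZES, and the call site becomes pixel_sad_dispatch( i_pixel, x_pixels, FENC_STRIDE, ... ). Because every case calls a function with constant bounds, the compiler is free to inline and unroll each one, at the cost of a larger switch.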
Suggestions?
Spencer