NVENC and NVDEC work on only one GPU in multi-GPU setups with NVIDIA Container Toolkit on driver >=570

This is a critical issue: in multi-GPU setups using the NVIDIA Container Toolkit, NVENC and NVDEC work on only one GPU with driver versions newer than 565, that is, 570 and later.

This relates to NVENC crashing (because it cannot find a CUDA device) when multiple NVIDIA GPUs are present and any index other than '0' is used. Many attempts were made to expose only specific devices through the NVIDIA_VISIBLE_DEVICES environment variable, assigning them by index or by GPU UUID.
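For anyone trying to reproduce this, here is a minimal sketch of a per-GPU check. It assumes Docker with the NVIDIA runtime configured and an image that ships an NVENC-enabled ffmpeg build; the image name `example/ffmpeg-nvenc:latest` and the helper names are placeholders of mine, not something from the setup described above. Each GPU is exposed on its own, once by index and once by UUID, and a one-second test encode is run against it.

```python
import subprocess

# Hypothetical image name: substitute any image with an NVENC-enabled ffmpeg.
IMAGE = "example/ffmpeg-nvenc:latest"


def parse_gpu_list(csv_text):
    """Parse `nvidia-smi --query-gpu=index,uuid --format=csv,noheader` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, uuid = (field.strip() for field in line.split(","))
        gpus.append((index, uuid))
    return gpus


def build_test_command(device, image=IMAGE):
    """Build a docker command exposing one GPU and running a short NVENC encode."""
    return [
        "docker", "run", "--rm", "--runtime=nvidia",
        "-e", f"NVIDIA_VISIBLE_DEVICES={device}",
        # The "video" capability is needed for NVENC/NVDEC inside the container.
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,video,utility",
        image,
        "ffmpeg", "-hide_banner", "-f", "lavfi",
        "-i", "testsrc=duration=1:size=1280x720:rate=30",
        "-c:v", "h264_nvenc", "-f", "null", "-",
    ]


def run_all():
    """Try every GPU, exposed by index and by UUID, and print pass/fail."""
    listing = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,uuid", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    for index, uuid in parse_gpu_list(listing):
        for device in (index, uuid):
            result = subprocess.run(
                build_test_command(device), capture_output=True, text=True
            )
            status = "ok" if result.returncode == 0 else "FAILED"
            print(f"GPU {index} exposed as {device}: {status}")
```

Calling `run_all()` on the host prints one line per GPU and exposure method, which makes it easy to see which single GPU still encodes.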

Only one GPU works (it may be the first GPU, last GPU, or anything in between), and everything else fails in FFmpeg:

[h264_nvenc @ 0x] OpenEncodeSessionEx failed: unsupported device (2): (no details)
[h264_nvenc @ 0x] No capable devices found

GStreamer fails in a similar way whenever FFmpeg fails:

nvh264encoder gstnvh264encoder.cpp:2158:gst_nv_h264_encoder_register_cuda:<cudacontext0> Failed to open session
nvh265encoder gstnvh265encoder.cpp:2196:gst_nv_h265_encoder_register_cuda:<cudacontext0> Failed to open session
nvenc gstnvenc.c:685:gst_nv_enc_register: NvEncOpenEncodeSessionEx failed: codec h264, device 0, error code 2
nvenc gstnvenc.c:685:gst_nv_enc_register: NvEncOpenEncodeSessionEx failed: codec h265, device 0, error code 2
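When checking many pods or containers, it can help to scan captured logs for these exact signatures. The following is a rough sketch, not an official tool: the patterns are copied from the FFmpeg and GStreamer lines quoted in this post, and the function name `find_nvenc_failures` is my own.

```python
import re

# Signatures taken from the FFmpeg log lines quoted in this post.
FFMPEG_PATTERNS = (
    re.compile(r"OpenEncodeSessionEx failed: unsupported device \(2\)"),
    re.compile(r"No capable devices found"),
)
# Signature taken from the GStreamer nvenc log lines quoted in this post.
GST_PATTERN = re.compile(
    r"NvEncOpenEncodeSessionEx failed: codec (\w+), device (\d+), error code (\d+)"
)


def find_nvenc_failures(log_text):
    """Return (source, detail) tuples for each known NVENC failure line."""
    findings = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in FFMPEG_PATTERNS):
            findings.append(("ffmpeg", line.strip()))
        match = GST_PATTERN.search(line)
        if match:
            codec, device, code = match.groups()
            findings.append(
                ("gstreamer", f"codec={codec} device={device} error={code}")
            )
    return findings
```

An empty list means none of the known signatures appeared; it does not prove that encoding succeeded.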

The above is on driver version 580.82.07 with five NVIDIA Titan Xp GPUs.

Driver versions 550 and 565 work fine, so this is a regression introduced in driver version 570 or later; I am therefore raising it in the forum for the driver team.

This is widely known to happen in Kubernetes, but it may also happen in Docker.

CC @amrits @generix

We are also closely monitoring this issue. Based on our tests, ffmpeg NVENC functionality within K8S pods works well on Tesla T4 nodes with multiple GPU cards. Since the GitHub issue "NVENC Fails in Kubernetes Pods on all but the last GPU with Driver 570.x or 580.x" (Issue #1249, NVIDIA/nvidia-container-toolkit) has indicated stable performance on V100 GPUs, and considering the current findings, we suspect there might be some driver-level issues affecting NVENC support for the GeForce series, particularly models like the 3060, 4090, and 5090. We're continuing to look into this and will provide updates as we learn more.

It has been confirmed that driver version 565.57.01 does not have this issue, but both the 570 and 580 series are affected. What is the current status regarding this problem?