`nvidia-container-cli` driver error when trying to run Nvidia docker on Jetson Nano

Hi all,

I am following NVIDIA Container Runtime on Jetson · NVIDIA/nvidia-docker Wiki · GitHub and trying to run the deviceQuery container on a Jetson Nano node. But when building the container, I get the following error:

$ sudo docker build -t devicequery .
Sending build context to Docker daemon 214.2MB
Step 1/6 : FROM nvcr.io/nvidia/l4t-base:r32.3.1
 ---> aaaa63e7b12d
Step 2/6 : RUN apt-get update && apt-get install -y --no-install-recommends make g++
 ---> Running in a9a6681a68c9
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

When I run deviceQuery on the host without a container, it works properly. I searched a lot for a solution and found that the problem might be that the driver is not initialized properly:

$ sudo nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0731 03:40:00.306584 31410 nvc.c:282] initializing library context (version=1.2.0, build=d22237acaea94aa5ad5de70aac903534ed598819)
I0731 03:40:00.306735 31410 nvc.c:256] using root /
I0731 03:40:00.306769 31410 nvc.c:257] using ldcache /etc/ld.so.cache
I0731 03:40:00.306789 31410 nvc.c:258] using unprivileged user 65534:65534
I0731 03:40:00.306848 31410 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0731 03:40:00.307075 31410 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0731 03:40:00.307355 31410 nvc.c:172] failed to detect NVIDIA devices
I0731 03:40:00.307634 31416 nvc.c:192] loading kernel module nvidia
E0731 03:40:00.308079 31416 nvc.c:194] could not load kernel module nvidia
I0731 03:40:00.308105 31416 nvc.c:204] loading kernel module nvidia_uvm
E0731 03:40:00.308336 31416 nvc.c:206] could not load kernel module nvidia_uvm
I0731 03:40:00.308356 31416 nvc.c:212] loading kernel module nvidia_modeset
E0731 03:40:00.308589 31416 nvc.c:214] could not load kernel module nvidia_modeset
I0731 03:40:00.309024 31417 driver.c:101] starting driver service
E0731 03:40:00.309567 31417 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0731 03:40:00.309821 31410 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
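
For reference, these are the pieces of the runtime configuration that the prestart hook should be using, as far as I understand it (paths from my JetPack 4.4 install, so treat this as a rough checklist rather than a definitive procedure):

$ which nvidia-container-runtime-hook nvidia-container-cli
$ ls /etc/nvidia-container-runtime/
$ cat /etc/nvidia-container-runtime/config.toml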

I have also tried reinstalling nvidia-docker2 and rebooting the device and Docker a couple of times, roughly with the steps below, but none of that worked.
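(Exact package names are taken from the dpkg output further down; I may not be remembering every step exactly, so this is just an approximation of what I ran.)

$ sudo apt-get install --reinstall nvidia-docker2 nvidia-container-toolkit nvidia-container-runtime
$ sudo systemctl restart docker
$ sudo reboot

The following is my system information: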

$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t210ref, EABI: aarch64, DATE: Fri Jun 26 04:38:25 UTC 2020
$ sudo dpkg -l | grep nvidia
ii libnvidia-container-tools 1.2.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container0:arm64 0.9.0~beta.1 arm64 NVIDIA container runtime library
ii libnvidia-container1:arm64 1.2.0-1 arm64 NVIDIA container runtime library
ii nvidia-container-csv-cuda 10.2.89-1 arm64 Jetpack CUDA CSV file
ii nvidia-container-csv-cudnn 8.0.0.180-1+cuda10.2 arm64 Jetpack CUDNN CSV file
ii nvidia-container-csv-tensorrt 7.1.3.0-1+cuda10.2 arm64 Jetpack TensorRT CSV file
ii nvidia-container-csv-visionworks 1.6.0.501 arm64 Jetpack VisionWorks CSV file
ii nvidia-container-runtime 3.3.0-1 arm64 NVIDIA container runtime
ii nvidia-container-toolkit 1.2.1-1 arm64 NVIDIA container runtime hook
ii nvidia-docker2 2.4.0-1 all nvidia-docker CLI wrapper
ii nvidia-l4t-3d-core 32.4.3-20200625213809 arm64 NVIDIA GL EGL Package
ii nvidia-l4t-apt-source 32.4.3-20200625213809 arm64 NVIDIA L4T apt source list debian package
ii nvidia-l4t-bootloader 32.4.3-20200709231554 arm64 NVIDIA Bootloader Package
ii nvidia-l4t-camera 32.4.3-20200625213809 arm64 NVIDIA Camera Package
ii nvidia-l4t-configs 32.4.3-20200625213809 arm64 NVIDIA configs debian package
ii nvidia-l4t-core 32.4.3-20200625213809 arm64 NVIDIA Core Package
ii nvidia-l4t-cuda 32.4.3-20200625213809 arm64 NVIDIA CUDA Package
ii nvidia-l4t-firmware 32.4.3-20200625213809 arm64 NVIDIA Firmware Package
ii nvidia-l4t-graphics-demos 32.4.3-20200625213809 arm64 NVIDIA graphics demo applications
ii nvidia-l4t-gstreamer 32.4.3-20200625213809 arm64 NVIDIA GST Application files
ii nvidia-l4t-init 32.4.3-20200625213809 arm64 NVIDIA Init debian package
ii nvidia-l4t-initrd 32.4.3-20200625213809 arm64 NVIDIA initrd debian package
ii nvidia-l4t-jetson-io 32.4.3-20200625213809 arm64 NVIDIA Jetson.IO debian package
ii nvidia-l4t-jetson-multimedia-api 32.4.3-20200625213809 arm64 NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support flexible application development.
ii nvidia-l4t-kernel 4.9.140-tegra-32.4.3-20200625213809 arm64 NVIDIA Kernel Package
ii nvidia-l4t-kernel-dtbs 4.9.140-tegra-32.4.3-20200625213809 arm64 NVIDIA Kernel DTB Package
ii nvidia-l4t-kernel-headers 4.9.140-tegra-32.4.3-20200625213809 arm64 NVIDIA Linux Tegra Kernel Headers Package
ii nvidia-l4t-multimedia 32.4.3-20200625213809 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-multimedia-utils 32.4.3-20200625213809 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-oem-config 32.4.3-20200625213809 arm64 NVIDIA OEM-Config Package
ii nvidia-l4t-tools 32.4.3-20200709231554 arm64 NVIDIA Public Test Tools Package
ii nvidia-l4t-wayland 32.4.3-20200625213809 arm64 NVIDIA Wayland Package
ii nvidia-l4t-weston 32.4.3-20200625213809 arm64 NVIDIA Weston Package
ii nvidia-l4t-x11 32.4.3-20200625213809 arm64 NVIDIA X11 Package
ii nvidia-l4t-xusb-firmware 32.4.3-20200625213809 arm64 NVIDIA USB Firmware Package
$ sudo docker info
Client:
 Debug Mode: false

Server:
 Containers: 58
  Running: 16
  Paused: 0
  Stopped: 42
 Images: 18
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.140-tegra
 Operating System: Ubuntu 18.04.4 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.871GiB
 Name: jetson-0
 ID: J3ZG:UP5R:OCKL:LPDA:HOKI:5DH5:2ZHS:2R7H:CGC3:DK5V:AAN6:UPWA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
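
For completeness, the /etc/docker/daemon.json that sets nvidia as the default runtime looks roughly like this (reproduced from memory, so it may not match the device byte for byte):

$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}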

Can anyone give me some suggestions about this? Thank you so much!

Hi,

Please note that there is a version dependency between the Nano and the container image.
This is because Jetson containers mount libraries from the host directly.
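For example, the host libraries to be mounted are listed in the CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/ (which files are present depends on the nvidia-container-csv-* packages installed, so take these paths as an example):

$ ls /etc/nvidia-container-runtime/host-files-for-container.d/
$ head /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv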

It looks like you are using JetPack 4.4 GA, which includes rel-32.4.3.
So please update the Dockerfile to rel-32.4.3 as well:

FROM nvcr.io/nvidia/l4t-base:r32.4.3
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make
CMD ["./deviceQuery"]
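
The build and run steps should then be the same as on the wiki page, something like below (the --runtime flag can be dropped since nvidia is already your default runtime):

$ sudo docker build -t devicequery .
$ sudo docker run -it --runtime nvidia devicequery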

Thanks.

Hi @AastaLLL,

Thanks for responding. I changed the Dockerfile as you indicated, but I hit the same problem:

$ sudo docker build -t devicequery .
Sending build context to Docker daemon 214.2MB
Step 1/6 : FROM nvcr.io/nvidia/l4t-base:r32.4.3
 ---> c93fc89026d9
Step 2/6 : RUN apt-get update && apt-get install -y --no-install-recommends make g++
 ---> Running in a0318d71e788
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

Thanks for testing.

We are going to reproduce this issue.
Will share more information with you later.

Hi @AastaLLL,

Thanks for your time and effort. I just reflashed my Nanos and the problem disappeared.

I guess the cause is that I was trying to set up a GPU-enabled Kubernetes cluster following these instructions: GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes, which install the nvidia-docker2 package and may have overwritten the pre-installed libraries.
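
In case anyone else runs into this, before reflashing it is probably worth checking whether the container packages were replaced or pulled from a different apt source, for example:

$ dpkg -l | grep -E 'nvidia-container|nvidia-docker'
$ apt-cache policy nvidia-docker2 nvidia-container-toolkit libnvidia-container-tools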

Really good to know the issue is gone.
Thanks for updating us.