Executing Cuda script in LXC container results in "cuda error: no CUDA-capable device is detected"

Question

I followed these instructions to set up CUDA inside an LXC container.

When I try to execute the sample ./deviceQuery script inside the container, the following error is returned:

$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

CUDA is installed and recognised inside the container:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

The NVIDIA device nodes are present on both the host and inside the LXC container:

$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Dec 20 23:31 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec 20 23:31 /dev/nvidiactl
crw-rw-rw- 1 root root 246,   0 Dec 20 23:31 /dev/nvidia-uvm

When I run sudo nvidia-smi inside the container I get the following error:

Failed to initialize NVML: Unknown Error

How can I execute CUDA scripts inside containers?

Did you end up solving this? I'm getting the same error with Docker.
– Nathaniel Bubis, Commented Mar 29, 2016 at 16:06

Community · Accepted Answer · 2017-03-20 10:16:40Z

It looks like this question has already been asked on SuperUser, but I can only flag it as duplicate if it already exists in ServerFault. I'll copy my answer here in hopes that it helps someone who stumbles on this question first.

I had this very same issue, which I wrote about at length here.

The issue you are having may be caused by using an LXC template that doesn't match your host. I am using Proxmox 4.4, which is based on Debian 8.6. My container was based on Ubuntu 16.04. Just like you, I saw the passed-through device nodes in the container with root as the owner and group, not nobody:nogroup as expected.
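One quick way to see whether the host and the container are built on the same distribution is to compare their /etc/os-release files. The following is only a sketch: the helper name os_id_version is made up for illustration, and it assumes both systems ship a standard os-release file with the ID and VERSION_ID fields.

```shell
# Hypothetical helper: print "ID VERSION_ID" from an os-release style file.
# Defaults to /etc/os-release, which is written as plain shell variable
# assignments and can therefore be sourced.
os_id_version() {
    (
        # Source the file in a subshell so its variables do not leak
        # into the calling shell.
        . "${1:-/etc/os-release}" &&
            printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
    )
}
```

Run os_id_version once on the host and once inside the container. If the two lines differ, the template and the host are based on different releases.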

A forum post I stumbled on inspired me to build a new container based on a template that matched my host, Debian 8.6. Once I did that, the /dev nodes were owned by nobody:nogroup and nvidia-smi correctly identified my GPU.

If your container template and your host don't match, I strongly recommend you try making them match. The only way I am aware of is to rebuild the container.
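To check the ownership described above without reading through the full ls -l output, something like the following could be run inside the container. It is a minimal sketch: list_node_owners is a hypothetical helper name, and it assumes GNU coreutils stat (for the -c format option).

```shell
# Hypothetical helper: print "owner:group path" for each path given.
# Assumes GNU coreutils stat, which supports the -c format option.
list_node_owners() {
    for node in "$@"; do
        # A glob that matched nothing is passed through literally; skip it.
        [ -e "$node" ] || continue
        stat -c '%U:%G %n' "$node"
    done
}

# Inside the container: show who owns the NVIDIA device nodes.
list_node_owners /dev/nvidia*
```

In my case the nodes showed up as root:root in the mismatched container and as nobody:nogroup in the one that worked.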
