Install CUDA 10.0 on Ubuntu 16.04 (for DGX-1)

herbol87 · September 22, 2019, 6:47am

Hi All,

I am trying to install CUDA 10.0 on Ubuntu 16.04 running on a DGX-1 server.
I followed the instructions for “runfile installation” in https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html#runfile.

After step 4.2.6 (i.e., “Reboot the system to reload the graphical interface”), I checked the CUDA version as follows:

nvcc --version

which returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

However, when I run:

nvidia-smi

it returns:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
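A working `nvcc --version` only shows that the toolkit is installed; `nvidia-smi` needs the kernel driver to be loaded. A minimal sketch for telling the two apart, assuming the usual behavior that a loaded nvidia module exposes `/proc/driver/nvidia/version` (the function names are hypothetical, and both inputs are parameters so the logic can run without a GPU):

```shell
#!/bin/sh
# Hypothetical helpers: separate "toolkit installed" from "driver loaded".

# Extract the toolkit release (e.g. "10.0") from `nvcc --version` text,
# whose last line looks like: Cuda compilation tools, release 10.0, V10.0.130
toolkit_release() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Assumption: the loaded nvidia kernel module provides this file.
driver_loaded() {
    [ -r "${1:-/proc/driver/nvidia/version}" ]
}

# $1: nvcc --version output; $2 (optional): path to the driver version file.
diagnose() {
    release=$(toolkit_release "$1")
    if driver_loaded "$2"; then drv="loaded"; else drv="NOT loaded"; fi
    echo "toolkit: ${release:-not found}; driver: $drv"
}

# Usage:
#   diagnose "$(nvcc --version)"
```

On the machine described here, `diagnose "$(nvcc --version)"` would be expected to report toolkit 10.0 with the driver not loaded, which matches the symptoms: the toolkit install worked, the driver install did not.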

I went to step 4.4 (Device Node Verification) and found that the device files /dev/nvidia* don’t exist.
I tried to create them manually; however, running:

sudo /sbin/modprobe nvidia

returns:

modprobe: ERROR: could not insert 'nvidia': Exec format error
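For reference, the device-node script in the CUDA 10.0 installation guide (step 4.4) runs `/sbin/modprobe nvidia` first and exits if that fails, so creating the nodes by hand cannot help while modprobe itself returns “Exec format error”. The dry-run sketch below mirrors only the script's counting logic: it counts NVIDIA controllers in `lspci` output and prints, rather than runs, the `mknod` commands (character major 195, `/dev/nvidiactl` at minor 255). The function names are hypothetical, and the guide's extra `/dev/nvidia-uvm` step is omitted.

```shell
#!/bin/sh
# Dry-run sketch of the device-node logic; never calls modprobe or mknod.

# Count NVIDIA 3D/VGA controllers in `lspci` output read from stdin.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -e '3D controller' -e 'VGA compatible controller'
}

# Print mknod commands for N GPUs: /dev/nvidia0..N-1 are char major 195
# with minor = GPU index; /dev/nvidiactl is 195:255.
print_mknod_commands() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Usage (the guide's script runs as root, and only after modprobe succeeds):
#   print_mknod_commands "$(lspci | count_nvidia_gpus)"
```

With the `lspci` output in this post (eight Tesla V100 SXM2 controllers), `count_nvidia_gpus` returns 8.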

Please help to solve the problem. Thanks!

Other details:

lspci | grep -i nvidia
06:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
07:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
89:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
8a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

uname -m && cat /etc/*release
x86_64
DGX_NAME="DGX Server"
DGX_PRETTY_NAME="NVIDIA DGX Server"
DGX_SWBUILD_DATE="2018-03-20"
DGX_SWBUILD_VERSION="3.1.6"
DGX_COMMIT_ID="1b0f58ecbf989820ce745a9e4836e1de5eea6cfd"
DGX_SERIAL_NUMBER=QTFCOU8280021
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

gcc --version
gcc (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

uname -r
4.4.0-142-generic

cat /proc/version
Linux version 4.4.0-142-generic (buildd@lgw01-amd64-033) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019

dpkg -l | grep nvidia
ii dgx-peer-mem-loader 1.1-10 amd64 Ensure nvidia is loaded before nv_peer_mem
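One cause of the modprobe “Exec format error” worth ruling out is a kernel module that was not built for the running kernel (4.4.0-142-generic here). A minimal sketch of that one check, assuming the module's vermagic string (from `modinfo -F vermagic nvidia`) starts with the kernel release it was built for; the function name is hypothetical, a match does not rule out other causes, and `dmesg` output right after the failed modprobe usually states the kernel's own reason.

```shell
#!/bin/sh
# Hypothetical check: does the nvidia module's vermagic name the running kernel?
# $1: output of `modinfo -F vermagic nvidia`,
#     e.g. "4.4.0-142-generic SMP mod_unload modversions"
# $2: output of `uname -r`, e.g. "4.4.0-142-generic"
# A mismatch is one common cause of "Exec format error", not the only one.
vermagic_matches_kernel() {
    vermagic=$1
    kernel=$2
    # The kernel release is the first space-separated field of vermagic.
    [ -n "$vermagic" ] && [ "${vermagic%% *}" = "$kernel" ]
}

# Usage:
#   if ! vermagic_matches_kernel "$(modinfo -F vermagic nvidia)" "$(uname -r)"; then
#       echo "nvidia.ko does not appear to be built for $(uname -r)"
#   fi
```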

Robert_Crovella · September 24, 2019, 1:52am

DGX-1 software is mostly maintained and installed through the package manager. You can use a runfile installer, but you’ll need to be aware of the conflicts inherent in mixing the two methods. These conflicts are documented in the CUDA Linux installation guide, in the section “Handle Conflicting Installation Methods”.

In short, the CUDA 10 toolkit is installed, but your driver installation is broken. To rectify this, you’ll need to clean up and remove all previous installations.
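A dry-run sketch of that cleanup, assuming the uninstall locations documented for CUDA 10.0: `/usr/bin/nvidia-uninstall` for a runfile driver, `/usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl` for a runfile toolkit, and `apt-get --purge remove` for packages. It only prints commands. `print_cleanup_plan` and `CLEANUP_ROOT` are hypothetical (the variable exists only so the logic can be tested against a scratch directory), and which packages to purge on a DGX is left to the caller.

```shell
#!/bin/sh
# Hypothetical dry run: print (never execute) cleanup commands for each
# install method found. CLEANUP_ROOT is a test-only prefix; leave it unset.
# Arguments: names of apt packages to purge (chosen by the caller).
print_cleanup_plan() {
    root=${CLEANUP_ROOT:-}
    # A runfile driver install leaves this uninstaller behind.
    if [ -x "$root/usr/bin/nvidia-uninstall" ]; then
        echo "sudo /usr/bin/nvidia-uninstall"
    fi
    # A CUDA 10.0 runfile toolkit install leaves a per-version uninstall script.
    for u in "$root"/usr/local/cuda-*/bin/uninstall_cuda_*.pl; do
        [ -e "$u" ] || continue   # glob matched nothing
        echo "sudo ${u#"$root"}"
    done
    # Package-manager installs, one purge per named package.
    for pkg in "$@"; do
        echo "sudo apt-get --purge remove $pkg"
    done
}

# Usage (package name is only an example):
#   print_cleanup_plan cuda-drivers
```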

