These instructions set up a deep learning environment for GPU-accelerated TensorFlow.
See here for PyTorch GPU install instructions.
After following these instructions you'll have:
- Ubuntu 16.04.
- CUDA 9.0 and the NVIDIA drivers installed.
- A conda environment with Python 3.6.
- A stable TensorFlow release with GPU support.
Before you begin, you may need to disable nouveau, the open-source NVIDIA driver that ships with Ubuntu.
Option 1: Modify modprobe file
- After you boot the Linux system and are sitting at a login prompt, press Ctrl+Alt+F1 to get to a terminal screen. Log in via this terminal screen.
- Create the file /etc/modprobe.d/nouveau-blacklist.conf, e.g. with:
sudo touch /etc/modprobe.d/nouveau-blacklist.conf
- Put the following two lines in that file:
blacklist nouveau
options nouveau modeset=0
- Regenerate the kernel initramfs
sudo update-initramfs -u
- Reboot the system
sudo reboot
- On reboot, verify that the nouveau drivers are not loaded
lsmod | grep nouveau
If this prints nothing, nouveau is not loaded. If nouveau driver(s) are still loaded, do not proceed with the installation guide; troubleshoot why they are still loaded first.
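Option 1 can be sketched as two small shell helpers: one prints the blacklist lines so they can be piped into the file, the other checks `lsmod` output for nouveau. The function names `nouveau_blacklist_lines` and `nouveau_loaded` are made up for this illustration and are not part of any tool.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the two lines that belong in the blacklist file.
nouveau_blacklist_lines() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Read `lsmod` output on stdin; exit status 0 means a nouveau module is listed.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Possible usage (not run here):
#   nouveau_blacklist_lines | sudo tee /etc/modprobe.d/nouveau-blacklist.conf
#   sudo update-initramfs -u && sudo reboot
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```

Printing the lines and piping them through `sudo tee` avoids the problem that a plain `>` redirect runs without root privileges even when the command before it uses sudo.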
Option 2: Modify Grub load command
From a Stack Overflow solution:
- When the GRUB boot menu appears: highlight the Ubuntu menu entry and press the E key. Add the nouveau.modeset=0 parameter to the end of the linux line ... Then press F10 to boot.
- When the login page appears, press Ctrl+Alt+F1.
- Enter your username and password.
- Uninstall all NVIDIA-related software:
# quote the pattern so the shell does not expand it against local file names
sudo apt-get purge 'nvidia*'
sudo reboot
- Update the apt-get package lists
sudo apt-get update
- Install apt-get deps
sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev curl
- Install the NVIDIA drivers and CUDA
# The 16.04 installer works with 16.10.
# download drivers
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
# download key to allow installation
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
# install actual package
sudo dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
# install cuda (but it'll prompt to install other deps, so we try to install twice with a dep update in between)
sudo apt-get update
sudo apt-get install cuda-9-0
- Reboot Ubuntu
sudo reboot
- Check the NVIDIA driver install
nvidia-smi
# you should see a list of GPUs printed
# if not, the previous steps failed.
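For a scripted version of this check, one option is to count the devices that `nvidia-smi -L` lists (one line per GPU, starting with `GPU <index>:`). The helper name `count_gpus` is an assumption for this sketch.

```shell
# Hypothetical helper: count GPUs from `nvidia-smi -L` output read on stdin.
# `grep -c` exits non-zero when nothing matches, so `|| true` keeps the
# function's exit status at 0 while still printing a count of 0.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Possible usage (not run here):
#   n="$(nvidia-smi -L | count_gpus)"
#   [ "$n" -gt 0 ] || echo "no GPUs visible; the driver install likely failed"
```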
- Install cuDNN
wget https://s3.amazonaws.com/open-source-william-falcon/cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo tar -xzvf cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
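To confirm that the header was copied into place, the version macros can be read back from it. This sketch assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as cuDNN 7 headers do; the helper name `cudnn_version` is made up here.

```shell
# Hypothetical helper: print "MAJOR.MINOR.PATCH" from a cudnn.h-style header.
# Takes the header path as an optional argument.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "${1:-/usr/local/cuda/include/cudnn.h}"
}

# Possible usage (not run here):
#   cudnn_version
#   cudnn_version /usr/local/cuda/include/cudnn.h
```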
- Add these lines to the end of ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$PATH:/usr/local/cuda/bin"
- Reload bashrc
source ~/.bashrc
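If this guide is followed more than once, the same export lines can pile up in ~/.bashrc. A minimal sketch of an idempotent append follows; `append_once` is a hypothetical helper name, and it adds a line only when that exact line is not already in the file.

```shell
# Hypothetical helper: append line $1 to file $2 unless it is already there.
# grep flags: -q quiet, -x match the whole line, -F treat the pattern literally.
append_once() {
    touch "$2"
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Possible usage (not run here):
#   append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc
#   append_once 'export PATH="$PATH:/usr/local/cuda/bin"' ~/.bashrc
```

Single quotes around the export lines keep `$PATH` unexpanded, so the literal text lands in the file and is evaluated each time the shell starts.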
- Install Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# press s to skip terms
# Do you approve the license terms? [yes|no]
# yes
# Miniconda3 will now be installed into this location:
# accept the location
# Do you wish the installer to prepend the Miniconda3 install location
# to PATH in your /home/ghost/.bashrc ? [yes|no]
# yes
- Reload bashrc
source ~/.bashrc
- Create a Python 3.6 conda env to install TensorFlow into
conda create -n tensorflow python=3.6
# press y a few times
- Activate env
source activate tensorflow
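A quick way to confirm that the environment is active is to check which Python version is now on the PATH. The helper below is a sketch with a made-up name, `python_major_minor`; it extracts `MAJOR.MINOR` from `python --version` output.

```shell
# Hypothetical helper: read `python --version` output on stdin, print "MAJOR.MINOR".
python_major_minor() {
    sed -n 's/^Python \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# Possible usage (not run here); 2>&1 covers interpreters that print the
# version to stderr:
#   [ "$(python --version 2>&1 | python_major_minor)" = "3.6" ] || echo "env not active?"
```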
- Update pip (it might already be up to date, but just in case)
pip install --upgrade pip
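- Install stable TensorFlow with GPU support for Python 3.6
pip install --upgrade tensorflow-gpu
# Note: TensorFlow 1.5 through 1.12 are the releases built against CUDA 9.0.
# If the release pip picks does not match, pin one, e.g. pip install tensorflow-gpu==1.12.0
# If the above fails, try the part below
# pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl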
- Test the TensorFlow install
# start python shell
python
# run test script
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
# when you run sess, you should see a bunch of lines with the word gpu in them (if install worked)
# otherwise, not running on gpu
sess = tf.Session()
print(sess.run(hello))
Or alternatively, from a regular shell (not the Python prompt):
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
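For a yes/no answer instead of scanning log lines for the word gpu, TensorFlow can be asked directly. This sketch assumes a TensorFlow 1.x install, where `tf.test.is_gpu_available()` exists; the wrapper name `tf_gpu_available` and its optional interpreter argument are made up for illustration.

```shell
# Hypothetical wrapper: prints True or False on stdout.
# $1 is the Python interpreter to use (defaults to `python` from the active env).
# TensorFlow's own startup logging goes to stderr, so stdout stays clean.
tf_gpu_available() {
    "${1:-python}" -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
}

# Possible usage (not run here), inside the activated conda env:
#   [ "$(tf_gpu_available 2>/dev/null)" = "True" ] && echo "TensorFlow sees a GPU"
```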