PyTorch utilizes the CPU instead of the GPU

When I try to use CUDA, whether for training a neural network or just for a simple calculation, PyTorch utilizes the CPU instead of the GPU:

Python 3.8.3 (default, Jun 25 2020, 23:21:14)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> device
device(type='cuda', index=0)
>>> tensor = torch.rand(1, 1, 10).to(device)
>>> tensor
tensor([[[0.1126, 0.1737, 0.9678, 0.8833, 0.6923, 0.2118, 0.9874, 0.9397,
          0.4831, 0.4274]]], device='cuda:0')
>>> tensor_two = tensor + tensor
>>> tensor_two
tensor([[[0.2252, 0.3474, 1.9356, 1.7666, 1.3847, 0.4236, 1.9747, 1.8794,
          0.9661, 0.8549]]], device='cuda:0')
>>> while True:
...     tensor_two = tensor + tensor
...
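As a sanity check, a heavier workload along these lines (a rough sketch, not part of the original session; matrix size and loop count are arbitrary) should drive nvidia-smi utilization well above 0% when CUDA is actually doing the work, since the tiny 1x1x10 additions above are dominated by Python and kernel-launch overhead:

# Heavier GPU workload sketch (assumed sizes, not from this thread); large
# matmuls keep the GPU busy long enough to show up in nvidia-smi.
import time
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.rand(4096, 4096, device=device)
b = torch.rand(4096, 4096, device=device)

start = time.time()
for _ in range(200):
    c = a @ b                      # large matrix multiplications on the GPU
torch.cuda.synchronize()           # wait for queued kernels before timing
print(f"device={c.device}, elapsed={time.time() - start:.2f}s")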

nvidia-smi output under Windows:

Sun Jun 28 10:46:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.41       Driver Version: 455.41       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    WDDM | 00000000:01:00.0  On |                  N/A |
| 44%   39C    P2    33W / 151W |   1394MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A          4      C   Insufficient Permissions          N/A   |
|    0   N/A  N/A       1268    C+G   ...lPanel\SystemSettings.exe      N/A   |
|    0   N/A  N/A       1468    C+G   C:\Windows\System32\dwm.exe       N/A   |
|    0   N/A  N/A       3784    C+G   ...bbwe\Microsoft.Photos.exe      N/A   |
|    0   N/A  N/A       8844    C+G   ...5n1h2txyewy\SearchApp.exe      N/A   |
|    0   N/A  N/A       8892    C+G   ...artMenuExperienceHost.exe      N/A   |
|    0   N/A  N/A      10380    C+G   ...ropbox\Client\Dropbox.exe      N/A   |
|    0   N/A  N/A      14504    C+G   ...y\ShellExperienceHost.exe      N/A   |
|    0   N/A  N/A      15984    C+G   ...8bbwe\WindowsTerminal.exe      N/A   |
|    0   N/A  N/A      19380    C+G   ...2txyewy\TextInputHost.exe      N/A   |
|    0   N/A  N/A      28380    C+G   ...b3d8bbwe\WinStore.App.exe      N/A   |
+-----------------------------------------------------------------------------+

Task manager:

WSL version:
Linux version 4.19.121-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Fri Jun 19 21:06:10 UTC 2020

Ubuntu 20.04 LTS

DxDiag.txt (114.6 KB)

The same code on Ubuntu PC:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   46C    P2    63W / 250W |    663MiB / 11178MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   39C    P8    11W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   43C    P8    12W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   43C    P8    13W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       942      G   /usr/lib/xorg/Xorg                            25MiB |
|    0      1239      G   /usr/bin/gnome-shell                          57MiB |
|    0      5591      C   ...vlad/.pyenv/versions/pytorch/bin/python   567MiB |
+-----------------------------------------------------------------------------+

Same thing here. I use a Surface Book 2, Linux kernel 4.19.121, Ubuntu and Miniconda.
While the GPU is detected by PyTorch, it is not used during training.
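One thing worth ruling out (a generic placement check sketched below, with a hypothetical model, not the actual training code from this thread): a model or batch that never gets a .to(device) call will silently keep the whole training loop on the CPU even though CUDA is available.

# Generic device-placement check (hypothetical model and batch, assumption only).
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)          # model parameters moved to the GPU
print(next(model.parameters()).device)       # should print cuda:0

batch = torch.rand(32, 10).to(device)        # each input batch must be moved too
print(batch.device)                          # should also print cuda:0

out = model(batch)                           # runs on the GPU only if both
print(out.device)                            # the model and the input live there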


The TensorFlow Docker container doesn’t utilize the GPU either.

Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 39s 642us/sample - loss: 0.4938 - accuracy: 0.8272
Epoch 2/10
60000/60000 [==============================] - 27s 447us/sample - loss: 0.3753 - accuracy: 0.8636
Epoch 3/10
60000/60000 [==============================] - 26s 436us/sample - loss: 0.3360 - accuracy: 0.8761
Epoch 4/10
60000/60000 [==============================] - 27s 458us/sample - loss: 0.3117 - accuracy: 0.8863
Epoch 5/10
60000/60000 [==============================] - 33s 547us/sample - loss: 0.2945 - accuracy: 0.8915
Epoch 6/10
60000/60000 [==============================] - 34s 571us/sample - loss: 0.2815 - accuracy: 0.8951
Epoch 7/10
60000/60000 [==============================] - 37s 615us/sample - loss: 0.2700 - accuracy: 0.8998
Epoch 8/10
60000/60000 [==============================] - 38s 626us/sample - loss: 0.2591 - accuracy: 0.9039
Epoch 9/10
60000/60000 [==============================] - 39s 643us/sample - loss: 0.2498 - accuracy: 0.9059
Epoch 10/10
60000/60000 [==============================] - 36s 595us/sample - loss: 0.2404 - accuracy: 0.9105
[W 18:12:36.300 NotebookApp] Notebook tensorflow-tutorials/classification.ipynb is not trusted
[I 18:12:38.682 NotebookApp] Kernel started: 9eb86799-5559-4d34-ba36-7edd178525c4
2020-07-09 18:12:53.102083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-07-09 18:12:53.103647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-07-09 18:13:51.119733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-09 18:13:51.220018: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.220439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-09 18:13:51.220576: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:51.220709: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-09 18:13:51.222465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-09 18:13:51.223247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-09 18:13:51.225612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-09 18:13:51.228846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-09 18:13:51.229013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-09 18:13:51.230293: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.231493: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.231930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-07-09 18:13:51.240228: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3410010000 Hz
2020-07-09 18:13:51.241999: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56b3460 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-09 18:13:51.242058: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-09 18:13:51.689421: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.689939: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56465a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-09 18:13:51.689973: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2020-07-09 18:13:51.691056: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.691691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-09 18:13:51.691838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:51.691882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-09 18:13:51.691952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-09 18:13:51.691996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-09 18:13:51.692096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-09 18:13:51.692196: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-09 18:13:51.692287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-09 18:13:51.695324: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.697101: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.697587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-07-09 18:13:51.697738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:52.656161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-09 18:13:52.656231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-07-09 18:13:52.656282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-07-09 18:13:52.657540: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:52.657991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2020-07-09 18:13:52.659207: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2020-07-09 18:13:52.660162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6835 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-07-09 18:14:10.154011: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 6.67G (7167590400 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
2020-07-09 18:14:25.532223: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 6.01G (6450831360 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
[I 18:14:38.780 NotebookApp] Saving file at /tensorflow-tutorials/classification.ipynb
2020-07-09 18:14:39.436606: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 5.41G (5805748224 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
2020-07-09 18:15:30.640365: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10

TensorFlow Docker containers should work. Would you mind describing the exact container you tried to run (the actual docker command line) and how you determined it wasn’t utilizing the GPU?

I ran the commands from the WSL user guide.
Docker command line:
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
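Inside that container, a quick check along these lines (a sketch assuming the TF 2.x API shipped in the image) shows whether TensorFlow even enumerates the GPU, independent of what Task Manager reports:

# Run inside the container; only confirms that TensorFlow *sees* the GPU,
# not that training actually uses it.
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))   # expect a non-empty list with GPU:0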

CPU/GPU utilization was monitored with Task Manager and nvidia-smi under Windows.
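Another way to watch utilization is to poll NVML directly (a sketch using the pynvml package, installed via nvidia-ml-py3; this is an assumption on my part, and NVML may not be exposed inside early WSL2 CUDA preview builds, so it may need to run on the Windows or native-Linux side):

# Polls GPU utilization and memory use once per second via NVML.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    for _ in range(30):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"gpu={util.gpu}%  mem={mem.used / 2**20:.0f}MiB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()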


I just ran the above docker command and checked my GPU usage while running the samples; it stayed at 0% throughout:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.12       Driver Version: 465.12       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    WDDM | 00000000:40:00.0 Off |                  N/A |
|  0%   60C    P2   111W / 340W |   9764MiB / 10240MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I’m hoping to be able to use PyTorch with CUDA on Windows in the future.

Off-topic: when I run the Python commands @vvodan posted in the first post I get the same results: PyTorch can see the GPU and enables CUDA without actually using it, even though I’m not running Python in a Docker container (just Anaconda). Does this mean I have access to the GPU without running Docker? If not, does that mean I have to use a prebuilt Docker image to eventually speed up ML?
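A rough way to test that, sketched below with arbitrary sizes: time the same large matrix multiplication on the CPU and on cuda in the Anaconda environment; a large gap between the two timings means the GPU really is doing the work without Docker.

# CPU-vs-GPU timing sanity check (sizes and iteration count are arbitrary).
import time
import torch

def time_matmul(device, n=4096, iters=10):
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()   # make sure setup work has finished
    start = time.time()
    for _ in range(iters):
        c = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()   # wait for queued GPU kernels before stopping the clock
    return time.time() - start

print("cpu :", time_matmul(torch.device("cpu")))
if torch.cuda.is_available():
    print("cuda:", time_matmul(torch.device("cuda:0")))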

–Chris