@atalman atalman commented Sep 11, 2024

Root cause of the issue

C:\actions-runner\_work\_temp\conda_environment_10772459803\lib\site-packages\torch\cuda\__init__.py:129: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\cuda\CUDAFunctions.cpp:108.) 

As a result, the CUDA 12+ jobs report:

torch.cuda.is_available: False 
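For context on the `found version 11040` part of the warning: CUDA encodes versions as `1000 * major + 10 * minor`, so 11040 corresponds to a driver that supports CUDA 11.4. The sketch below decodes that integer and does a rough major-version comparison. The helper names are hypothetical, and the check is a deliberate simplification (it ignores minor-version and forward-compatibility rules), not the logic PyTorch itself uses.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    return major, remainder // 10


def driver_supports(driver_encoded: int, toolkit_major: int) -> bool:
    """Rough check (assumption): driver major version must be >= toolkit major version."""
    return decode_cuda_version(driver_encoded)[0] >= toolkit_major


if __name__ == "__main__":
    # The driver version reported in the warning above.
    print(decode_cuda_version(11040))
    # Compare against a CUDA 12 build such as the cu121 wheels.
    print(driver_supports(11040, 12))
```

Under this simplified rule, an 11.4 driver fails the check for any CUDA 12 build, which is consistent with `torch.cuda.is_available` returning False on these runners.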

As a result, it is failing the builder checks here:
https://github.com/pytorch/builder/actions/runs/10776424717/job/29883192429

torchvision: 0.20.0.dev20240908+cu121
torch.cuda.is_available: True
torch.ops.image._jpeg_version() = 80
Is torchvision usable? True
German shepherd (cpu): 37.6%
Traceback (most recent call last):
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 113, in <module>
    main()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 101, in main
    smoke_test_torchvision_decode_jpeg("cuda")
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 37, in smoke_test_torchvision_decode_jpeg
    img_jpg = decode_jpeg(img_jpg_data, device=device)
  File "C:\Jenkins\Miniconda3\envs\conda-env-10776424717\lib\site-packages\torchvision\io\image.py", line 223, in decode_jpeg
    return torch.ops.image.decode_jpegs_cuda([input], mode.value, device)[0]
  File "C:\Jenkins\Miniconda3\envs\conda-env-10776424717\lib\site-packages\torch\_ops.py", line 1116, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: decode_jpegs_cuda: torchvision not compiled with nvJPEG support

The driver update issue should not prevent us from compiling torchvision with full CUDA support. We can do that even on a CPU-only instance. Hence, when the FORCE_CUDA flag is set, we should try to include the nvjpeg module.
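A minimal sketch of the intended decision, for illustration only. It is not the actual diff in this PR: the function names are hypothetical, and looking for `include/nvjpeg.h` under the CUDA toolkit directory is an assumption about how nvJPEG might be detected. The point is that the nvJPEG decision follows the build configuration (FORCE_CUDA and the toolkit on disk) rather than whether a working GPU driver is present at build time.

```python
import os
from typing import Mapping, Optional


def should_build_cuda(gpu_visible: bool, cuda_home: Optional[str], env: Mapping[str, str]) -> bool:
    """Build CUDA sources if a GPU is usable at build time, or if FORCE_CUDA=1 is set."""
    force_cuda = env.get("FORCE_CUDA", "0") == "1"
    return (gpu_visible and cuda_home is not None) or force_cuda


def should_build_nvjpeg(build_cuda: bool, cuda_home: Optional[str]) -> bool:
    """Decide on nvJPEG from the toolkit on disk, not from the runtime GPU state."""
    if not build_cuda or cuda_home is None:
        return False
    # Assumed header location inside the CUDA toolkit.
    return os.path.exists(os.path.join(cuda_home, "include", "nvjpeg.h"))


if __name__ == "__main__":
    # On a runner with an outdated driver, no GPU is visible, but FORCE_CUDA still applies.
    cuda_home = os.environ.get("CUDA_HOME")
    build_cuda = should_build_cuda(False, cuda_home, os.environ)
    print("build CUDA:", build_cuda)
    print("build nvJPEG:", should_build_nvjpeg(build_cuda, cuda_home))
```

With this shape, a runner whose driver is too old can still produce a binary with nvJPEG support, as long as FORCE_CUDA is set and the toolkit headers are installed.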

As a follow-up, we should address the driver issue.

pytorch-bot bot commented Sep 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8641

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1249d0b with merge base 00e7fa1 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman merged commit db5f8a0 into pytorch:main Sep 11, 2024
@atalman atalman deleted the fix_nvjpeg_include_windows branch September 11, 2024 16:27
github-actions commented:

Hey @atalman!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Sep 13, 2024
Reviewed By: vmoens
Differential Revision: D62581682
fbshipit-source-id: 40ee1636bb1608da92b1fc258634d26c88a430fd