Add gpu doc for how to build PyTorch/XLA from source with GPU support. #5384
# How to run with PyTorch/XLA:GPU

PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an NVIDIA GPU instance.

## Create a GPU instance
You can either use a local machine with a GPU attached or a GPU VM on the cloud. To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).

## Environment Setup

### Docker
PyTorch/XLA currently publishes prebuilt Docker images and wheels with CUDA 11.7/11.8 and Python 3.8. We recommend creating a Docker container with the corresponding configuration. For a full list of Docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla#available-docker-images-and-wheels).
```
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
```
## AMP (AUTOMATIC MIXED PRECISION)
AMP is very useful for GPU training, and PyTorch/XLA reuses CUDA's AMP rules. You can check out our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also use a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.
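For a concrete picture, a single AMP training step along the lines of the linked examples might look like this. This is a hedged sketch, not the doc's official recipe: the `autocast`/`GradScaler` import path and the `syncfree.SGD` optimizer are taken from the linked examples, and running it assumes a torch_xla build with GPU support.

```python
import torch
import torch.nn as nn

import torch_xla.core.xla_model as xm
from torch_xla.amp import autocast, GradScaler  # per the linked examples
import torch_xla.amp.syncfree as syncfree       # sync-free optimizer variants

device = xm.xla_device()
model = nn.Linear(784, 10).to(device)
# The sync-free optimizer avoids the extra device-to-host sync that a
# GradScaler-driven optimizer step would otherwise incur.
optimizer = syncfree.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()

def train_step(data, target):
    optimizer.zero_grad()
    with autocast(device):  # mixed-precision region, following CUDA AMP rules
        output = model(data)
        loss = nn.functional.nll_loss(
            nn.functional.log_softmax(output, dim=-1), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    xm.mark_step()          # cut the lazy graph and dispatch it to XLA
    return loss
```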

## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)

1. Inside a GPU VM, create a Docker container from a development Docker image. For example:

```
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
sudo docker run --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
```

2. Build PyTorch and PyTorch/XLA from source.

```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
USE_CUDA=0 python setup.py install

git clone https://github.com/pytorch/xla.git
cd xla
XLA_CUDA=1 python setup.py install
```

3. Verify that PyTorch and PyTorch/XLA have been installed successfully.

If you can run the test in the section [Run a simple model](#run-a-simple-model) successfully, then PyTorch and PyTorch/XLA have been installed successfully.
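As a quick sanity check, a minimal version of that kind of test can be sketched as follows. This is a hedged sketch rather than the doc's official test: it assumes `torch_xla` was built as above, and it falls back to plain CPU PyTorch when `torch_xla` is absent so the script runs anywhere.

```python
import torch

# torch_xla is only importable after the build above succeeds;
# fall back to CPU so this sketch still runs without it.
try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()  # the XLA:GPU device on this setup
except ImportError:
    device = torch.device("cpu")

x = torch.randn(2, 2, device=device)
y = x @ x + x
print(y.shape)  # torch.Size([2, 2])
```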