
Commit 4038f8e

Correct the multinode training doc (#5747)
* fix Jon's comment
* add pjrt_distributed flag back.
* updated the doc
* fix typo
* fix typo
1 parent 83778f0 commit 4038f8e

3 files changed: +11 -5 lines changed


docs/pjrt.md

Lines changed: 3 additions & 3 deletions

@@ -206,7 +206,7 @@ PJRT_DEVICE=GPU GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --f
 You can also use `torchrun` to initiate the single-node multi-GPU training. For example,
 
 ```
-PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
+PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
 ```
 
 In the above example, `--nnodes` means how many machines (physical machines or VMs) to be used (it is 1 since we do single-node training). `--nproc-per-node` means how many GPU devices to be used.
@@ -245,10 +245,10 @@ On the second GPU machine, run
 --nnodes=2 \
 --node_rank=1 \
 --nproc_per_node=4 \
---rdzv_endpoint="<MACHINE_0_IP_ADDRESS>:12355" pytorch/xla/test/test_train_mp_imagenet_torchrun.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
+--rdzv_endpoint="<MACHINE_0_IP_ADDRESS>:12355" pytorch/xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
 ```
 
-the difference between the 2 commands above are `--node_rank` and potentially `--nproc_per_node` if you want to use different number of GPU devices on each machine. All the rest are identical.
+the difference between the 2 commands above are `--node_rank` and potentially `--nproc_per_node` if you want to use different number of GPU devices on each machine. All the rest are identical. For more information about `torchrun`, please refer to this [page](https://pytorch.org/docs/stable/elastic/run.html).
 
 ## Differences from XRT
 
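Note that the second hunk only shows the launch command for the second GPU machine; the matching first-machine command sits above this hunk in docs/pjrt.md and is unchanged by the commit. Going by the doc's own note that the two commands differ only in `--node_rank` (and possibly `--nproc_per_node`), the machine-0 launch presumably looks like the sketch below; the `PJRT_DEVICE=GPU` prefix and the flag values are assumptions carried over from the doc's examples, not lines quoted from this diff.

```
# Hedged sketch of the first-machine (node_rank 0) launch implied by the doc;
# only --node_rank differs from the machine-1 command shown in the hunk above.
PJRT_DEVICE=GPU torchrun \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
--rdzv_endpoint="<MACHINE_0_IP_ADDRESS>:12355" pytorch/xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
```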

test/test_train_mp_imagenet.py

Lines changed: 4 additions & 1 deletion

@@ -31,6 +31,9 @@
     '--ddp': {
         'action': 'store_true',
     },
+    '--pjrt_distributed': {
+        'action': 'store_true',
+    },
     '--profile': {
         'action': 'store_true',
     },
@@ -175,7 +178,7 @@ def _train_update(device, step, loss, tracker, epoch, writer):
 
 
 def train_imagenet():
-  if FLAGS.ddp:
+  if FLAGS.ddp or FLAGS.pjrt_distributed:
     dist.init_process_group('xla', init_method='xla://')
 
   print('==> Preparing data..')
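The change above (mirrored in test/test_train_mp_mnist.py below) makes either `--ddp` or the re-added `--pjrt_distributed` flag trigger process-group initialization over the XLA backend. Below is a minimal standalone sketch of that pattern, assuming a PJRT-enabled torch_xla install and a torchrun (or similar) launch; the script body and the print are illustrative, only the flag handling and the `init_process_group` call come from this commit.

```
# Minimal sketch of the flag-gated XLA process-group init added in this commit.
# Assumes torch_xla with PJRT support; launched e.g. via torchrun.
import argparse

import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the 'xla' backend with torch.distributed


def main():
  parser = argparse.ArgumentParser()
  parser.add_argument('--ddp', action='store_true')
  parser.add_argument('--pjrt_distributed', action='store_true')
  flags = parser.parse_args()

  # Same gating as the diff: either flag initializes the process group
  # over the XLA backend using the xla:// init method.
  if flags.ddp or flags.pjrt_distributed:
    dist.init_process_group('xla', init_method='xla://')

  device = xm.xla_device()
  rank = dist.get_rank() if dist.is_initialized() else 0
  print(f'rank {rank} using device {device}')


if __name__ == '__main__':
  main()
```

Keeping `--ddp` in the condition preserves the old behavior, while `--pjrt_distributed` presumably lets a run set up the process group without also enabling the DDP model wrapping that `--ddp` controls elsewhere in the test script.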

test/test_train_mp_mnist.py

Lines changed: 4 additions & 1 deletion

@@ -5,6 +5,9 @@
     '--ddp': {
         'action': 'store_true',
     },
+    '--pjrt_distributed': {
+        'action': 'store_true',
+    },
 }
 
 FLAGS = args_parse.parse_common_options(
@@ -73,7 +76,7 @@ def _train_update(device, step, loss, tracker, epoch, writer):
 
 
 def train_mnist(flags, **kwargs):
-  if flags.ddp:
+  if flags.ddp or flags.pjrt_distributed:
     dist.init_process_group('xla', init_method='xla://')
 
   torch.manual_seed(1)
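Since the MNIST script receives the identical change, it can be exercised the same way. A hypothetical single-node launch, mirroring the torchrun example from docs/pjrt.md above (the `PJRT_DEVICE=GPU` setting and the device-count variable come from that doc, not from this hunk):

```
# Hypothetical run of the updated MNIST test with the re-added flag,
# following the single-node torchrun example from docs/pjrt.md.
PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_mnist.py --pjrt_distributed
```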
