
Conversation

@dggaytan
Contributor

Adding torch accelerator support to FSDP2 example

Updates to FSDP2 example:

  • Script Renaming and Documentation Updates:

    • Renamed train.py to example.py and updated references in README.md to reflect the new filename. Added instructions to install dependencies via requirements.txt before running the example.
  • GPU Verification and Device Initialization:

    • Added a verify_min_gpu_count function to ensure at least two GPUs are available before running the example.
    • Updated device initialization in main() to dynamically detect and configure the device type using torch.accelerator, improving compatibility with different hardware setups (see the sketch right after this list).
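
A minimal sketch of what these two changes might look like (verify_min_gpu_count is the name used in this PR; its body, the get_device helper, and the CPU fallback are assumptions, not the merged code):

import torch

def verify_min_gpu_count(min_gpus: int = 2) -> bool:
    # Assumed check: require at least `min_gpus` visible accelerator devices.
    return torch.accelerator.is_available() and torch.accelerator.device_count() >= min_gpus

def get_device(rank: int) -> torch.device:
    # Detect the accelerator type (cuda, xpu, ...) at runtime instead of hard-coding cuda.
    if torch.accelerator.is_available():
        device_type = torch.accelerator.current_accelerator()
        return torch.device(f"{device_type}:{rank}")
    # Assumed fallback when no accelerator is present.
    return torch.device("cpu")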

New supporting files:

  • Dependency Management:

    • Added a requirements.txt file listing required dependencies (torch>=2.7 and numpy).
  • Script for Running Examples:

    • Introduced run_example.sh to simplify launching the FSDP2 example.
  • Integration into Distributed Examples:

    • Added a new function distributed_FSDP2 in run_distributed_examples.sh to include the FSDP2 example in the distributed testing workflow.

    CC: @msaroufim @malfet @dvrogozh

@netlify

netlify bot commented Jul 21, 2025

Deploy Preview for pytorch-examples-preview canceled.

🔨 Latest commit: 5e960d8
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-examples-preview/deploys/68826ce9e58ebb000857417b
meta-cla bot added the cla signed label on Jul 21, 2025
torch.distributed.init_process_group(backend="nccl", device_id=device)
if torch.accelerator.is_available():
device_type = torch.accelerator.current_accelerator()
device: torch.device = torch.device(f"{device_type}:{rank}")
Contributor

Why do we need device: torch.device = instead of just device =?

Contributor Author

It was just a flag for me, but I'll change it to use just torch.device

Contributor Author

done :)

Comment on lines 47 to 48
backend = torch.distributed.get_default_backend_for_device(device)
torch.distributed.init_process_group(backend=backend, device_id=device)
Contributor

I think these 2 lines should work for cpu as well. You can simplify the code:

if torch.accelerator.is_available():
    ...
else:
    device = torch.device("cpu")
backend = torch.distributed.get_default_backend_for_device(device)
torch.distributed.init_process_group(backend=backend, device_id=device)
Contributor Author

done
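
Putting the review suggestions together, the initialization presumably ends up roughly like this sketch (the LOCAL_RANK lookup is an assumption about how the example derives the rank):

import os
import torch
import torch.distributed

# Assumes a torchrun launch, which sets LOCAL_RANK for each process.
rank = int(os.environ.get("LOCAL_RANK", 0))

if torch.accelerator.is_available():
    device_type = torch.accelerator.current_accelerator()
    device = torch.device(f"{device_type}:{rank}")
else:
    device = torch.device("cpu")

# Let PyTorch choose the backend that matches the device (e.g. nccl for cuda, gloo for cpu).
backend = torch.distributed.get_default_backend_for_device(device)
torch.distributed.init_process_group(backend=backend, device_id=device)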

Signed-off-by: dggaytan <diana.gaytan.munoz@intel.com>
dggaytan force-pushed the dggaytan/distributed_FSDP2 branch from 1f0d7d3 to 5e960d8 on July 24, 2025 17:27
dggaytan requested a review from dvrogozh on July 24, 2025 17:27
soumith merged commit 5a4ca92 into pytorch:main on Aug 6, 2025
9 checks passed
@soumith
Member

soumith commented Aug 6, 2025

thank you!

