[FSDPv2] Shard on the maximal dim of weights #7134

alanwaketan · 2024-05-29T03:29:32Z

Summary:
This pull request makes FSDPv2 shard on the maximal dim of weights instead of the 0th dim.

Test Plan:
XLA_USE_SPMD=1 PJRT_DEVICE=TPU python test/spmd/test_fsdp_v2.py

JackCaoG · 2024-05-29T03:33:45Z

torch_xla/experimental/spmd_fully_sharded_data_parallel.py

 continue
- spmd.mark_sharding(param, mesh, _prepare_spmd_partition_spec(param))
+ spmd.mark_sharding(
+ param, mesh, _prepare_spmd_partition_spec(param, shard_maximal=True))


Should we make it configurable?

No, not necessary... It shouldn't matter to the user...

OK, I think I get it. We need to do an all-gather anyway before entering the layer, and only one dimension is being sharded.

Yeah, that's right. If the 0th dim has a size of only 8 (e.g. MoE) and we are sharding it on v5p-2048, it will be a disaster.
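To make the point in this thread concrete, here is a minimal sketch of the dimension-selection idea in plain Python. It is not the actual body of `_prepare_spmd_partition_spec`; the helper name `partition_spec_for_shape`, its parameters, and the `"fsdp"` axis name are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def partition_spec_for_shape(
    shape: Sequence[int],
    shard_maximal: bool = True,
    axis_name: str = "fsdp",  # assumed mesh axis name, for illustration
) -> Tuple[Optional[str], ...]:
    """Hypothetical helper: build a partition spec that shards one dim.

    Every entry is None (replicated) except the single dimension chosen
    for sharding, which carries the mesh axis name.
    """
    spec = [None] * len(shape)
    if not spec:
        # Scalars have no dimension to shard, so they stay replicated.
        return tuple(spec)
    if shard_maximal:
        # Pick the largest dimension; ties resolve to the lowest index.
        index = max(range(len(shape)), key=lambda i: shape[i])
    else:
        # Previous behavior described in the summary: always the 0th dim.
        index = 0
    spec[index] = axis_name
    return tuple(spec)


# An MoE-style weight whose 0th dim is the expert count (8).
moe_shape = (8, 4096, 14336)
print(partition_spec_for_shape(moe_shape, shard_maximal=False))
print(partition_spec_for_shape(moe_shape, shard_maximal=True))
```

With the 0th-dim rule, the expert dimension of size 8 would have to be split across far more devices than it has entries. The maximal-dim rule picks the 14336-sized dimension instead, and since the weight is all-gathered before the layer runs anyway, the choice of sharded dimension should not be visible to the user.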

alanwaketan · 2024-05-29T03:34:54Z

Thanks Jack for approving the change.

alanwaketan · 2024-05-29T04:30:17Z

Skip GPU to move fast.

alanwaketan added the tpuci label May 29, 2024

alanwaketan requested review from JackCaoG and jonb377 May 29, 2024 03:29

alanwaketan self-assigned this May 29, 2024

alanwaketan added 2 commits May 29, 2024 03:33

initial commit

082b0e6

Fix linters

9dd39cc

JackCaoG reviewed May 29, 2024

alanwaketan force-pushed the alanwaketan/fsdp_maximal branch from 701277f to 9dd39cc May 29, 2024 03:33

JackCaoG approved these changes May 29, 2024

alanwaketan merged commit 15fc0f1 into master May 29, 2024

alanwaketan deleted the alanwaketan/fsdp_maximal branch May 29, 2024 04:30
