
Conversation

@liufengwei0103
Contributor

@liufengwei0103 liufengwei0103 commented Jun 10, 2025

PR Category

Auto Parallel

PR Types

New features

Description

Main features:
1. Enhance the Shard placement so that the same tensor dim can be sharded by multiple mesh dims: the new co_shard_order argument specifies the split order, which makes it possible to merge multiple sharded tensor dims in reshape.
2. Enhance the reshard API to express rearranging data before sharding a tensor, which makes it possible to reshard a fused QKV weight in a distributed environment.

Main changes:
1. Upgrade dims_mapping from a vector to a vector of vectors (see the sketch after this list).
2. Refactor the nd_mesh reshard transform.
3. Add co_shard_order and split_factor to the Shard placement.
4. Add a dims_mapping proxy to keep old spmd rules backward compatible during the transition from the flat dims_mapping to the new vector-of-vectors dims_mapping.
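
The following is a minimal illustration in plain Python (not Paddle internals) of what the dims_mapping upgrade roughly looks like; the helper to_legacy_dims_mapping is a hypothetical name used only to sketch the idea behind the backward-compatibility proxy.

# Illustration only: each entry of dims_mapping describes how one tensor dim is sharded.
# The legacy form stores a single mesh dim per tensor dim (-1 means replicated);
# the new form stores a list of mesh dims, so one tensor dim can be co-sharded by
# several mesh dims (their order corresponding to co_shard_order).
legacy_dims_mapping = [0, -1]      # tensor dim 0 sharded by mesh dim 0, dim 1 replicated
new_dims_mapping = [[0, 1], []]    # tensor dim 0 co-sharded by mesh dims 0 and 1, dim 1 replicated

def to_legacy_dims_mapping(nested):
    # Hypothetical helper sketching the proxy idea: collapse the nested form back
    # to the flat form when every tensor dim is sharded by at most one mesh dim.
    assert all(len(dims) <= 1 for dims in nested)
    return [dims[0] if dims else -1 for dims in nested]

print(to_legacy_dims_mapping([[0], []]))  # [0, -1]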

Usage:
Get a co-sharded tensor

import paddle
import paddle.distributed as dist

a = paddle.to_tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
mesh = dist.ProcessMesh([[0, 1], [2, 3]], dim_names=['x', 'y'])
placements = [
    dist.Shard(0, co_shard_order=0),
    dist.Shard(0, co_shard_order=1),
]
b = dist.shard_tensor(a, mesh, placements)
print(b.placements)
# [Shard(0, shard_order=0), Shard(0, shard_order=1)]
print(b._local_value())
# rank0 [[1, 2]], rank1 [[3, 4]], rank2 [[5, 6]], rank3 [[7, 8]]
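
To make the split order concrete, here is a plain-numpy sketch (not the Paddle API) that reproduces the local shards printed above: the mesh dim with co_shard_order=0 does the outer split of tensor dim 0 and the mesh dim with co_shard_order=1 does the inner split.

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
mesh = np.array([[0, 1], [2, 3]])     # 2 x 2 process mesh, dims ('x', 'y')
nx, ny = mesh.shape

for rank in mesh.flatten():
    cx, cy = (int(v) for v in np.argwhere(mesh == rank)[0])
    piece = cx * ny + cy                 # outer split by 'x' (order 0), inner split by 'y' (order 1)
    rows = a.shape[0] // (nx * ny)       # rows per shard along tensor dim 0
    local = a[piece * rows:(piece + 1) * rows]
    print(int(rank), local.tolist())
# rank 0: [[1, 2]], rank 1: [[3, 4]], rank 2: [[5, 6]], rank 3: [[7, 8]]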

Co-sharding in reshape

import paddle
import paddle.distributed as dist

a = paddle.to_tensor([[1, 2], [3, 4], [5, 6], [7, 8]], dtype='float32')
mesh = dist.ProcessMesh([[0, 1], [2, 3]], dim_names=['x', 'y'])
placements = [dist.Shard(0), dist.Shard(1)]
input = dist.shard_tensor(a, mesh, placements)
out = paddle.reshape(input, [-1])
print(out.placements)
# [Shard(0, shard_order=0), Shard(0, shard_order=1)]

Rearrange data before sharding

import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([[0, 1], [2, 3]], dim_names=['x', 'y'])
a = paddle.to_tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
placements = [dist.Shard(0, split_factor=2), dist.Replicate()]
b = dist.shard_tensor(a, mesh, placements)
print(b.placements)
# [Shard(0, split_factor=2), Replicate()]
print(b._local_value())
# rank0 and rank1: [[1, 2], [5, 6]]; rank2 and rank3: [[3, 4], [7, 8]]
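
The behaviour of split_factor, inferred from the printed local values above, can be sketched in plain numpy (not the Paddle API): tensor dim 0 is first cut into split_factor * mesh_dim_size chunks, and the chunks are then dealt out round-robin, so each shard ends up holding interleaved rows.

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
mesh_dim_size, split_factor = 2, 2
chunks = np.split(a, mesh_dim_size * split_factor, axis=0)   # 4 chunks of one row each
for shard in range(mesh_dim_size):
    local = np.concatenate(chunks[shard::mesh_dim_size], axis=0)
    print(shard, local.tolist())
# shard 0 (rank0, rank1): [[1, 2], [5, 6]]
# shard 1 (rank2, rank3): [[3, 4], [7, 8]]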

More use cases can be seen in the test cases.

Pcard-67164

@paddle-bot

paddle-bot bot commented Jun 10, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@liufengwei0103 liufengwei0103 changed the title from "support to shard on the same tensor dim by many mesh dim, only dynami…" to "support to shard on the same tensor dim by many mesh dim, only dynamic graph" Jun 10, 2025
@liufengwei0103 liufengwei0103 marked this pull request as ready for review June 11, 2025 06:17
@liufengwei0103 liufengwei0103 marked this pull request as draft June 18, 2025 23:27
@liufengwei0103 liufengwei0103 marked this pull request as ready for review June 18, 2025 23:27
@jeff41404
Contributor

The results of the three example code snippets in the Description above also need to be explained through print output or comments, to make them easier for others to understand.

Contributor

@From00 From00 left a comment


User documentation needs to be added as a follow-up, covering how to use this feature and how to add spmd rules.

zhiqiu
zhiqiu previously approved these changes Jun 19, 2025
Contributor

@zhiqiu zhiqiu left a comment


LGTM

@liufengwei0103
Contributor Author

The results of the three example code snippets in the Description above also need to be explained through print output or comments, to make them easier for others to understand.

done

Contributor

@From00 From00 left a comment


LGTM

Contributor

@jeff41404 jeff41404 left a comment


LGTM

Contributor

@luotao1 luotao1 left a comment


LGTM

Member

@SigureMo SigureMo left a comment


LGTMeow 🐾 for pybind API without type annotations

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment


LGTM

@From00 From00 merged commit 2327fff into PaddlePaddle:develop Jun 21, 2025
49 of 52 checks passed
