
@xuxinyi389 (Contributor) commented Apr 3, 2025

PR Category

Auto Parallel

PR Types

Improvements

Description

card-73263

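  1. ProcessMesh supports a get_group method, which converts the mesh into the corresponding communication Group.
  2. ProcessMesh supports a get_submesh_with_dim method, which returns the communication submesh along the given "dim" dimension (the submesh that contains the current rank).
  3. ProcessMesh indexing now also accepts a str key; mesh["dp"] is essentially a call to get_submesh_with_dim("dp").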
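Each rank only receives its own group from get_submesh_with_dim or a str index, but taken over all ranks, the groups along one dimension partition the mesh. The pure-Python sketch below enumerates that partition for a 2-D mesh stored as nested lists. The helper name `groups_along_dim_2d` is hypothetical and is not part of Paddle's API; it only illustrates which ranks end up in the same group.

```python
def groups_along_dim_2d(mesh, dim_names, dim_name):
    """Hypothetical helper: list every communication group along `dim_name`
    for a 2-D mesh given as a nested list of process ids."""
    axis = dim_names.index(dim_name)
    if axis == 1:
        # Along the inner axis, each row forms one group.
        return [list(row) for row in mesh]
    # Along the outer axis, each column forms one group.
    return [list(col) for col in zip(*mesh)]


if __name__ == "__main__":
    mesh = [[0, 1, 2, 3], [4, 5, 6, 7]]
    names = ["dp", "tp"]
    print("dp groups:", groups_along_dim_2d(mesh, names, "dp"))
    print("tp groups:", groups_along_dim_2d(mesh, names, "tp"))
```

With dim_names=["dp", "tp"], the dp groups are the columns and the tp groups are the rows.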

Example:

import paddle.distributed as dist
mesh_2d = dist.ProcessMesh([[0, 1, 2, 3], [4, 5, 6, 7]], dim_names=["dp", "tp"])
dp_mesh = mesh_2d["dp"]
tp_mesh = mesh_2d["tp"]

Calling mesh_2d["dp"] on ranks 0 and 4 returns the 1D submesh ProcessMesh([0, 4]).
Calling mesh_2d["dp"] on ranks 1 and 5 returns the 1D submesh ProcessMesh([1, 5]).
Calling mesh_2d["dp"] on ranks 2 and 6 returns the 1D submesh ProcessMesh([2, 6]).
Calling mesh_2d["dp"] on ranks 3 and 7 returns the 1D submesh ProcessMesh([3, 7]).
Calling mesh_2d["tp"] on ranks 0, 1, 2, and 3 returns the 1D submesh ProcessMesh([0, 1, 2, 3]).
Calling mesh_2d["tp"] on ranks 4, 5, 6, and 7 returns the 1D submesh ProcessMesh([4, 5, 6, 7]).
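The rank-to-submesh mapping above can be reproduced with a short pure-Python sketch: locate the calling rank's coordinate in the mesh, then vary only the coordinate of the requested dimension while holding all others fixed. This is an illustration of the described semantics under that assumption, not Paddle's implementation; `submesh_ranks` and its helpers are hypothetical names, and the mesh is a plain nested list rather than a ProcessMesh.

```python
from itertools import product


def _shape(mesh):
    # Infer the shape of a nested-list mesh, e.g. [[0, 1], [2, 3]] -> (2, 2).
    shape = []
    node = mesh
    while isinstance(node, list):
        shape.append(len(node))
        node = node[0]
    return tuple(shape)


def _get(mesh, coord):
    # Index into the nested list one axis at a time.
    node = mesh
    for i in coord:
        node = node[i]
    return node


def submesh_ranks(mesh, dim_names, dim_name, rank):
    """Hypothetical helper: process ids along `dim_name` that share every
    other coordinate with `rank` (i.e. rank's group along that dim)."""
    shape = _shape(mesh)
    axis = dim_names.index(dim_name)
    for coord in product(*(range(n) for n in shape)):
        if _get(mesh, coord) == rank:
            result = []
            for i in range(shape[axis]):
                c = list(coord)
                c[axis] = i  # vary only the requested dimension
                result.append(_get(mesh, c))
            return result
    raise ValueError(f"rank {rank} is not in the mesh")


if __name__ == "__main__":
    mesh_2d = [[0, 1, 2, 3], [4, 5, 6, 7]]
    names = ["dp", "tp"]
    for rank in range(8):
        print(rank,
              submesh_ranks(mesh_2d, names, "dp", rank),
              submesh_ranks(mesh_2d, names, "tp", rank))
```

The same lookup works for meshes with more than two dimensions, since only the requested axis is varied.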

@paddle-bot (bot) commented Apr 3, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@xuxinyi389 xuxinyi389 force-pushed the enhance_processmesh branch from f6b4054 to b589534 on April 8, 2025 07:50
@paddle-ci-bot (bot) commented Apr 16, 2025

The CI results for b589534 are more than 7 days old. To prevent PR conflicts, you need to re-run all CIs manually.

@xuxinyi389 xuxinyi389 changed the title processmesh support convert group [AutoParallel] Enhance processmesh Apr 22, 2025
@xuxinyi389 xuxinyi389 force-pushed the enhance_processmesh branch from fd92afc to 38f9e95 on April 24, 2025 06:29
@xuxinyi389 (Contributor, Author) commented:

/re-run approval

return ProcessMesh([new_mesh[index]], new_dim_names)
return ProcessMesh(new_mesh, new_dim_names)

def get_submesh_with_dim(
Contributor commented:

What is the difference between this method and get_mesh_with_dim?

Contributor (Author) replied:

get_mesh_with_dim only reorders the mesh: for example, mesh.get_mesh_with_dim("dp") just moves the dp dimension to the outermost position and does not remove any process_ids from the mesh. mesh.get_submesh_with_dim("dp"), by contrast, returns the submesh for the dp communication group that contains the current rank. For example:

mesh_2d = dist.ProcessMesh([[0, 1, 2, 3], [4, 5, 6, 7]], dim_names=["dp", "tp"])
dp_mesh = mesh_2d.get_submesh_with_dim("dp")

On ranks 0 and 4, this returns the 1D submesh ProcessMesh([0, 4]).
On ranks 1 and 5, this returns the 1D submesh ProcessMesh([1, 5]).
On ranks 2 and 6, this returns the 1D submesh ProcessMesh([2, 6]).
On ranks 3 and 7, this returns the 1D submesh ProcessMesh([3, 7]).
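To make the contrast concrete, here is a pure-Python sketch for a 2-D nested-list mesh, assuming the behavior described in the reply: reordering moves the chosen dimension to the outermost axis but keeps every process id, while the submesh is the single slice along that dimension that contains the given rank. Both helpers, `move_dim_to_front_2d` and `submesh_via_reorder_2d`, are hypothetical and only handle 2-D meshes; they are not Paddle's get_mesh_with_dim or get_submesh_with_dim.

```python
def move_dim_to_front_2d(mesh, dim_names, dim_name):
    """Hypothetical: reorder a 2-D mesh so `dim_name` is the outer axis.
    All process ids are kept; only their layout changes."""
    if dim_names.index(dim_name) == 0:
        # Already outermost: return a copy with the same layout.
        return [list(row) for row in mesh], list(dim_names)
    # Otherwise transpose so that dim_name becomes the outer axis.
    transposed = [list(col) for col in zip(*mesh)]
    return transposed, [dim_names[1], dim_names[0]]


def submesh_via_reorder_2d(mesh, dim_names, dim_name, rank):
    """Hypothetical: the group along `dim_name` that contains `rank`."""
    reordered, _ = move_dim_to_front_2d(mesh, dim_names, dim_name)
    # With dim_name on the outer axis, one group along it is one column.
    for j in range(len(reordered[0])):
        column = [row[j] for row in reordered]
        if rank in column:
            return column
    raise ValueError(f"rank {rank} is not in the mesh")


if __name__ == "__main__":
    mesh_2d = [[0, 1, 2, 3], [4, 5, 6, 7]]
    names = ["dp", "tp"]
    print(move_dim_to_front_2d(mesh_2d, names, "tp"))        # all 8 ids, reordered
    print(submesh_via_reorder_2d(mesh_2d, names, "dp", 5))   # only rank 5's dp group
```

The reordered mesh still holds all eight ids, while the submesh keeps only the ranks that communicate with the caller along that dimension.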

@From00 From00 merged commit 6a0f5ce into PaddlePaddle:develop Apr 28, 2025
42 of 44 checks passed
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025
* processmesh support convert group * add_test * fix_test * fix_en_docs * move_test * fix_bugs_of_get_mesh_with_dim

4 participants