[Auto-parallel] Fix sharding all_gather overlap in auto_dy #73717
Conversation
def fuse_all_gather_hook_func(param_storage, comm_group):
    @paddle.autograd.no_grad()
    def fuse_comm(*_):
        shard_size = param_storage._numel() // comm_group.nranks
Review comment: What happens here if param_storage._numel() is not evenly divisible by comm_group.nranks?
Reply: get_padded_size in _build_fuse_param_view guarantees that param_storage._numel() is an integer multiple of comm_group.nranks, so this case cannot occur.
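A minimal sketch of the padding idea behind that guarantee (the rounding formula is an assumption; only the helper name get_padded_size appears in the thread):

```python
def get_padded_size(numel: int, nranks: int) -> int:
    # Round numel up to the next multiple of nranks so every rank
    # gets an equally sized shard (assumed behavior of the helper).
    return ((numel + nranks - 1) // nranks) * nranks

# Example: a 10-element storage across 4 ranks is padded to 12,
# so shard_size = 12 // 4 = 3 with no remainder.
assert get_padded_size(10, 4) == 12
```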
    task = paddle.distributed.all_gather(
        param_storage,
        slice_buffer,
        group=self._sharding_group,
Review comment: Why is comm_group passed in but self._sharding_group used here instead?
Reply: Fixed, thanks!

    def _set_sharding_overlap(self, enable_sharding_overlap, layers):
        self.enable_sharding_overlap = enable_sharding_overlap
        self._layers = layers
Review comment:
1. self._layers is later used for parameter lookup and hook registration, so the layers argument should be validated here, e.g. checked to be a paddle.nn.Layer.
2. This function is only called when enable_sharding_overlap is True, so is there any need to pass that flag as a parameter?
Reply: Both 1 and 2 have been addressed, thanks!
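A sketch of how the revised setter might look after addressing both points (the method name _enable_sharding_overlap and the exact check are assumptions, mirroring the _enable_tensor_fusion suggestion further below):

```python
import paddle

def _enable_sharding_overlap(self, layers):
    # Hypothetical post-review shape: the boolean argument is dropped and
    # layers is validated before being stored for later hook registration.
    if not isinstance(layers, paddle.nn.Layer):
        raise TypeError(
            f"layers must be a paddle.nn.Layer, but got {type(layers)}"
        )
    self.enable_sharding_overlap = True
    self._layers = layers
```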
            'param'
        ]
        layer = _find_layer_containing_param(first_param)
        layer.register_forward_pre_hook(
Review comment:
- Every call to _find_layer_containing_param here walks all sublayers; consider caching the param-to-layer mapping.
- Also handle the case where layer is None.
Reply: Changed to cache the mapping in a local dict param2layer = {}; an error is already raised when self._layers is None.
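A minimal sketch of the caching approach described in the reply (the surrounding hook wiring is illustrative; only param2layer, self._layers, and the Paddle layer APIs come from the thread):

```python
# Build the param -> layer mapping once, instead of re-walking all
# sublayers for every parameter lookup.
param2layer = {}
for layer in self._layers.sublayers():
    for p in layer.parameters(include_sublayers=False):
        param2layer[id(p)] = layer

owning_layer = param2layer.get(id(first_param))
if owning_layer is None:
    raise RuntimeError("No sublayer owns the given parameter.")
owning_layer.register_forward_pre_hook(
    fuse_all_gather_hook_func(param_storage, comm_group)
)
```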
        )

    def _set_tensor_fusion(self, enable_tensor_fusion):
        self.enable_tensor_fusion = enable_tensor_fusion
Review comment: This function is only called when enable_tensor_fusion is True, so the enable_tensor_fusion parameter is unnecessary. Suggestion:
    def _enable_tensor_fusion(self):
        self.enable_tensor_fusion = True
Reply: Fixed, thanks!
        )
        for layer in self._layers.sublayers():
            for p in layer.parameters(include_sublayers=False):
                if param.name == p.name:
Review comment: Is matching by name the only option here? Could the parameter name be modified by the user?
Reply: Changed to match by the parameter's id.
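A sketch of the identity-based lookup described in the reply (the function body is illustrative; only the loop structure over self._layers comes from the diff above):

```python
def _find_layer_containing_param(self, param):
    # Match by object identity rather than by name: the name string is
    # user-visible and could in principle be changed, while id() is stable
    # for the lifetime of the parameter object.
    for layer in self._layers.sublayers():
        for p in layer.parameters(include_sublayers=False):
            if id(p) == id(param):
                return layer
    return None
```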
            sync_op=False,
        ).wait()

    def _async_reduce_scatter(self):
Review comment: As discussed offline, a couple of remaining issues:
- function naming
- add comments
Reply: Updated accordingly, thanks!
LGTM
Codecov Report: patch coverage is 55.81% against develop (1 file, 43 new lines: 24 hits, 19 misses, 0 partials).
LGTM
PR Category
Auto Parallel
PR Types
Bug fixes
Description
Launching all all_gather ops at once blocks overlap with other sync/comm ops.
Fix: prefetch one buffer ahead via a hook to enable overlap; a sketch of the idea follows below.
Ref: Same fix in dynamic_hand #73406
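A minimal, self-contained sketch of the prefetch-one-buffer-ahead idea (buffer layout, hook wiring, and helper names are assumptions for illustration, not the PR's actual code; the all_gather call shape follows the diff above):

```python
import paddle
import paddle.distributed as dist

# Pending gather tasks, keyed by the storage buffer they fill (illustrative).
_pending_tasks = {}

def make_prefetch_hook(next_param_storage, next_slice_buffer, comm_group):
    """Forward pre-hook that launches the *next* fused buffer's all_gather
    asynchronously, so the communication overlaps with the current layer's
    compute instead of all all_gathers being issued up front."""

    @paddle.autograd.no_grad()
    def prefetch(layer, inputs):
        task = dist.all_gather(
            next_param_storage,   # output buffer for the gathered parameters
            next_slice_buffer,    # this rank's shard of the next buffer
            group=comm_group,
            sync_op=False,        # async: returns a task to wait on later
        )
        _pending_tasks[id(next_param_storage)] = task

    return prefetch

def wait_for_buffer(param_storage):
    """Block until the prefetched gather for param_storage has finished."""
    task = _pending_tasks.pop(id(param_storage), None)
    if task is not None:
        task.wait()

# Hypothetical wiring: register the hook on the first layer that consumes the
# current buffer, and call wait_for_buffer(next_storage) right before the
# next buffer's parameters are actually used.
# layer.register_forward_pre_hook(make_prefetch_hook(storage, shard, group))
```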
Pcard-70448