Update deep_ep intranode & internode kernels #74284

lshpku · 2025-07-29T06:49:53Z

PR Category

Communication Library

PR Types

Performance

Description

Update the underlying intranode & internode kernels to the upstream commit deepseek-ai/DeepEP@079c5a4 (July 14).
That commit already includes the TMA optimization for internode performance.

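Changes in this PR

Copied intranode.cu, internode.cu, configs.cuh, and ibgda_device.cuh over directly.

Copied launch.cuh and utils.cuh over, but kept the deprecated functions that low_latency still depends on (low_latency is maintained by the inference team and is left unchanged).

Copied runtime.cu and layout.cu over and merged them into a single runtime.cu (they were merged the same way before).

Copied over the intranode & internode parts of api.cuh.

Made minor changes to the member variables of Buffer in deep_ep.hpp.

Modified the Buffer constructor and sync method in deep_ep.cpp, as well as the places that call intranode & internode, so that the newly added member variables are set correctly and the code matches the new CUDA-layer interface.

Added a helper method in types.h.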

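Correctness tests

Ran the unit tests test_intranode.py and test_internode.py (on 2, 4, and 8 machines); all passed.

Ran end-to-end convergence tests with DeepseekV3 under several PP and EP configurations; all passed.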

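Performance changes

The advantage of the new version is that it reaches the same communication bandwidth with fewer SMs, which leaves more SMs for computation.

For example, on DeepseekV3: deepep SMs 20 -> 14, deepgemm SMs 112 -> 118, for an end-to-end improvement of 1-2%.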
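The SM trade-off above can be checked with a little arithmetic. The sketch below is illustrative only: `sm_split` is a hypothetical helper, not part of deep_ep or Paddle, and the total SM budget is an assumption inferred from the figures in this description (20 + 112 = 14 + 118), not a value read from the hardware.

```python
def sm_split(total_sms: int, comm_sms: int) -> tuple[int, int]:
    """Return (communication SMs, compute SMs) for a fixed SM budget.

    Hypothetical helper for illustration; not a deep_ep or Paddle API.
    """
    if not 0 <= comm_sms <= total_sms:
        raise ValueError("comm_sms must be between 0 and total_sms")
    return comm_sms, total_sms - comm_sms


# Assumption: the budget is the sum of the two figures quoted in the PR.
TOTAL_SMS = 20 + 112

before = sm_split(TOTAL_SMS, 20)  # old kernels: 20 SMs for deepep
after = sm_split(TOTAL_SMS, 14)   # new kernels: 14 SMs for deepep

extra_compute_sms = after[1] - before[1]
relative_gain = extra_compute_sms / before[1]

print(f"before: comm={before[0]}, compute={before[1]}")
print(f"after:  comm={after[0]}, compute={after[1]}")
print(f"extra compute SMs: {extra_compute_sms} ({relative_gain:.1%} more)")
```

Six more SMs is roughly a 5% larger compute allocation for deepgemm. The reported end-to-end gain is smaller (1-2%), which is expected because only part of each training step is spent in the kernels that use those SMs.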

Pcard-85711

paddle-bot · 2025-07-29T06:50:05Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

gongweibao

LGTM

XiaoguangHu01

LGTM

lshpku requested review from ForFishes and sneaxiy as code owners July 29, 2025 06:49

lshpku force-pushed the update-deepep-079c5a4 branch 3 times, most recently from 117bd81 to 5f846fc Compare August 4, 2025 11:09

Update deep_ep intranode & internode kernels

a3d0d9e

lshpku force-pushed the update-deepep-079c5a4 branch from 5f846fc to a3d0d9e Compare August 6, 2025 09:17

gongweibao approved these changes Aug 12, 2025

zhangbo9674 approved these changes Aug 12, 2025

XiaoguangHu01 approved these changes Aug 12, 2025

lshpku merged commit 530cd6d into PaddlePaddle:develop Aug 12, 2025
80 of 83 checks passed

maxiaolong001 pushed a commit to maxiaolong001/Paddle that referenced this pull request Aug 12, 2025

Update deep_ep intranode & internode kernels (PaddlePaddle#74284)

c1b0613

lshpku added a commit to lshpku/Paddle that referenced this pull request Oct 28, 2025

Revert "Update deep_ep intranode & internode kernels (PaddlePaddle#74284)"

e697c42

lshpku mentioned this pull request Oct 28, 2025

Revert "Update deep_ep intranode & internode kernels (#74284)" #76090

Merged

lshpku added a commit to lshpku/Paddle that referenced this pull request Oct 28, 2025

Revert "Update deep_ep intranode & internode kernels (PaddlePaddle#74284)"

c86b5e0

lshpku mentioned this pull request Oct 28, 2025

Revert "Update deep_ep intranode & internode kernels (#74284)" #76091

Merged

risemeup1 pushed a commit that referenced this pull request Oct 29, 2025

Revert "Update deep_ep intranode & internode kernels (#74284)" (#76091)

f615b6d

risemeup1 pushed a commit that referenced this pull request Oct 29, 2025

Revert "Update deep_ep intranode & internode kernels (#74284)" (#76090)

e2a8155
