Contributor

@ggggxm ggggxm commented Jun 25, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

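Fixes the GPU kernels of the Group_norm forward and backward operators so that they support large-Tensor computation. The GPU kernel for the NDHWC layout will be fixed in a follow-up.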

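  • Change parameters such as imsize, N, and C to int64_t.

  • Limit the Grid size and add a loop inside the kernel.

  • Performance before and after:

    N H W C = [4, 1024, 1024, 8]    Backward (ns)    Forward (ns)
    Before                          1,499,555.2      1,297,977.1
    After                           1,516,646.6      1,295,388.3

    The performance difference is within 2%.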
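The two changes listed above (64-bit sizes, and a capped grid with an in-kernel loop) can be illustrated with a small host-side C++ model. This is only a sketch under stated assumptions, not the code from this PR: the names `CappedGridSize`, `FlatOffset`, and `SimulateGridLoop` are hypothetical, the cap of 65535 blocks is an assumed value (the real limit depends on the device and the grid dimension), and the outer loop over `block_idx` stands in for what `blockIdx.x` and `gridDim.x` would do in a real CUDA kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cap on the number of blocks in the grid (hypothetical value).
constexpr int64_t kMaxGridSize = 65535;

// Number of blocks to launch: one per work item (for example, one per
// (sample, group) pair), but never more than the cap.
inline int64_t CappedGridSize(int64_t num_work_items,
                              int64_t max_grid = kMaxGridSize) {
  assert(num_work_items >= 0 && max_grid > 0);
  return std::min(num_work_items, max_grid);
}

// Flat offset of element (n, c, i) in an NCHW tensor, where imsize = H * W.
// All arithmetic is done in int64_t, so the result does not wrap around
// once the tensor holds more than 2^31 - 1 elements.
inline int64_t FlatOffset(int64_t n, int64_t c, int64_t i,
                          int64_t C, int64_t imsize) {
  return (n * C + c) * imsize + i;
}

// CPU model of the in-kernel loop: block `block_idx` out of `grid_size`
// handles work items block_idx, block_idx + grid_size, and so on.
// Returns how many times each work item was visited.
std::vector<int> SimulateGridLoop(int64_t num_work_items, int64_t grid_size) {
  assert(num_work_items >= 0 && grid_size > 0);
  std::vector<int> visits(static_cast<std::size_t>(num_work_items), 0);
  for (int64_t block_idx = 0; block_idx < grid_size; ++block_idx) {
    for (int64_t item = block_idx; item < num_work_items; item += grid_size) {
      ++visits[static_cast<std::size_t>(item)];
    }
  }
  return visits;
}
```

With the grid capped this way, a launch never requests more blocks than the assumed limit, and the strided loop still visits every work item exactly once. The extra loop adds only a counter increment and a comparison per iteration when the number of work items is below the cap, which is consistent with the small performance difference reported above.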


paddle-bot bot commented Jun 25, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Jun 25, 2025
Contributor

@lshpku lshpku left a comment

Adding an inner loop may affect performance; please add performance data to the description.

@lshpku lshpku closed this Jun 27, 2025
@lshpku lshpku reopened this Jun 27, 2025
Contributor Author

ggggxm commented Jun 30, 2025

/re-run PR-CE-Framework

@lshpku lshpku merged commit e04d497 into PaddlePaddle:develop Jul 1, 2025
66 of 67 checks passed