[slice]support different shape case for GPUScatterAdd op #73971

zhanghonggeng · 2025-07-10T09:27:39Z

PR Category

Performance Optimization

PR Types

Improvements

Description

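Background: taking input (108, 64, 12288), axis 0, and an index whose length equals input_shape[axis] as an example, the gather backward pass is 60% slower than torch, so this PR optimizes gather_grad.
It implements a GPUScatterAdd kernel to replace GPUScatterAssign. The GPUScatterAdd kernel supports strides: the stride computation turns the in-kernel index calculation into a base address plus an offset, which simplifies the otherwise complex index arithmetic inside the kernel. The case above sees a 60% performance improvement.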
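
The base-plus-offset idea can be illustrated with a minimal CPU sketch in plain Python. This is not the kernel from this PR: the function name `scatter_add_flat`, the `(outer, axis_dim, inner)` layout, and the sequential loops are assumptions made for illustration only. A real GPU kernel would map these loops onto threads and would likely need atomic adds when the index contains duplicates.

```python
def scatter_add_flat(grad_out, index, outer, axis_dim, inner):
    """Hypothetical reference for a stride-based scatter-add (gather backward).

    grad_out is a flat list viewed as (outer, len(index), inner).
    The result is a flat list viewed as (outer, axis_dim, inner).
    """
    n_idx = len(index)
    grad_x = [0.0] * (outer * axis_dim * inner)

    # Strides of the dimension in front of the gather axis, for source and destination.
    src_outer_stride = n_idx * inner
    dst_outer_stride = axis_dim * inner

    for o in range(outer):
        for i, idx in enumerate(index):
            if not 0 <= idx < axis_dim:
                raise IndexError(f"index {idx} out of range for axis size {axis_dim}")
            # Base addresses are computed once per (o, i) pair ...
            src_base = o * src_outer_stride + i * inner
            dst_base = o * dst_outer_stride + idx * inner
            # ... so the inner loop only adds a contiguous offset.
            for k in range(inner):
                grad_x[dst_base + k] += grad_out[src_base + k]
    return grad_x


# The index length (4) differs from the axis size (3), and index 0 repeats,
# so the contributions to row 0 are accumulated rather than overwritten.
result = scatter_add_flat(
    grad_out=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    index=[0, 2, 0, 0],
    outer=1,
    axis_dim=3,
    inner=2,
)
```

Because the destination is accumulated into rather than assigned, repeated indices sum their gradients, which is the behavior a gather backward pass needs.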

This corresponds to the slice case with input Tensor([108,64,12288],"float32") and index Tensor([2,4,6],"int64").

When index_size is 1, getitem selects the gather+reshape kernel as a fast path. fp32 forward GPU score: 0.97 -> 0.68; backward GPU score: 2.73 -> 1.21.
In the gather backward pass, the GPUScatterAdd kernel supports the case where index.numel() != x.dims()[axis_v].

pcard-67164

… index2

paddle-bot · 2025-07-10T09:27:45Z

Your PR has been submitted successfully. Thanks for your contribution to the open-source project!
Please watch the upcoming CI test results. See the Paddle CI Manual for details.

… index3 slice-check

zhanghonggeng · 2025-07-14T08:24:40Z

/re-run all-failed

zhanghonggeng · 2025-07-14T08:27:52Z

/re-run all-failed

zhanghonggeng added 2 commits July 10, 2025 06:22

[slice]implement GPUScatterAdd op for gather grad

fcce08e

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

a61f0e3

… index2

[slice]list_tensor_gather slice-check

a007519

zhanghonggeng force-pushed the index3 branch from 1219278 to a007519 Compare July 11, 2025 02:36

zhanghonggeng added 2 commits July 14, 2025 03:25

update slice-check

fc2e716

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5afc9e7

… index3 slice-check

zhanghonggeng changed the title ~~[slice]list_tensor_gather test~~ [slice]support different shape case for GPUScatterAdd op Jul 14, 2025

xiaoguoguo626807 approved these changes Jul 14, 2025

swgu98 added the skip-ci: win-openblas label Jul 14, 2025

xiaoguoguo626807 merged commit 77166d2 into PaddlePaddle:develop Jul 15, 2025
83 of 86 checks passed
