Contributor

@zhanghonggeng zhanghonggeng commented May 8, 2025

PR Category

Performance Optimization

PR Types

Performance

Description

Case:

out = a[val == 1] versus t1 = (val == 1).nonzero().flatten(); out = paddle.gather(a, t1)

When the bool index is one-dimensional, replacing gather_nd with gather gives better performance.
Test machine: V100
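The equivalence this PR relies on can be sketched as follows. This is a minimal illustration that uses NumPy as a stand-in for Paddle: `np.flatnonzero` and `np.take(..., axis=0)` play the roles of `nonzero().flatten()` and `paddle.gather`, and the small shapes are illustrative assumptions rather than the benchmark shapes.

```python
import numpy as np

# Small stand-in tensors; the PR benchmark uses shape [108, 64, 12288].
a = np.arange(24, dtype=np.float32).reshape(4, 3, 2)
val = np.array([1, 0, 1, 1])

# Path 1: index directly with a 1-D boolean mask.
out_mask = a[val == 1]

# Path 2: convert the mask to integer positions, then gather rows along axis 0.
t1 = np.flatnonzero(val == 1)
out_gather = np.take(a, t1, axis=0)

# Both paths select the same rows of `a`.
same = np.array_equal(out_mask, out_gather)
```

Because a 1-D mask only selects along the first axis, the integer positions are enough to describe the selection, which is why a plain gather can replace the more general gather_nd here.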

| Python API | Shape | Dtype | Average time per run (ms) | Speedup |
| --- | --- | --- | --- | --- |
| paddle [] (gather_nd, not vectorized) | [108, 64, 12288], [108] | paddle.bfloat16, paddle.bool | 2201.24 | (baseline) |
| paddle [] (gather_nd, vectorized) | [108, 64, 12288], [108] | paddle.bfloat16, paddle.bool | 513.40 | 4.29 |
| paddle [] (gather) | [108, 64, 12288], [108] | paddle.bfloat16, paddle.bool | 289.12 | 7.61 |
| paddle.gather | [108, 64, 12288], [108] | paddle.bfloat16, paddle.int64 | 300.64 | 7.32 |
| torch [] | [108, 64, 12288], [108] | torch.bfloat16, torch.bool | 510.85 | 4.31 |
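The speedup column appears to be each row's time divided into the non-vectorized gather_nd time. A quick check of that reading, using only the timings reported above (the dictionary keys are shorthand labels chosen here, not API names):

```python
# Average time per run in ms, copied from the benchmark results above.
baseline_ms = 2201.24  # paddle [] with non-vectorized gather_nd

timings_ms = {
    "paddle [] gather_nd vectorized": 513.40,
    "paddle [] gather": 289.12,
    "paddle.gather": 300.64,
    "torch []": 510.85,
}

# Speedup = baseline time / measured time, rounded to two decimals.
speedups = {name: round(baseline_ms / ms, 2) for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

The computed values match the reported 4.29, 7.61, 7.32 and 4.31, so the first row serves as the baseline for all the others.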

The all-False case was also tested. When the index is a single scalar False, Paddle is slower than torch; in all other cases there is an improvement. The score is Paddle time divided by Torch time, so a value below 1 means Paddle is faster.
Case: index = False
get_item paddle_gpu: 82.50 us, torch_gpu: 34.36 us, Paddle/Torch GPU score: 2.40
get_item paddle_cpu: 100.40 us, torch_cpu: 47.92 us, Paddle/Torch CPU score: 2.10

All False: index = [False, False, False, ...]
get_item paddle_gpu: 80.36 us, torch_gpu: 95.80 us, Paddle/Torch GPU score: 0.84
get_item paddle_cpu: 100.39 us, torch_cpu: 113.18 us, Paddle/Torch CPU score: 0.89

Mixed: index = [False, True, False, False, True, ...]
get_item paddle_gpu: 479.26 us, torch_gpu: 587.41 us, Paddle/Torch GPU score: 0.82
get_item paddle_cpu: 497.92 us, torch_cpu: 598.02 us, Paddle/Torch CPU score: 0.83
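For readers unfamiliar with the three index forms, the sketch below shows what each one selects. It uses NumPy semantics as an illustration; the PR does not state the resulting shapes, so treating Paddle and torch as behaving the same way is an assumption.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

# Scalar False: adds a new leading axis of length 0 in front of the full shape.
single_false = a[False]

# 1-D all-False mask: selects no rows, so the leading axis has length 0.
all_false = a[np.array([False, False, False])]

# 1-D mixed mask: keeps only the rows where the mask is True.
mixed = a[np.array([False, True, False])]

print(single_false.shape, all_false.shape, mixed.shape)
```

The scalar case follows a different indexing rule from the 1-D mask cases, which may be why it is benchmarked separately.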
Pcard-67164

paddle-bot bot commented May 8, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Contributor

@changeyoung98 changeyoung98 left a comment

LGTM

@xiaoguoguo626807 xiaoguoguo626807 merged commit f51e3ff into PaddlePaddle:develop May 12, 2025
49 of 50 checks passed

3 participants