Conversation

Contributor

@Eddie-Wang1120 Eddie-Wang1120 commented May 19, 2025

PR Category

Performance Optimization

PR Types

Improvements

Description

pcard-67164

Kernel Optimization

Masked_Fill forward kernel performance

| x | mask | torch (us) | paddle_before (us) | paddle_after (us) | speedup (torch) | speedup (paddle_before) |
| --- | --- | --- | --- | --- | --- | --- |
| (1024, 1024) | (1, 1) | 48.22 | 102.29 | 40.96 | 117.72% | 249.73% |
| (1024, 1024) | (1024, 1) | 49.89 | 101.71 | 42.83 | 116.48% | 237.47% |
| (108, 64, 12288) | (1, 1) | 1712.8 | 1918.77 | 411.67 | 416.06% | 466.09% |
| (108, 64, 12288) | (108, 1, 1) | 1716.5 | 1968.26 | 666.03 | 257.72% | 295.52% |
| (108, 64, 12288) | (108, 64, 1) | 1716.96 | 1968.53 | 684.87 | 250.70% | 287.43% |
| (108, 64, 12288) | (108, 64, 12288) | 1802.64 | 1935.82 | 1039.35 | 173.44% | 186.25% |
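
The speedup columns above are consistent with dividing the baseline latency by the paddle_after latency, so a value above 100% means the new kernel is faster. A minimal sketch of that arithmetic, assuming this is how the percentages were derived (the helper name `speedup_pct` is hypothetical and not part of the PR):

```python
def speedup_pct(baseline_us: float, optimized_us: float) -> float:
    """Return baseline / optimized as a percentage, rounded to two decimals."""
    return round(baseline_us / optimized_us * 100, 2)


# First forward row: x = (1024, 1024), mask = (1, 1)
torch_us, paddle_before_us, paddle_after_us = 48.22, 102.29, 40.96

print(speedup_pct(torch_us, paddle_after_us))          # 117.72
print(speedup_pct(paddle_before_us, paddle_after_us))  # 249.73
```

Read this way, the backward rows showing 95.24% and 98.77% against paddle_before are slight slowdowns rather than gains.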

Masked_Fill backward kernel performance

| x | mask | torch (us) | paddle_before (us) | paddle_after (us) | speedup (torch) | speedup (paddle_before) |
| --- | --- | --- | --- | --- | --- | --- |
| (1024, 1024) | (1, 1) | 197.34 | 116.55 | 122.38 | 161.25% | 95.24% |
| (1024, 1024) | (1024, 1) | 195.34 | 114.06 | 113.74 | 171.74% | 100.28% |
| (108, 64, 12288) | (1, 1) | 1897.44 | 1927.06 | 1321.87 | 143.54% | 145.78% |
| (108, 64, 12288) | (108, 1, 1) | 1848.15 | 1709.19 | 1559.12 | 118.54% | 109.63% |
| (108, 64, 12288) | (108, 64, 1) | 1854.34 | 1735.66 | 1587.32 | 116.82% | 109.35% |
| (108, 64, 12288) | (108, 64, 12288) | 1967.94 | 1928.49 | 1952.57 | 100.79% | 98.77% |

Logical Optimization

SetItem forward performance

| index | value | paddle/torch before | paddle/torch after |
| --- | --- | --- | --- |
| (108,) True | 0.5 | 6.28 | 0.5 |
| (108,) random | 0.5 | - | 0.75 |
| (108, 64,) True | 0.5 | 6.33 | 0.5 |
| (108, 64,) random | 0.5 | - | 0.82 |
| (108, 64, 12288,) True | 0.5 | 2.07 | 0.68 |
| (108, 64, 12288,) random | 0.5 | - | 1.06 |

SetItem backward performance

| index | value | paddle/torch before | paddle/torch after |
| --- | --- | --- | --- |
| (108,) True | 0.5 | 3.48 | 0.73 |
| (108,) random | 0.5 | - | 0.81 |
| (108, 64,) True | 0.5 | 2.45 | 0.76 |
| (108, 64,) random | 0.5 | - | 0.75 |
| (108, 64, 12288,) True | 0.5 | 0.85 | 0.82 |
| (108, 64, 12288,) random | 0.5 | - | 1.02 |

paddle-bot bot commented May 19, 2025

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label May 19, 2025
@@ -0,0 +1,114 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

// Copyright (c) 2025

Contributor Author

done


mask = paddle.logical_not(mask)
out = paddle.where_(mask, x, value)
out = _C_ops.masked_fill(x, mask, value)
Contributor

This needs to be masked_fill_ (the in-place variant).

Contributor Author

done
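
The hunk above replaces a logical_not plus where_ pair with a single masked_fill call, and the review asks for the in-place form. As a rough illustration of why the two formulations agree and how the in-place variant differs, here is a pure-Python sketch on flat lists. The helpers below are hypothetical stand-ins written for this example; they are not the Paddle implementations and they ignore broadcasting.

```python
from typing import List


def where_based_fill(x: List[float], mask: List[bool], value: float) -> List[float]:
    """Old formulation: invert the mask, then keep x where the inverted mask is True."""
    keep = [not m for m in mask]                            # logical_not(mask)
    return [xi if k else value for xi, k in zip(x, keep)]   # where(keep, x, value)


def masked_fill(x: List[float], mask: List[bool], value: float) -> List[float]:
    """Out-of-place: build a new list and leave x untouched."""
    return [value if m else xi for xi, m in zip(x, mask)]


def masked_fill_(x: List[float], mask: List[bool], value: float) -> List[float]:
    """In-place: overwrite the masked positions of x and return the same object."""
    for i, m in enumerate(mask):
        if m:
            x[i] = value
    return x


x = [1.0, 2.0, 3.0, 4.0]
mask = [True, False, True, False]

# Both formulations produce the same values.
assert where_based_fill(x, mask, 0.0) == masked_fill(x, mask, 0.0)

# The out-of-place call did not modify x; the in-place call does.
assert x == [1.0, 2.0, 3.0, 4.0]
out = masked_fill_(x, mask, 0.0)
assert out is x and x == [0.0, 2.0, 0.0, 4.0]
```

The trailing underscore matters when the caller expects the original tensor to be updated, as it was with where_.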


namespace phi {
namespace funcs {
inline bool CanShortCutMaskFill(const DDim& input_dim, const DDim& mask_dim) {
Contributor

It would be better to rename this to something that reflects the comparison the function actually performs.

Contributor Author

done
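
The body of the function under review is not shown in this thread, so what it compares is not known from here. Purely as an assumption, one comparison a helper taking an input shape and a mask shape could make is the standard right-aligned broadcast check, which every mask shape in the benchmark tables satisfies. The name `mask_broadcasts_to` is hypothetical.

```python
from typing import Sequence


def mask_broadcasts_to(input_dims: Sequence[int], mask_dims: Sequence[int]) -> bool:
    """Return True if mask_dims can be broadcast to input_dims.

    Assumed rule (not taken from the PR): align shapes on the right;
    each mask dimension must equal the input dimension or be 1, and the
    mask may not have more dimensions than the input.
    """
    if len(mask_dims) > len(input_dims):
        return False
    # Compare trailing dimensions pairwise, right-aligned.
    for in_d, m_d in zip(reversed(input_dims), reversed(mask_dims)):
        if m_d != in_d and m_d != 1:
            return False
    return True


# Shapes taken from the benchmark tables in the PR description.
print(mask_broadcasts_to((108, 64, 12288), (108, 1, 1)))  # True
print(mask_broadcasts_to((108, 64, 12288), (1, 1)))       # True
print(mask_broadcasts_to((108, 64, 12288), (64, 3)))      # False
```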

changeyoung98 previously approved these changes May 19, 2025
inline static bool MaskedFillDispatching(const paddle::Tensor& tensor,
                                         const paddle::Tensor& value,
                                         std::vector<paddle::Tensor>* indices) {
  if (value.numel() != 1) return false;
Contributor

Does the masked_fill kernel only support a single-element value?

Contributor Author

Per the definition of the masked_fill parameters in python/paddle/tensor/manipulation.py:
value (Scalar or 0-D Tensor): The value used to fill the target tensor.
So masked_fill can only be used when value is a single element.
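
The guard discussed here (`value.numel() != 1` returns false) can be pictured as a two-way dispatch for boolean-index assignment: a single-element value takes the masked-fill route, and anything else falls back to a general path. The sketch below emulates that on flat Python lists. `setitem_bool` and its fallback behavior are assumptions made for illustration, not the code in this PR.

```python
from typing import List


def setitem_bool(x: List[float], mask: List[bool], value: List[float]) -> str:
    """Emulate x[mask] = value in place and report which path was taken."""
    if len(value) == 1:
        # Fast path: one fill value for every masked position (masked_fill-style).
        for i, m in enumerate(mask):
            if m:
                x[i] = value[0]
        return "masked_fill"

    # Assumed general path: one value per selected position, in order.
    selected = [i for i, m in enumerate(mask) if m]
    if len(selected) != len(value):
        raise ValueError("value length must match the number of True mask entries")
    for i, v in zip(selected, value):
        x[i] = v
    return "general"


x = [1.0, 2.0, 3.0, 4.0]
print(setitem_bool(x, [True, False, True, False], [0.5]))        # masked_fill
print(x)                                                         # [0.5, 2.0, 0.5, 4.0]
print(setitem_bool(x, [False, True, False, True], [7.0, 8.0]))   # general
print(x)                                                         # [0.5, 7.0, 0.5, 8.0]
```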

@Eddie-Wang1120 Eddie-Wang1120 changed the title implement masked_fill_op and optimize bool setitem indexing [PHI] implement masked_fill_op and optimize bool setitem indexing May 21, 2025
Comment on lines +36 to +37
auto x_grad_dims = x_grad->dims();
auto mask_dims = mask.dims();
Contributor

These are usually declared as const auto& to avoid the copy overhead.

Contributor Author

OK, I'll submit a follow-up PR to change this.

Comment on lines +32 to +33
auto x_dims = x.dims();
auto mask_dims = mask.dims();
Contributor

Same as above.

@xiaoguoguo626807 xiaoguoguo626807 merged commit 15e2e47 into PaddlePaddle:develop May 22, 2025
53 of 57 checks passed
Labels

contributor External developers

4 participants