
Conversation

@wanghuancoder
Contributor

@wanghuancoder wanghuancoder commented Mar 28, 2025

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

This PR makes the reduce-related kernels support big tensors. With it, 726 of the 728 previously failing big-tensor test cases pass. For the full list of test cases, see:
big tensor reduce error case.txt
The cases that still fail are:

  1. paddle.sum(Tensor([3, 715827883, 2],"int32"), axis=1, keepdim=True, )
  2. paddle.sum(Tensor([1, 2281701379, 1],"float32"), 1, )

Both report precision problems (a repro sketch of case 1 is shown below). This PR is merged first; the two remaining issues will be fixed in separate PRs.
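For reference, a minimal repro sketch of failing case 1 (illustrative harness only; the shape and arguments are taken verbatim from the list above, and the int32 input alone needs roughly 17 GB of device memory):

```python
import paddle

# Failing case 1: sum over axis=1 of a [3, 715827883, 2] int32 tensor.
# 3 * 715827883 * 2 = 4294967298 elements, just past the int32 limit.
x = paddle.ones([3, 715827883, 2], dtype="int32")
out = paddle.sum(x, axis=1, keepdim=True)  # still reports a precision error
```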

In addition, a large number of APIs/kernels depend on phi::funcs::ReduceKernel, so they should all benefit from these changes.

The changes in this PR include:

  • Big-tensor related
  1. Replaced many int32 usages, mainly involving numel, reduce_num, stride, index, etc.
  2. For index, we follow Torch's approach: use int32 when it cannot overflow, and int64 only when it would, because int64 increments are more expensive (see the sketch after this list).
  3. The bundled cub version is too old to handle tensors with more than int32-max elements, so cub is not used for big tensors.
  4. Testing showed that the earlier Eigen-based big-tensor support in SumRawKernel suffered from hangs, error 700, and precision issues, so it now uses phi::funcs::ReduceKernel instead.
  • Precision related
  1. Tightened the floating-point equality tolerance from 1e-8 to 1e-15; otherwise some results disagree with Torch.
  2. Changed the max_grad/min_grad rule: when axis=None and multiple elements tie for the forward maximum, the backward pass must split the gradient evenly among them.
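A minimal sketch of the index-type selection described in item 2 of the big-tensor list (the function name is illustrative; the real logic lives in the C++ kernels):

```python
INT32_MAX = 2**31 - 1

def choose_index_type(numel: int) -> str:
    # Prefer 32-bit indexing because 64-bit increments are slower on GPU;
    # fall back to 64-bit only when a linear index could overflow int32.
    return "int32" if numel <= INT32_MAX else "int64"

# 3 * 715827883 * 2 = 4294967298 elements, just past the int32 limit.
assert choose_index_type(3 * 715827883 * 2) == "int64"
assert choose_index_type(1024 * 1024) == "int32"
```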

Paddle's max/min differ from Torch's.
Before this change:

  • Paddle's max/min simply return the maximum/minimum value; in the backward pass, out_grad is copied to every position that held the forward maximum/minimum.
  • Torch:
    • If axis is None, it returns the maximum/minimum value; in the backward pass, out_grad is split evenly across all positions that held the forward maximum/minimum.
    • If axis is not None, it returns the maximum/minimum along that dimension plus the index of one maximum per slice (if several values tie, only one index is provided); in the backward pass, out_grad is copied to the indexed position only.

Paddle cannot change max/min to also return indices, since that would be an incompatible upgrade, and there is currently no need for it. After this change, the behavior is:

  • If axis is None, return the maximum/minimum value; in the backward pass, split out_grad evenly across all positions that held the forward maximum/minimum (see the example below).
  • If axis is not None, return the maximum/minimum along that dimension without indices; in the backward pass, copy out_grad to every position that held the forward maximum/minimum. (This could also be changed to an even split; to be discussed.)
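A minimal sketch of the new axis=None behavior described above (values chosen for illustration):

```python
import paddle

# Two elements tie for the global maximum; under the new rule the
# backward pass splits out_grad evenly among the tied positions.
x = paddle.to_tensor([1.0, 3.0, 3.0], stop_gradient=False)
y = paddle.max(x)   # axis=None: global maximum -> 3.0
y.backward()
print(x.grad)       # new behavior: [0.0, 0.5, 0.5]
                    # old behavior copied instead: [0.0, 1.0, 1.0]
```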

The diff between the copy and even-split implementations of max_grad/min_grad:

[screenshot of the max_grad/min_grad kernel diff]

The even split therefore does more work, since it must also count the tied maxima before dividing. Taking paddle.max(Tensor([20010241024],"float32")) as an example, max_grad time rises by 73.46%.
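A reference sketch of why the even split costs more (NumPy, for illustration only; the real code is a CUDA kernel): it needs an extra reduction to count the tied maxima.

```python
import numpy as np

def max_grad_copy(x, out_grad):
    mask = (x == x.max())                  # old rule: one reduction (max)
    return mask * out_grad                 # copy out_grad to every maximum

def max_grad_split(x, out_grad):
    mask = (x == x.max())                  # new rule: the same reduction...
    return mask * (out_grad / mask.sum())  # ...plus a second one to count ties

x = np.array([1.0, 3.0, 3.0])
print(max_grad_copy(x, 1.0))   # [0. 1. 1.]
print(max_grad_split(x, 1.0))  # [0.  0.5 0.5]
```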

In addition:

  1. For the reasons above, some of the basic reduce logic changed, and the resulting compile-time ripple effects required touching many kernel files.
  2. The precision changes likewise rippled into the composite-operator logic.
  3. The corresponding unit-test logic was updated to match.

Pcard-67164

@paddle-bot
Copy link

paddle-bot bot commented Mar 28, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-ci-bot

paddle-ci-bot bot commented Apr 21, 2025

Sorry to inform you that bddf7db's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@wanghuancoder wanghuancoder merged commit 29c3bf3 into PaddlePaddle:develop Apr 23, 2025
36 of 39 checks passed
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025
wanghuancoder added a commit to wanghuancoder/Paddle that referenced this pull request May 27, 2025
wanghuancoder added a commit that referenced this pull request Jun 3, 2025
* refine forrange (#72360)
* reduce support big tensor (#71970)
* [PHI] Fix gridDim limit for reduce kernel (#72507)
* [API] isclose support bigtensor (#72516)
* [API] isnan isinf isfinite support bigtensor (#72517)
* [PHI] Fix cum kernel for big tensor (#72562)
* [PHI] Preliminary fix for elementwise broadcast int32 shape overflow (#72584)
* [PHI] Align linalg.solve kernel with torch (#72608)
* Update strided copy kernel (#72662)
* [PHI] Fix grid sample kernel for big tensor (#72628)
* [PHI] Fix argsort big tensor bug (#72712)
* [PHI] Fix contiguous kernel for big tensor (#72705)
* [PHI] Fix flatten and split kernel for big tensor (#72634)
* [PHI] Fix out-of-bound issue of paddle.take_along_axis (#72757)
* [PHI] fix paddle.diag with big tensor (#72638)
* [API] fix paddle.cross with big tensor (#72652)
* [PHI] Fix paddle.where api for big tensor (#72717)
* [PHI] Fix bincount kernel for big tensor (#72706)
* [PHI] Fix full_like kernel for big tensor (#72831)
* [API] Fix int overflow and float16 support for paddle.frac (#72815)
* [PHI] Align paddle.inner with torch in matmul logic (#72843)
* [PHI] Fix paddle.var & paddle.std float16 overflow (#72650)
* [PHI] Fix logsumexp precision problem (#72681)
* [Accuracy diff No.55-56、76-77] Fix accuracy diff for var&std API (#72879)
* [Accuracy diff No.21] Fix accuracy diff for heaviside API (#72894)

--------

Co-authored-by: Shuhao Liang <50269654+lshpku@users.noreply.github.com>
Co-authored-by: Qianyue He <46109954+Enigmatisms@users.noreply.github.com>
Co-authored-by: Lei Ding <69283446+Dmovic@users.noreply.github.com>
Co-authored-by: ggggxm <66855582+ggggxm@users.noreply.github.com>
Co-authored-by: xkkkkkk23 <xiekeke@baidu.com>
Co-authored-by: Zx <zhangxiao35@baidu.com>
Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com>
Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com>