Conversation

@hushenwei2000
Contributor

@hushenwei2000 hushenwei2000 commented Jul 21, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Under certain parameter combinations, the existing fused_bias_dropout_residual_layer_norm operator produces a backward result that is all zeros.

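| cols_ == 1024 | bias == nullptr | dropout_rate == 0 | Result |
| --- | --- | --- | --- |
| Yes | Yes | Yes | FAIL |
| Yes | Yes | No | PASS |
| Yes | No | Yes | PASS |
| Yes | No | No | PASS |
| No | Yes | Yes | PASS |
| No | Yes | No | PASS |
| No | No | Yes | PASS |
| No | No | No | PASS |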

In the source code, the FAIL case turns out to be the one that calls the fast kernel, ln_bwd_fast_kernel_driver:

    // FILE: paddle/phi/kernels/fusion/gpu/fused_dropout_helper.h
    if (this->cols_ == 1024 && d_bias == nullptr && d_scale != nullptr &&
        d_layernorm_bias != nullptr && sizeof(T) <= 4) {
      can_call_1024_kernel = true;
    }
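The dispatch condition can be restated as a standalone predicate to make one point visible: it does not look at the dropout rate at all, so the fast kernel is also selected when dropout_rate == 0. This is only a sketch; the function name `CanCall1024Kernel`, its parameter list, and the use of `elem_size` in place of `sizeof(T)` are assumptions made for illustration and are not part of Paddle.

```cpp
#include <cstddef>

// Hypothetical restatement of the dispatch condition quoted above.
// elem_size stands in for sizeof(T); pointers are treated as opaque addresses.
// Note that no dropout-related argument appears in the condition.
bool CanCall1024Kernel(int cols, const void* d_bias, const void* d_scale,
                       const void* d_layernorm_bias, std::size_t elem_size) {
  return cols == 1024 && d_bias == nullptr && d_scale != nullptr &&
         d_layernorm_bias != nullptr && elem_size <= 4;
}
```

With cols == 1024 and a null d_bias, the predicate holds regardless of the dropout rate, which is consistent with the first two rows of the table taking the same kernel path but producing different results.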

The fast kernel contains this line:

    // FILE: paddle/phi/kernels/funcs/layer_norm_impl.cu.h
    dout[it][jt] = x[it][jt] * static_cast<T>(mask_vec[it][jt]) * factor;

When dropout_rate == 0, mask_vec is all zeros, so the computed dout is all zeros as well.
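The effect of that expression can be reproduced on the CPU with a few lines. The sketch below is not the Paddle kernel: `MaskedGrad` is a hypothetical helper, the mask element type (`std::uint8_t`) is an assumption, and the all-zero mask simply encodes the situation the description reports for dropout_rate == 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for the per-element expression in the fast kernel:
// the incoming gradient is always multiplied by the mask, with no check for
// whether dropout was actually applied.
std::vector<float> MaskedGrad(const std::vector<float>& x,
                              const std::vector<std::uint8_t>& mask,
                              float factor) {
  std::vector<float> dout(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    dout[i] = x[i] * static_cast<float>(mask[i]) * factor;
  }
  return dout;
}
```

Calling `MaskedGrad` with a mask of zeros returns zeros for any input and any factor, which matches the all-zero gradient described above.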

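The fix follows the path that does not call the fast kernel. That path checks whether dropout is applied before multiplying by mask_vec: when there is no dropout, it skips the multiplication.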

    // FILE: paddle/phi/kernels/fusion/gpu/fused_residual_dropout_bias.h
    if (HasDropout) {
      dx_vec[i] = out_vec[i] * static_cast<T>(mask_vec[i]) * factor;
    } else {
      dx_vec[i] = out_vec[i] * factor;
    }

The fix therefore mirrors this branch.
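A minimal CPU sketch of the branching pattern quoted from fused_residual_dropout_bias.h is given here. It is not the actual patch to the fast kernel: `DropoutGrad`, its signature, and the `std::uint8_t` mask type are assumptions, and only the `HasDropout` branch structure is taken from the quoted code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU version of the HasDropout branch: the mask is applied only
// when dropout is enabled; otherwise the gradient is just scaled by factor.
template <bool HasDropout>
std::vector<float> DropoutGrad(const std::vector<float>& out,
                               const std::vector<std::uint8_t>& mask,
                               float factor) {
  std::vector<float> dx(out.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (HasDropout) {
      dx[i] = out[i] * static_cast<float>(mask[i]) * factor;
    } else {
      dx[i] = out[i] * factor;  // mask contents are ignored without dropout
    }
  }
  return dx;
}
```

With `HasDropout` set to false, an all-zero mask no longer affects the result, so the gradient passes through scaled by `factor` only.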

After the fix, all 8 cases above pass. The cases that previously failed in PaddleAPITest also pass, as shown below.

    2025-07-21 16:10:47.750734 Worker PID: 91054, Assigned GPU ID: 2
    2025-07-21 16:10:47.853978 Worker PID: 91055, Assigned GPU ID: 3
    2025-07-21 16:10:47.944813 Worker PID: 91056, Assigned GPU ID: 0
    2025-07-21 16:10:47.853036 Worker PID: 91057, Assigned GPU ID: 7
    2025-07-21 16:10:47.857670 Worker PID: 91058, Assigned GPU ID: 5
    2025-07-21 16:10:47.624630 Worker PID: 91059, Assigned GPU ID: 1
    2025-07-21 16:10:47.626300 GPU 1 91059 test begin: paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
    W0721 16:11:21.658066 91059 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8
    [Pass] paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
    2025-07-21 16:10:47.755052 Worker PID: 91060, Assigned GPU ID: 6
    2025-07-21 16:10:47.723834 Worker PID: 91061, Assigned GPU ID: 4
    W0721 16:11:23.500622 91054 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8

Pcard-67164

@paddle-bot

paddle-bot bot commented Jul 21, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@CLAassistant

CLAassistant commented Jul 21, 2025

CLA assistant check
All committers have signed the CLA.

@hushenwei2000
Contributor Author

/re-run all-failed

@wanghuancoder wanghuancoder merged commit 28be650 into PaddlePaddle:develop Jul 31, 2025
92 of 94 checks passed