@fxyfxy777 fxyfxy777 commented Aug 3, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

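Summary of the accuracy and configuration errors reported for interpolate in error_log.log (2079 entries in total)

1. cuda error 700 (1356 entries)

Cause: int overflow; the value exceeds the maximum representable range.
Solution: replace the affected variables with size_t to avoid the overflow.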


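2. CUDA error: invalid configuration argument (675 entries)

Cause: the error is raised by Torch.
Solution: add the affected configurations to tester/api_config/torch_error_skip.txt; see PR #484.


3. The values for attribute 'shape' do not match (40 entries)

Cause: the inputs scale_h / scale_w are stored as float, whereas Torch uses double. The rounding error is amplified by the multiplication and changes the result of the cast to int. Note that this refers to initialization: Paddle initializes the value as float, and casting it to double during the computation only amplifies the error!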

Example verification code

    #include <cstdint>
    #include <iomanip>
    #include <iostream>

    int main() {
      int64_t in_h = 10;
      // The same literal stored as float and as double.
      float scale_h_f = 0.7999999999999999f;
      double scale_h_d = 0.7999999999999999;
      // Scale the input height in each precision.
      float result_h_f = in_h * scale_h_f;
      double result_h_d = in_h * scale_h_d;
      // Truncate to an integer output height.
      int out_h_f = static_cast<int>(result_h_f);
      int out_h_d = static_cast<int>(result_h_d);
      std::cout << std::fixed << std::setprecision(10);
      std::cout << "scale_h (float) = " << scale_h_f
                << ", scale_h (double) = " << scale_h_d << std::endl;
      std::cout << "result_h_f (float) = " << result_h_f << std::endl;
      std::cout << "result_h_d (double) = " << result_h_d << std::endl;
      std::cout << "out_h_f = " << out_h_f << ", out_h_d = " << out_h_d << std::endl;
      return 0;
    }

Output

    scale_h (float) = 0.8000000119, scale_h (double) = 0.8000000000
    result_h_f (float) = 8.0000000000
    result_h_d (double) = 8.0000000000
    out_h_f = 8, out_h_d = 7

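Conclusion: using float introduces a precision error, and initializing the value as double avoids it. If that change is too far-reaching, the interim workaround is: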


4. [accuracy error] paddle.nn.functional.interpolate (6 entries)

5. [accuracy error] backward paddle.nn.functional.interpolate (2 entries)

Both are the same issue: insufficient precision makes the error too large.

The interpolation formula is:

$$ \begin{aligned} V &= d_2 \cdot \left[ h_2 \cdot (w_2 \cdot V_{000} + w_1 \cdot V_{001}) + h_1 \cdot (w_2 \cdot V_{010} + w_1 \cdot V_{011}) \right] \\ &\quad + d_1 \cdot \left[ h_2 \cdot (w_2 \cdot V_{100} + w_1 \cdot V_{101}) + h_1 \cdot (w_2 \cdot V_{110} + w_1 \cdot V_{111}) \right] \end{aligned} $$

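Cause

  • For large tensors, the weights d1/d2, w1/w2, h1/h2 do not have enough precision.

Solution

  • In the forward pass, float precision is enough to keep the error under control;
  • In the backward pass, the maximum absolute error is still about 5; promoting to double reduces it to about 1.5, and the tolerance upper bound is adjusted to accommodate the remaining error.
  • See fix_interpolate PFCCLab/PaddleAPITest#490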

Pcard-92269

paddle-bot bot commented Aug 3, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@fxyfxy777
Contributor Author

/re-run all-failed

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@lshpku lshpku merged commit a145db3 into PaddlePaddle:develop Aug 5, 2025
71 of 72 checks passed
@fxyfxy777 fxyfxy777 deleted the fix_interpolate branch September 9, 2025 06:24