@fxyfxy777 fxyfxy777 commented Aug 3, 2025

Summary of precision and configuration errors in error_log.log for interpolate (2079 entries in total)

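1. cuda error 700 (1356 entries)

Cause: int overflow; the value exceeds the maximum an int can represent.
Solution: replace the affected variables with size_t to avoid the overflow.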
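
The overflow behind this category can be illustrated with a small sketch. The helper names (`NumElements`, `ExceedsInt32`, `LinearOffset`) and the NCHW shape are hypothetical and are not taken from the Paddle kernels; the sketch also assumes a platform where `size_t` is 64 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: element count of an NCHW tensor, computed in 64 bits
// so the product itself cannot overflow for realistic shapes.
constexpr std::int64_t NumElements(std::int64_t n, std::int64_t c,
                                   std::int64_t h, std::int64_t w) {
  return n * c * h * w;
}

// True when a linear index over `numel` elements does not fit in a 32-bit int.
constexpr bool ExceedsInt32(std::int64_t numel) {
  return numel > std::numeric_limits<std::int32_t>::max();
}

// Hypothetical offset computation done entirely in size_t, so the
// intermediate products are not truncated to 32 bits.
constexpr std::size_t LinearOffset(std::size_t n, std::size_t c, std::size_t h,
                                   std::size_t w, std::size_t C, std::size_t H,
                                   std::size_t W) {
  return ((n * C + c) * H + h) * W + w;
}
```

For a 1 x 3 x 32768 x 32768 tensor, `NumElements` returns 3221225472, which is above the 32-bit `int` maximum of 2147483647. An `int` index therefore cannot address the last elements, while the `size_t` offset of the final element (3221225471) is still representable.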


2. CUDA error: invalid configuration argument (675 entries)

Cause: the error is raised by Torch.
Solution: add the affected configurations to tester/api_config/torch_error_skip.txt; see PR #484.


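3. The values for attribute 'shape' do not match (40 entries)

Cause: the inputs scale_h / scale_w are held as float, whereas Torch uses double. The rounding error is amplified by the multiplication and changes the result of the subsequent cast to int. Note that this refers to function initialization, where Paddle uses float; casting to double later, during the computation, only amplifies the error!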

Example verification code:

    #include <cstdint>
    #include <iomanip>
    #include <iostream>

    int main() {
      int64_t in_h = 10;
      float scale_h_f = 0.7999999999999999f;  // rounds up to about 0.8000000119 as a float
      double scale_h_d = 0.7999999999999999;  // stays just below 0.8 as a double
      float result_h_f = in_h * scale_h_f;
      double result_h_d = in_h * scale_h_d;
      int out_h_f = static_cast<int>(result_h_f);  // the cast truncates toward zero
      int out_h_d = static_cast<int>(result_h_d);
      std::cout << std::fixed << std::setprecision(10);
      std::cout << "scale_h (float) = " << scale_h_f << ", scale_h (double) = " << scale_h_d << std::endl;
      std::cout << "result_h_f (float) = " << result_h_f << std::endl;
      std::cout << "result_h_d (double) = " << result_h_d << std::endl;
      std::cout << "out_h_f = " << out_h_f << ", out_h_d = " << out_h_d << std::endl;
      return 0;
    }

Output:

    scale_h (float) = 0.8000000119, scale_h (double) = 0.8000000000
    result_h_f (float) = 8.0000000000
    result_h_d (double) = 8.0000000000
    out_h_f = 8, out_h_d = 7

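Conclusion: using float introduces a precision error, which initializing with double avoids. If that change turns out to be too far-reaching, the interim handling is:

  • Add the affected failing configurations to tester/api_config/torch_error_skip.txt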

4. [accuracy error] paddle.nn.functional.interpolate (6 entries)

5. [accuracy error] backward paddle.nn.functional.interpolate (2 entries)

Both have the same root cause: insufficient precision makes the error too large.

The interpolation formula is:

$$ \begin{aligned} V &= d_2 \cdot \left[ h_2 \cdot (w_2 \cdot V_{000} + w_1 \cdot V_{001}) + h_1 \cdot (w_2 \cdot V_{010} + w_1 \cdot V_{011}) \right] \\ &\quad + d_1 \cdot \left[ h_2 \cdot (w_2 \cdot V_{100} + w_1 \cdot V_{101}) + h_1 \cdot (w_2 \cdot V_{110} + w_1 \cdot V_{111}) \right] \end{aligned} $$
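
As a reading aid, the formula can be written as a small function. This is a minimal sketch, not the Paddle kernel: the name `Trilinear`, the corner layout `v[d][h][w]`, the sample array `kExampleCorners`, and the assumption that each weight pair sums to one (d2 = 1 - d1, h2 = 1 - h1, w2 = 1 - w1) are all illustrative. The template parameter allows the same expression to be evaluated in float or double. It needs C++14 or later.

```cpp
// Trilinear blend of the eight corner values V_{dhw}, following the formula
// term by term. Assumption: the complementary weights are 1 minus the given ones.
template <typename T>
constexpr T Trilinear(const T (&v)[2][2][2], T d1, T h1, T w1) {
  const T d2 = T(1) - d1;
  const T h2 = T(1) - h1;
  const T w2 = T(1) - w1;
  return d2 * (h2 * (w2 * v[0][0][0] + w1 * v[0][0][1]) +
               h1 * (w2 * v[0][1][0] + w1 * v[0][1][1])) +
         d1 * (h2 * (w2 * v[1][0][0] + w1 * v[1][0][1]) +
               h1 * (w2 * v[1][1][0] + w1 * v[1][1][1]));
}

// Hypothetical sample data: corner value = 4*d + 2*h + w.
constexpr double kExampleCorners[2][2][2] = {{{0.0, 1.0}, {2.0, 3.0}},
                                             {{4.0, 5.0}, {6.0, 7.0}}};
```

With all three weights set to 0.5, `Trilinear(kExampleCorners, 0.5, 0.5, 0.5)` evaluates to 3.5, the mean of the eight corners.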

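Cause

  • For large tensors, the weights d1/d2, w1/w2, h1/h2 do not have enough precision.

Solution

  • In the forward pass, float precision is enough to keep the error under control.
  • In the backward pass, the maximum absolute error is still about 5; promoting to double brings it down to about 1.5, so the tolerance upper bound is raised to relax the accepted error range.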
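
One way the weights can lose precision on large tensors is sketched below. It assumes, hypothetically, that a weight is the fractional part of `ratio * dst_index`; the real kernel may derive its weights differently (for example when handling align_corners), and `FractionalWeight` is an illustrative name. It needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical weight computation: the fractional part of the source
// coordinate, evaluated in the precision T (float or double).
template <typename T>
constexpr T FractionalWeight(T ratio, std::int64_t dst_index) {
  const T src = ratio * static_cast<T>(dst_index);
  // Truncation equals floor here because src is non-negative.
  return src - static_cast<T>(static_cast<std::int64_t>(src));
}
```

With ratio 0.3 and index 10000001 the exact source coordinate is about 3000000.3. Floats near 3000000 are spaced 0.25 apart, so `FractionalWeight<float>(0.3f, 10000001)` can only be a multiple of 0.25, while `FractionalWeight<double>(0.3, 10000001)` stays close to 0.3.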

paddle-bot bot commented Aug 3, 2025

Thanks for your contribution!

@cangtianhuang
Collaborator

The configurations that fail with "The values for attribute 'shape' do not match" can be moved to torch_error_skip, see:
https://github.com/PFCCLab/PaddleAPITest/pull/458/files#diff-5066513ca1f31563323c250b017faae2066feabc73adcda4d6473f1ef72b5bef

@fxyfxy777
Contributor Author

  • Delete the related configurations in report/big_tensor_gpu/error_config.txt
    tester/api_config/8_big_tensor/big_tensor_1_8.txt

The content deleted from these two files needs to be restored, is that right?

@cangtianhuang cangtianhuang left a comment

LGTM

@cangtianhuang cangtianhuang merged commit 6d8b781 into PFCCLab:main Aug 5, 2025