[0-size Tensor Job2 No.21-23] Add 0-size Tensor support for send_u_recv #73806
Your PR was submitted successfully. Thank you for contributing to this open-source project!
ningzhengsheng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
if (out_size_data[0] <= 0) {
  out->Resize(x.dims());
} else {
  out->Resize(common::make_ddim(out_size_data));
}
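Is a negative out_size caused by the input or by shape inference? The shape should not be reprocessed at the kernel level. It should be checked during InferMeta, or the inference should be guaranteed to be correct; negative dims should not still be present by the time execution reaches the kernel.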
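The existing logic for normal (non-0-size) inputs also reprocesses the shape at the kernel level, so for consistency the 0-size case is handled the same way for now.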
Codecov Report
Additional details and impacted files
Coverage Diff (develop vs. #73806):
Coverage: 95.00%
Files: 8
Lines: 60
Branches: 0
Hits: 57
Misses: 3
Partials: 0
        dst_index_dims.size()));
  }

  PADDLE_ENFORCE_EQ(src_index_dims[0],
Suggest keeping the check when the dimension is not 0:
if (src_index_dims[0] != 0) {
  PADDLE_ENFORCE_EQ(
      src_index_dims[0],
      dst_index_dims[0],
      common::errors::InvalidArgument(
          "Src_index and Dst_index should have the same shape."));
}
OK.
/re-run all-failed
LGTM
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
Problems and fixes:

Problem 1:
Problem 2:

Problem 3:
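Description: When the usual 0-size check is added to the kernel, returning right after dev_ctx.template Alloc<T>(out), an error is raised because out has a negative dimension, and the dimensions of Paddle's out differ from those of torch's out.
Fix: Reading the code shows that the original kernel sets the size of out further down. So, in the 0-size case, we reset the size of out before the kernel allocates memory for it.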
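The output-shape choice behind this fix can be sketched in isolation. The snippet below is a minimal standalone illustration, not the Paddle kernel itself: `ResolveOutDims` is a hypothetical helper, plain `std::vector<int64_t>` stands in for `DDim`, and the extra empty-vector guard is an assumption added only to keep the sketch safe. It mirrors the branch shown in the diff: when the first entry of `out_size` is not positive, fall back to the dims of `x`; otherwise use `out_size`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape decision made before allocating out.
// x_dims: dims of the input x; out_size_data: the user-supplied out_size.
std::vector<int64_t> ResolveOutDims(const std::vector<int64_t>& x_dims,
                                    const std::vector<int64_t>& out_size_data) {
  // A missing or non-positive out_size is treated as "not specified",
  // so the output reuses the dims of x (which may contain a 0).
  if (out_size_data.empty() || out_size_data[0] <= 0) {
    return x_dims;
  }
  // Otherwise the output takes the shape given by out_size.
  return out_size_data;
}
```

Resolving the dims before allocation means the allocator never sees a negative dimension, which is the error described above.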
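Problem 4:

Description: When src_index.dims() and dst_index.dims() are not equal, torch does not throw an exception, but Paddle actively throws one.
Fix: In SendURecvInferMeta, SendUERecvInferMeta, and SendUVInferMeta, guard the exception thrown when src_index.dims() and dst_index.dims() are not equal, so that the invalid-input check runs only when src_index_dims[0] != 0.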
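The guarded check in this fix can be sketched as a standalone function. This is an illustrative sketch only: `CheckIndexShapes` is a hypothetical name, `std::invalid_argument` stands in for `PADDLE_ENFORCE_EQ` with `common::errors::InvalidArgument`, and only the first dimension of each index is compared.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the InferMeta shape check on the two indices.
void CheckIndexShapes(int64_t src_index_dim0, int64_t dst_index_dim0) {
  // Skip the equality check entirely when src_index is 0-size.
  if (src_index_dim0 != 0) {
    if (src_index_dim0 != dst_index_dim0) {
      throw std::invalid_argument(
          "Src_index and Dst_index should have the same shape.");
    }
  }
}
```

With this guard, a 0-size src_index paired with a non-empty dst_index passes, while a mismatch between two non-empty indices still raises.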
Fix results:

With the --accuracy=True and --paddle_only=True tests, all paddle error and accuracy error issues are fixed; only numpy and torch errors remain.
Pcard-67164