[Accuracy diff No.84-85] Fix accuracy diff for paddle.einsum API #74257
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
Fix einsum_grad when a contraction involves broadcasting.
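When a label of type AB needs broadcasting, for example einsum('ij, ij -> j', Tensor(shape=[2,2]), Tensor(shape=[1,2])), where i is of type AB and is broadcast, the forward PerformContraction stage caches the matmul inputs (before they are resized into a multipliable shape) so that the backward pass can reuse them.
When the backward pass calls einsum("ij, i, -> ij"), the shapes are [1,2] and [2]. The subsequent PerformContraction stage uses the previously cached result directly, and the Tensor is then resized to mul_dims. mul_dims is computed from labelshape, and using labelshape as-is treats the dim of i as 1, although it was actually 2 when the forward pass cached the Tensor. This causes the later failures: the computed data is wrong, and the computed gradient has shape [1,2], so when it is later resized directly to restore the input shape [2,2], accessing it raises an error.
The fix passes the backward labelshape into EinsumKernelImpl (in the forward pass the entries all default to 0, so the condition below is never triggered). While labelshape is being inferred, the shapes are compared for consistency; if the passed-in value is larger than the inferred one (the inferred value is 1, and the mismatched label type is AO or BO), the passed-in shape is used.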
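To make the expected shapes concrete, here is a minimal pure-Python reference for the forward example 'ij, ij -> j' with shapes [2,2] and [1,2]. It does not use Paddle; the helper names `einsum_ij_ij_j` and `grad_wrt_a` are hypothetical and exist only for this illustration. It shows that the gradient with respect to the first operand should have shape [2,2], not [1,2].

```python
def einsum_ij_ij_j(a, b):
    """Reference for einsum('ij,ij->j') where b may have a size-1 first dim."""
    rows, cols = len(a), len(a[0])
    # Broadcast b along i when its first dimension is 1.
    return [
        sum(a[i][j] * b[i if len(b) > 1 else 0][j] for i in range(rows))
        for j in range(cols)
    ]


def grad_wrt_a(a, b, grad_out):
    """d(out)/d(a): grad_a[i][j] = grad_out[j] * b[i][j], with b broadcast along i."""
    rows, cols = len(a), len(a[0])
    return [
        [grad_out[j] * b[i if len(b) > 1 else 0][j] for j in range(cols)]
        for i in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]   # shape [2, 2]
b = [[10.0, 100.0]]            # shape [1, 2], broadcast along i
out = einsum_ij_ij_j(a, b)
grad_a = grad_wrt_a(a, b, [1.0, 1.0])
```

With an all-ones upstream gradient, `grad_a` has two rows and two columns, matching the shape of `a`; a result of shape [1,2] would be the symptom described above.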
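The override rule can be sketched in plain Python as follows. This is an illustration of the idea only, not the actual C++ code in EinsumKernelImpl: the function `infer_label_shape`, its `hint` parameter, and the label strings used in the usage lines are all assumptions made for this sketch.

```python
def infer_label_shape(operand_labels, operand_shapes, hint=None):
    """Infer one extent per label; a hint of 0 (or a missing hint) means 'no hint'."""
    label_shape = {}
    for labels, shape in zip(operand_labels, operand_shapes):
        for label, dim in zip(labels, shape):
            prev = label_shape.get(label, 1)
            if prev != 1 and dim != 1 and prev != dim:
                raise ValueError(f"incompatible extents for label {label!r}")
            # A size-1 extent is broadcastable, so keep the larger one.
            label_shape[label] = max(prev, dim)
    # Assumed override: prefer the passed-in extent when it is larger than
    # the inferred one. Forward calls pass zeros, so this never fires there.
    for label, hinted in (hint or {}).items():
        if label in label_shape and hinted > label_shape[label]:
            label_shape[label] = hinted
    return label_shape


# Forward: shapes [2,2] and [1,2]; i is broadcast, so its extent is 2.
forward = infer_label_shape(["ij", "ij"], [(2, 2), (1, 2)])
# Backward without a hint: operands of shape [1,2] and [2] make i look like 1.
backward_plain = infer_label_shape(["ij", "j"], [(1, 2), (2,)])
# Backward with the forward extents passed in: i is restored to 2.
backward_hinted = infer_label_shape(["ij", "j"], [(1, 2), (2,)], hint=forward)
```

Passing the extents through as a hint keeps the forward path unchanged while letting the backward path recover the extent that the cached operand actually has.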