[CINN] Fixed gather_nd incorrect logic for negative inputs. #73940
PR Category
CINN
PR Types
Bug fixes
Description
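This PR fixes the CINN lowering logic of the gather_nd operator. The original logic did not handle negative inputs, which produced incorrect results. This PR: [...] (`ir::Select`) [...] `test_cinn_gather_nd.py`
Performance test results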
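Some kernels appear to have slowed down considerably. NCU analysis shows that throughput of the affected kernels actually improved and that stalls at the bottleneck stages (lg throttle, for example) dropped sharply, but the number of executed SASS instructions rose significantly.
The following implementations were then compared for speed: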
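- `((-int(index > 0)) & shape) + index`: ~17.5 us
- `(index + shape) % shape`: ~12 us, on par with the kernel before this change (even marginally faster).

The mod-based form would have been ideal: when CINN's internal logic is implemented with a mod operation, the operands of any min/max operations inside shape do not even need to be recast. However, the existing expression simplification logic does not yet handle negative values: `(a + N) % N` is simplified to `a % N`, which is not a valid simplification when `a` is negative and `N` is a constant. This optimization is therefore shelved for now, because the slowdown introduced by the select-based implementation in this PR is small at the level of the whole pass (the affected kernels account for less than 1% of the time).

Pcard-89620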
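The three index-normalization strategies discussed above can be compared in a small, self-contained Python sketch. This is not CINN code: every function name is hypothetical, `c_rem` emulates the truncating `%` of C/C++ (Python's own `%` floors instead), and the mask variant is written with `index < 0` on the assumption that this is the intended condition for selecting negative indices.

```python
def c_rem(a: int, n: int) -> int:
    """Truncating remainder: the result takes the sign of the dividend, as in C/C++."""
    q = abs(a) // abs(n)
    if (a < 0) != (n < 0):
        q = -q
    return a - n * q


def normalize_select(index: int, dim: int) -> int:
    """Select-style: add dim only when the index is negative."""
    return index + dim if index < 0 else index


def normalize_mask(index: int, dim: int) -> int:
    """Mask-style: -int(index < 0) is all ones for negative indices, zero otherwise."""
    return ((-int(index < 0)) & dim) + index


def normalize_mod(index: int, dim: int) -> int:
    """Mod-style: (index + dim) % dim with truncating remainder."""
    return c_rem(index + dim, dim)


def simplified_mod(index: int, dim: int) -> int:
    """What remains if (index + dim) % dim is rewritten to index % dim."""
    return c_rem(index, dim)


if __name__ == "__main__":
    dim = 5
    for index in range(-dim, dim):
        print(index,
              normalize_select(index, dim),
              normalize_mask(index, dim),
              normalize_mod(index, dim),
              simplified_mod(index, dim))
```

For every index in `[-dim, dim)` the first three columns agree, while the last column stays negative for negative indices. That is why rewriting `(a + N) % N` to `a % N` is only safe when `a` is known to be non-negative.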