Conversation

Collaborator

@SamTebbs33 SamTebbs33 commented Oct 8, 2025

A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from reduce.add(mul(ext, const)) to reduce.add(mul(ext, ext(const))) as long as it is safe to extend the constant.

This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to that extend's destination type. The truncate is needed so that both extends take operands of the same type, and the call to canConstantBeExtended proves that extending the truncated constant is safe. Later optimisations remove the truncate.
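The safety condition can be sketched outside of LLVM (the function name and signature here are illustrative, not LLVM's actual API): a constant can be extended from the narrow type exactly when truncating it and then re-extending it reproduces the original value, which is what the canConstantBeExtended check establishes.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the safety check: extending a constant from a
// NarrowBits-wide type is safe iff truncating it to NarrowBits and then
// re-extending (sext when Signed, zext otherwise) round-trips the value.
inline bool canExtendSketch(int64_t C, unsigned NarrowBits, bool Signed) {
  uint64_t Mask = NarrowBits >= 64 ? ~0ULL : (1ULL << NarrowBits) - 1;
  uint64_t Trunc = (uint64_t)C & Mask; // trunc to the narrow type
  int64_t Ext;
  if (Signed) {
    // Sign-extend the NarrowBits-wide value back to 64 bits.
    uint64_t SignBit = 1ULL << (NarrowBits - 1);
    Ext = (int64_t)((Trunc ^ SignBit) - SignBit);
  } else {
    // Zero-extend back to 64 bits.
    Ext = (int64_t)Trunc;
  }
  return Ext == C;
}
```

This matches the tests added in the PR: 63 survives an i8 round-trip under both zext and sext, while 128 cannot be treated as a sign-extended i8 value.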

This is a stacked PR:

  1. [LV] Bundle partial reductions inside VPExpressionRecipe #147302
  2. -> [LV] Bundle (partial) reductions with a mul of a constant #162503
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Sam Tebbs (SamTebbs33)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/162503.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+26)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+266)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 24426c1a53835..4bf2eb3765080 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3597,6 +3597,32 @@ tryToMatchAndCreateMulAccumulateReduction(VPReductionRecipe *Red,
       dyn_cast_if_present<VPWidenCastRecipe>(B->getDefiningRecipe());
   auto *Mul = cast<VPWidenRecipe>(VecOp->getDefiningRecipe());
+  // Match reduce.add(mul(ext, const)) and convert it to
+  // reduce.add(mul(ext, ext(const)))
+  if (RecipeA && !RecipeB && B->isLiveIn()) {
+    Type *NarrowTy = Ctx.Types.inferScalarType(RecipeA->getOperand(0));
+    Instruction::CastOps ExtOpc = RecipeA->getOpcode();
+    auto *Const = dyn_cast<ConstantInt>(B->getLiveInIRValue());
+    if (Const &&
+        llvm::canConstantBeExtended(
+            Const, NarrowTy, TTI::getPartialReductionExtendKind(ExtOpc))) {
+      // The truncate ensures that the type of each extended operand is the
+      // same, and it's been proven that the constant can be extended from
+      // NarrowTy safely. Necessary since RecipeA's extended operand would be
+      // e.g. an i8, while the const will likely be an i32. This will be
+      // elided by later optimisations.
+      auto *Trunc =
+          new VPWidenCastRecipe(Instruction::CastOps::Trunc, B, NarrowTy);
+      Trunc->insertBefore(*RecipeA->getParent(),
+                          std::next(RecipeA->getIterator()));
+
+      Type *WideTy = Ctx.Types.inferScalarType(RecipeA);
+      RecipeB = new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
+      RecipeB->insertAfter(Trunc);
+      Mul->setOperand(1, RecipeB);
+    }
+  }
+
   // Match reduce.add/sub(mul(ext, ext)).
   if (RecipeA && RecipeB && match(RecipeA, m_ZExtOrSExt(m_VPValue())) &&
       match(RecipeB, m_ZExtOrSExt(m_VPValue())) &&
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 06b044872c217..ddae9007b7620 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -800,3 +800,269 @@ exit:
   %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
   ret i64 %r.0.lcssa
 }
+
+define i32 @print_mulacc_extended_const(ptr %start, ptr %end) {
+; CHECK-LABEL: 'print_mulacc_extended_const'
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<%0> = VF
+; CHECK-NEXT: Live-in vp<%1> = VF * UF
+; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
+; CHECK-NEXT: vp<%3> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: EMIT vp<%3> = EXPAND SCEV (1 + (-1 * (ptrtoint ptr %start to i64)) + (ptrtoint ptr %end to i64))
+; CHECK-NEXT: Successor(s): scalar.ph, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: vp<%4> = DERIVED-IV ir<%start> + vp<%2> * ir<1>
+; CHECK-NEXT: EMIT vp<%5> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi vp<%5>, vp<%9>
+; CHECK-NEXT: vp<%7> = SCALAR-STEPS vp<%6>, ir<1>, vp<%0>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%7>
+; CHECK-NEXT: vp<%8> = vector-pointer vp<%next.gep>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%8>
+; CHECK-NEXT: EXPRESSION vp<%9> = ir<%red> + reduce.add (mul (ir<%l> zext to i32), (ir<63> zext to i32))
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%6>, vp<%1>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%11> = compute-reduction-result ir<%red>, vp<%9>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq vp<%3>, vp<%2>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = zext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 63
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
+; CHECK-NEXT: Live-in ir<%1> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
+; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
+; CHECK-NEXT: IR %0 = add i64 %end1, 1
+; CHECK-NEXT: IR %1 = sub i64 %0, %start2
+; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
+; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
+; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
+; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
+; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
+; CHECK-NEXT: Successor(s): vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = zext ir<%l> to i32
+; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<63>
+; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
+; CHECK-NEXT: Successor(s): middle.block, vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, ir-bb<scalar.ph>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%5> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<scalar.ph>:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%3>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%5>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %scalar.ph ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %scalar.ph ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = zext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 63
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+entry:
+  br label %loop
+
+loop:
+  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
+  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
+  %l = load i8, ptr %ptr.iv, align 1
+  %l.ext = zext i8 %l to i32
+  %mul = mul i32 %l.ext, 63
+  %red.next = add i32 %red, %mul
+  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+  %ec = icmp eq ptr %ptr.iv, %end
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %red.next
+}
+
+; Constants >= 128 cannot be treated as sign-extended, so the expression shouldn't extend 128
+define i32 @print_mulacc_not_extended_const(ptr %start, ptr %end) {
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<%0> = VF
+; CHECK-NEXT: Live-in vp<%1> = VF * UF
+; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
+; CHECK-NEXT: vp<%3> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: EMIT vp<%3> = EXPAND SCEV (1 + (-1 * (ptrtoint ptr %start to i64)) + (ptrtoint ptr %end to i64))
+; CHECK-NEXT: Successor(s): scalar.ph, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: vp<%4> = DERIVED-IV ir<%start> + vp<%2> * ir<1>
+; CHECK-NEXT: EMIT vp<%5> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi vp<%5>, vp<%9>
+; CHECK-NEXT: vp<%7> = SCALAR-STEPS vp<%6>, ir<1>, vp<%0>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%7>
+; CHECK-NEXT: vp<%8> = vector-pointer vp<%next.gep>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%8>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
+; CHECK-NEXT: EXPRESSION vp<%9> = ir<%red> + reduce.add (mul ir<%l.ext>, ir<128>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%6>, vp<%1>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%11> = compute-reduction-result ir<%red>, vp<%9>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq vp<%3>, vp<%2>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
+; CHECK-NEXT: Live-in ir<%1> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
+; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
+; CHECK-NEXT: IR %0 = add i64 %end1, 1
+; CHECK-NEXT: IR %1 = sub i64 %0, %start2
+; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
+; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
+; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
+; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
+; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
+; CHECK-NEXT: Successor(s): vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
+; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<128>
+; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
+; CHECK-NEXT: Successor(s): middle.block, vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, ir-bb<scalar.ph>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%5> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<scalar.ph>:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%3>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%5>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %scalar.ph ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %scalar.ph ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+entry:
+  br label %loop
+
+loop:
+  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
+  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
+  %l = load i8, ptr %ptr.iv, align 1
+  %l.ext = sext i8 %l to i32
+  %mul = mul i32 %l.ext, 128
+  %red.next = add i32 %red, %mul
+  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+  %ec = icmp eq ptr %ptr.iv, %end
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  %red.next.lcssa = phi i32 [ %red.next, %loop ]
+  ret i32 %red.next.lcssa
+}
Collaborator

@huntergr-arm huntergr-arm left a comment

I may have missed it, but I don't see anything in the cost model work related to the trunc/extend feature, just the addition of asserts.

Could you please make that an independent PR?

Collaborator Author

I may have missed it, but I don't see anything in the cost model work related to the trunc/extend feature, just the addition of asserts.

Could you please make that an independent PR?

In #147302 I was asked to make IsMulAccValidAndClampRange assert that the partial reduction cost is <= the base cost of the add + mul + extends, but I had to change that to a return because the add(mul(ext, const)) case, which follows the same code path, was failing the assertion. This PR adds support for that add(mul(ext, const)) case, so it makes sense to restore the assertion here. The PR is still small as it is, and I was asked to re-add the assertion in this PR (#147302 (comment)).

Opcode, SrcTy, nullptr, RedTy, VF, ExtKind,
llvm::TargetTransformInfo::PR_None, std::nullopt,
Ctx.CostKind);
assert(PartialReductionCost <= BaseCost &&
Collaborator

It should only assert that PartialReductionCost.isValid(), because no code ensures that that cost will be lower than the BaseCost in practice.

Collaborator Author

Done.


// Match reduce.add(mul(ext, const)) and convert it to
// reduce.add(mul(ext, ext(const)))
if (RecipeA && !RecipeB && B->isLiveIn()) {
Collaborator

It would be nice if this could also work for the case handled on line 3657 (zext(mul(zext(a), zext(b))) where b is a constant)

Collaborator Author

Done, thanks for the idea.

Opcode, SrcTy, nullptr, RedTy, VF, ExtKind,
llvm::TargetTransformInfo::PR_None, std::nullopt,
Ctx.CostKind);
assert(PartialReductionCost.isValid() &&
Collaborator

A little bit to my surprise, it seems none of the tests change by doing this.
My suggestion would be to keep the code exactly the same as it was, but just add the assert that the partial reduction cost is valid. And possibly to do that in a separate NFCI PR from this one, because they're functionally unrelated.

Collaborator Author

Done, I'll raise the new PR soon 👍 I had the assertion in this PR since it had to be removed from the other PR because of the "multiply by constant" case, which this PR introduces support for.


return MulAccCost.isValid() &&
MulAccCost < ExtCost + MulCost + RedCost;
if (IsPartialReduction) {
Collaborator

Same as my comment above, this is now a big change and none of the tests change. My suggestion is to keep the code the same as it was, but to create a new PR to add the asserts that partial reduction cost is valid.

Collaborator Author

Done.

auto ExtendAndReplaceConstantOp = [&Ctx](VPWidenCastRecipe *ExtA,
VPWidenCastRecipe *&ExtB,
VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Collaborator

nit: maybe bail out early here and for the if (Const && llvm::canConstantBeExtended(..)) case, rather than having a multi-nested if-statement.

Collaborator Author

Done.

Collaborator

nit: Can you bail out early on line 3654 as well?

Collaborator Author

Ah sorry, I missed the "here" in your comment. Done!

VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Type *NarrowTy = Ctx.Types.inferScalarType(ExtA->getOperand(0));
Type *WideTy = Ctx.Types.inferScalarType(ExtA);
Collaborator

nit: move closer to use.

Collaborator Author

Done.

Comment on lines 3621 to 3630
auto *Trunc =
new VPWidenCastRecipe(Instruction::CastOps::Trunc, ValB, NarrowTy);
Trunc->insertBefore(*ExtA->getParent(), std::next(ExtA->getIterator()));

VPWidenCastRecipe *NewCast =
new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
NewCast->insertAfter(Trunc);
ExtB = NewCast;
ValB = NewCast;
Mul->setOperand(1, NewCast);
Collaborator

The insertion point can be simplified to be the Mul, because that's the point where we care about the extended input. You can also use VPBuilder, to avoid having to create the recipe and then insert it, i.e.

Suggested change
auto *Trunc =
new VPWidenCastRecipe(Instruction::CastOps::Trunc, ValB, NarrowTy);
Trunc->insertBefore(*ExtA->getParent(), std::next(ExtA->getIterator()));
VPWidenCastRecipe *NewCast =
new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
NewCast->insertAfter(Trunc);
ExtB = NewCast;
ValB = NewCast;
Mul->setOperand(1, NewCast);
VPBuilder Builder(Mul);
auto *Trunc =
Builder.createWidenCast(Instruction::CastOps::Trunc, ValB, NarrowTy);
Type *WideTy = Ctx.Types.inferScalarType(ExtA);
ValB = ExtB = Builder.createWidenCast(ExtOpc, Trunc, WideTy);
Mul->setOperand(1, ExtB);
Collaborator Author

Done.

SamTebbs33 added a commit that referenced this pull request Oct 23, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:

  1. #147026
  2. #147255
  3. #156976
  4. #160154
  5. -> #147302
  6. #162503
  7. #147513
Comment on lines 3616 to 3620
// The truncate ensures that the type of each extended operand is the
// same, and it's been proven that the constant can be extended from
// NarrowTy safely. Necessary since ExtA's extended operand would be
// e.g. an i8, while the const will likely be an i32. This will be
// elided by later optimisations.
Contributor

Could we avoid introducing the explicit cast recipe by just creating a new extended live-in, and then using pattern matching to get the extend from either side of the multiply if needed?

There are currently quite a few changes in the diff that make it difficult to see how they directly relate to supporting constants, but that may help to simplify things.

Collaborator

Could we avoid introcuding the explicit cast recipe by just creating a new extended live-in?

Is there any particular downside to creating the explicit cast recipe? (I'm asking in case handling this in pattern matching and VPExpression recipes would be tricky for some reason)

There currently are quite a bit of changes in the diff that make it a bit difficult to see how that directly relates to supporting constants

FWIW, a lot of the changes in this PR currently are unrelated to supporting constants, so I've asked those to be removed from this PR (#162503 (comment) and #162503 (comment))

Contributor

That would map more directly to what we generate down the line, but the current version looks fine after the recent cleanups. It would still be interesting to see whether that approach helps simplify things later.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 23, 2025
@SamTebbs33 SamTebbs33 changed the base branch from users/SamTebbs33/expression-recipe-pred to main October 23, 2025 13:36
Comment on lines 3712 to 3713
// Convert reduce.add(ext(mul(ext, const))) to reduce.add(ext(mul(ext,
// ext(const))))
Collaborator

nit: unfortunate formatting

Suggested change
// Convert reduce.add(ext(mul(ext, const))) to reduce.add(ext(mul(ext,
// ext(const))))
// reduce.add(ext(mul(ext, const)))
// -> reduce.add(ext(mul(ext, ext(trunc(const)))))
Collaborator Author

Done.

auto ExtendAndReplaceConstantOp = [&Ctx](VPWidenCastRecipe *ExtA,
VPWidenCastRecipe *&ExtB,
VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Collaborator

nit: Can you bail out early on line 3654 as well?

Comment on lines 2874 to 2876
%gep.b = getelementptr i8, ptr %b, i64 %iv
%load.b = load i8, ptr %gep.b, align 1
%ext.b = zext i8 %load.b to i16
Collaborator

these are unused (and so is argument ptr %b)

Collaborator Author

Done.

sdesmalen-arm
sdesmalen-arm previously approved these changes Oct 24, 2025
Comment on lines 3717 to 3718
// All extend recipes must have same opcode or A == B
// which can be transformed to reduce.add(zext(mul(sext(A), sext(B)))).
Collaborator

I know you've just copied this from above, but this comment is not accurate.

The cases it tries to handle are:

reduce.add(zext(mul(zext(a), zext(b)))) -> reduce.add(mul(wider_zext(a), wider_zext(b)))
reduce.add(sext(mul(sext(a), sext(b)))) -> reduce.add(mul(wider_sext(a), wider_sext(b)))

and the other case (the reason for checking Ext0 == Ext1) is that when both operands are the same, the mul is non-negative, which means the final zero-extend can be folded away, i.e.

reduce.add(zext(mul(sext(a), sext(a)))) // result of mul is nneg
-> reduce.add(mul(wider_sext(a), wider_sext(a)))
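The non-negativity reasoning can be checked exhaustively for i8 inputs (a standalone sketch, not part of the patch): mul(sext(a), sext(a)) is a square, so the product is never negative and a following zext behaves the same as a sext.

```cpp
#include <cstdint>

// Exhaustively verify that the product of a value with itself, after
// sign-extending i8 -> i16, is never negative; a zero-extend of such a
// product therefore agrees with a sign-extend.
inline bool mulOfEqualSextsIsNonNegative() {
  for (int v = -128; v <= 127; ++v) {
    // mul(sext(a), sext(a)) computed in i16; max magnitude is (-128)^2 = 16384.
    int16_t p = static_cast<int16_t>(static_cast<int8_t>(v)) *
                static_cast<int16_t>(static_cast<int8_t>(v));
    if (p < 0)
      return false;
  }
  return true;
}
```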
Collaborator Author

Done, thank you. I've kept the comment quite generic since it doesn't check exactly which opcodes are sext and zext, but I hope the comment is clearer now.

@sdesmalen-arm sdesmalen-arm dismissed their stale review October 24, 2025 13:46

(I didn't mean to accept the PR yet)

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:

  1. llvm#147026
  2. llvm#147255
  3. llvm#156976
  4. llvm#160154
  5. -> llvm#147302
  6. llvm#162503
  7. llvm#147513
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

LGTM with the nit addressed. This approach seems functionally sound, and I believe any suggestions for a different approach can be handled in a follow-up. It would just be nice to get this improvement in.

// -> reduce.add(mul(wider_ext(A), wider_ext(B)))
// The inner extends must either have the same opcode as the outer extend or
// be the same, in which case the multiply can never result in a negative
// value and the outer extend opcode doesn't matter
Collaborator

nit:

Suggested change
// value and the outer extend opcode doesn't matter
// value and the outer extend can be folded away by doing wider extends for the operands of the mul.
Collaborator Author

Done, thank you!
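The folding described above can also be checked exhaustively for i8 operands (an illustrative sketch, not part of the patch): widening the product of two narrow zexts gives the same result as multiplying wider zexts directly, because an i8 x i8 product cannot wrap in 16 bits.

```cpp
#include <cstdint>

// Compare zext32(mul(zext16(a), zext16(b))) against mul(zext32(a), zext32(b))
// for all i8 pairs. They agree because the i16 multiply cannot overflow
// (255 * 255 = 65025 < 65536), which is why the outer extend can be folded
// away by doing wider extends for the operands of the mul.
inline bool widerZextFoldHolds() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      uint32_t narrow = static_cast<uint16_t>(static_cast<uint16_t>(a) *
                                              static_cast<uint16_t>(b));
      uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
      if (narrow != wide)
        return false;
    }
  return true;
}
```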

@SamTebbs33 SamTebbs33 enabled auto-merge (squash) October 28, 2025 16:25
Comment on lines +976 to +995
; CHECK-NEXT: ir-bb<exit>:
; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
; CHECK-NEXT: No successors
; CHECK-EMPTY:
; CHECK-NEXT: scalar.ph:
; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
; CHECK-NEXT: Successor(s): ir-bb<loop>
; CHECK-EMPTY:
; CHECK-NEXT: ir-bb<loop>:
; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
; CHECK-NEXT: IR %red.next = add i32 %red, %mul
; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
; CHECK-NEXT: No successors
; CHECK-NEXT: }
Contributor

I think you can drop those, as the main thing is checking forming the reductions

Comment on lines +996 to +1029
; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
; CHECK-NEXT: Live-in ir<%1> = original trip-count
; CHECK-EMPTY:
; CHECK-NEXT: ir-bb<entry>:
; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
; CHECK-NEXT: IR %0 = add i64 %end1, 1
; CHECK-NEXT: IR %1 = sub i64 %0, %start2
; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:
; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
; CHECK-NEXT: Successor(s): vector.body
; CHECK-EMPTY:
; CHECK-NEXT: vector.body:
; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<128>
; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
; CHECK-NEXT: Successor(s): middle.block, vector.body
; CHECK-EMPTY:
; CHECK-NEXT: middle.block:
; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
Contributor

I think you can also drop checking the final VPlan, the replacement of the VPExpressionRecipe should be tested by the existing tests checking the generated IR, same for the other tests

This will make life a bit easier when the tests need updating

%red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
%l = load i8, ptr %ptr.iv, align 1
%l.ext = zext i8 %l to i32
%mul = mul i32 %l.ext, 63
Contributor

for completeness, could you also add a test where the constant is on the left side? I think we won't form partial reductions for those yet (and there's nothing to bundle), but it would still be good to have the test case here to check this works as expected
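A hypothetical test of that shape (not part of this PR; it mirrors the zext/63 loop quoted above with the mul operands swapped, and the function name is illustrative) might look like:

```llvm
define i32 @red_const_lhs(ptr %start, ptr %end) {
entry:
  br label %loop

loop:
  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
  %l = load i8, ptr %ptr.iv, align 1
  %l.ext = zext i8 %l to i32
  %mul = mul i32 63, %l.ext          ; constant on the left-hand side
  %red.next = add i32 %red, %mul
  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
  %ec = icmp eq ptr %ptr.iv, %end
  br i1 %ec, label %exit, label %loop

exit:
  ret i32 %red.next
}
```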

Comment on lines 3616 to 3620
// The truncate ensures that the type of each extended operand is the
// same, and it's been proven that the constant can be extended from
// NarrowTy safely. Necessary since ExtA's extended operand would be
// e.g. an i8, while the const will likely be an i32. This will be
// elided by later optimisations.
Contributor

It would be closer to what we generate down the line, but the current version looks fine after the recent cleanups; it would still be interesting to see if it would help to simplify things later on.

Comment on lines +807 to +809
; CHECK-NEXT: Live-in vp<%0> = VF
; CHECK-NEXT: Live-in vp<%1> = VF * UF
; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
Contributor

would be good to use variables here, same for other vp<> values, to make the test easier to maintain down the line

@SamTebbs33 SamTebbs33 merged commit 22f860a into llvm:main Oct 28, 2025
9 of 10 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 28, 2025
…(#162503)

A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant. This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations.

This is a stacked PR, 1a and 1b can be merged in any order:

1a. llvm/llvm-project#147302
1b. llvm/llvm-project#163175
2. -> llvm/llvm-project#162503
@llvm-ci
Collaborator

llvm-ci commented Oct 28, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/29893

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
********************
TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED
********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# | #0 0x0000000103450b74 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f30b74)
# | #1 0x000000010344e924 llvm::sys::RunSignalHandlers() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f2e924)
# | #2 0x0000000103451674 SignalHandler(int, __siginfo*, void*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f31674)
# | #3 0x00000001860c3584 (/usr/lib/system/libsystem_platform.dylib+0x18047b584)
# | #4 0x0000010102fa54e8
# | #5 0x0000000102faf244 llvm::orc::ExecutionSession::removeJITDylibs(std::__1::vector<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>, std::__1::allocator<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>>>) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a8f244)
# | #6 0x0000000102faeff4 llvm::orc::ExecutionSession::endSession() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a8eff4)
# | #7 0x000000010303bc6c llvm::orc::LLJIT::~LLJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b1bc6c)
# | #8 0x00000001030405f8 llvm::orc::LLLazyJIT::~LLLazyJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b205f8)
# | #9 0x00000001025289c4 runOrcJIT(char const*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x1000089c4)
# | #10 0x0000000102523eb0 main (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100003eb0)
# | #11 0x0000000185d07154
# `-----------------------------
# error: command failed with exit status: -11
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# `-----------------------------
# error: command failed with exit status: 2
--
********************
@SamTebbs33
Collaborator Author

Thanks for the review @fhahn, sorry I didn't address your review before the auto-squash did its thing. I'll submit a follow-up to address your comments.

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. -> llvm#147302
6. llvm#162503
7. llvm#147513
SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Oct 29, 2025