Conversation

Collaborator

@SamTebbs33 SamTebbs33 commented Oct 8, 2025

A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from reduce.add(mul(ext, const)) to reduce.add(mul(ext, ext(const))) as long as it is safe to extend the constant.

This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to that extend's destination type. The truncate is needed so that both extends take operands of the same type, and the call to canConstantBeExtended proves that extending the truncated constant is safe. Later optimisations remove the truncate.
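The safety condition can be sketched outside of LLVM (the function name and signature here are illustrative, not LLVM's actual API): a constant can be extended from the narrow type exactly when truncating it and then re-extending it reproduces the original value, which is what the canConstantBeExtended check establishes.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the safety check: extending a constant from a
// NarrowBits-wide type is safe iff truncating it to NarrowBits and then
// re-extending (sext when Signed, zext otherwise) round-trips the value.
inline bool canExtendSketch(int64_t C, unsigned NarrowBits, bool Signed) {
  uint64_t Mask = NarrowBits >= 64 ? ~0ULL : (1ULL << NarrowBits) - 1;
  uint64_t Trunc = (uint64_t)C & Mask; // trunc to the narrow type
  int64_t Ext;
  if (Signed) {
    // Sign-extend the NarrowBits-wide value back to 64 bits.
    uint64_t SignBit = 1ULL << (NarrowBits - 1);
    Ext = (int64_t)((Trunc ^ SignBit) - SignBit);
  } else {
    // Zero-extend back to 64 bits.
    Ext = (int64_t)Trunc;
  }
  return Ext == C;
}
```

This matches the tests added in the PR: 63 survives an i8 round-trip under both zext and sext, while 128 cannot be treated as a sign-extended i8 value.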

This is a stacked PR:

  1. [LV] Bundle partial reductions inside VPExpressionRecipe #147302
  2. -> [LV] Bundle (partial) reductions with a mul of a constant #162503
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Sam Tebbs (SamTebbs33)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/162503.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+26)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+266)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 24426c1a53835..4bf2eb3765080 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3597,6 +3597,32 @@ tryToMatchAndCreateMulAccumulateReduction(VPReductionRecipe *Red,
       dyn_cast_if_present<VPWidenCastRecipe>(B->getDefiningRecipe());
   auto *Mul = cast<VPWidenRecipe>(VecOp->getDefiningRecipe());
+  // Match reduce.add(mul(ext, const)) and convert it to
+  // reduce.add(mul(ext, ext(const)))
+  if (RecipeA && !RecipeB && B->isLiveIn()) {
+    Type *NarrowTy = Ctx.Types.inferScalarType(RecipeA->getOperand(0));
+    Instruction::CastOps ExtOpc = RecipeA->getOpcode();
+    auto *Const = dyn_cast<ConstantInt>(B->getLiveInIRValue());
+    if (Const &&
+        llvm::canConstantBeExtended(
+            Const, NarrowTy, TTI::getPartialReductionExtendKind(ExtOpc))) {
+      // The truncate ensures that the type of each extended operand is the
+      // same, and it's been proven that the constant can be extended from
+      // NarrowTy safely. Necessary since RecipeA's extended operand would be
+      // e.g. an i8, while the const will likely be an i32. This will be
+      // elided by later optimisations.
+      auto *Trunc =
+          new VPWidenCastRecipe(Instruction::CastOps::Trunc, B, NarrowTy);
+      Trunc->insertBefore(*RecipeA->getParent(),
+                          std::next(RecipeA->getIterator()));
+
+      Type *WideTy = Ctx.Types.inferScalarType(RecipeA);
+      RecipeB = new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
+      RecipeB->insertAfter(Trunc);
+      Mul->setOperand(1, RecipeB);
+    }
+  }
+
   // Match reduce.add/sub(mul(ext, ext)).
   if (RecipeA && RecipeB && match(RecipeA, m_ZExtOrSExt(m_VPValue())) &&
       match(RecipeB, m_ZExtOrSExt(m_VPValue())) &&
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 06b044872c217..ddae9007b7620 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -800,3 +800,269 @@ exit:
   %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
   ret i64 %r.0.lcssa
 }
+
+define i32 @print_mulacc_extended_const(ptr %start, ptr %end) {
+; CHECK-LABEL: 'print_mulacc_extended_const'
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<%0> = VF
+; CHECK-NEXT: Live-in vp<%1> = VF * UF
+; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
+; CHECK-NEXT: vp<%3> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: EMIT vp<%3> = EXPAND SCEV (1 + (-1 * (ptrtoint ptr %start to i64)) + (ptrtoint ptr %end to i64))
+; CHECK-NEXT: Successor(s): scalar.ph, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: vp<%4> = DERIVED-IV ir<%start> + vp<%2> * ir<1>
+; CHECK-NEXT: EMIT vp<%5> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi vp<%5>, vp<%9>
+; CHECK-NEXT: vp<%7> = SCALAR-STEPS vp<%6>, ir<1>, vp<%0>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%7>
+; CHECK-NEXT: vp<%8> = vector-pointer vp<%next.gep>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%8>
+; CHECK-NEXT: EXPRESSION vp<%9> = ir<%red> + reduce.add (mul (ir<%l> zext to i32), (ir<63> zext to i32))
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%6>, vp<%1>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%11> = compute-reduction-result ir<%red>, vp<%9>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq vp<%3>, vp<%2>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = zext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 63
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
+; CHECK-NEXT: Live-in ir<%1> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
+; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
+; CHECK-NEXT: IR %0 = add i64 %end1, 1
+; CHECK-NEXT: IR %1 = sub i64 %0, %start2
+; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
+; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
+; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
+; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
+; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
+; CHECK-NEXT: Successor(s): vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = zext ir<%l> to i32
+; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<63>
+; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
+; CHECK-NEXT: Successor(s): middle.block, vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, ir-bb<scalar.ph>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%5> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<scalar.ph>:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%3>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%5>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %scalar.ph ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %scalar.ph ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = zext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 63
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+entry:
+  br label %loop
+
+loop:
+  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
+  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
+  %l = load i8, ptr %ptr.iv, align 1
+  %l.ext = zext i8 %l to i32
+  %mul = mul i32 %l.ext, 63
+  %red.next = add i32 %red, %mul
+  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+  %ec = icmp eq ptr %ptr.iv, %end
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %red.next
+}
+
+; Constants >= 128 cannot be treated as sign-extended, so the expression shouldn't extend 128
+define i32 @print_mulacc_not_extended_const(ptr %start, ptr %end) {
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<%0> = VF
+; CHECK-NEXT: Live-in vp<%1> = VF * UF
+; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
+; CHECK-NEXT: vp<%3> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: EMIT vp<%3> = EXPAND SCEV (1 + (-1 * (ptrtoint ptr %start to i64)) + (ptrtoint ptr %end to i64))
+; CHECK-NEXT: Successor(s): scalar.ph, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: vp<%4> = DERIVED-IV ir<%start> + vp<%2> * ir<1>
+; CHECK-NEXT: EMIT vp<%5> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi vp<%5>, vp<%9>
+; CHECK-NEXT: vp<%7> = SCALAR-STEPS vp<%6>, ir<1>, vp<%0>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%7>
+; CHECK-NEXT: vp<%8> = vector-pointer vp<%next.gep>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%8>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
+; CHECK-NEXT: EXPRESSION vp<%9> = ir<%red> + reduce.add (mul ir<%l.ext>, ir<128>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%6>, vp<%1>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%11> = compute-reduction-result ir<%red>, vp<%9>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq vp<%3>, vp<%2>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
+; CHECK-NEXT: Live-in ir<%1> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<entry>:
+; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
+; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
+; CHECK-NEXT: IR %0 = add i64 %end1, 1
+; CHECK-NEXT: IR %1 = sub i64 %0, %start2
+; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
+; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
+; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
+; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
+; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
+; CHECK-NEXT: Successor(s): vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
+; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
+; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
+; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<128>
+; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
+; CHECK-NEXT: Successor(s): middle.block, vector.body
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
+; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
+; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, ir-bb<scalar.ph>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%5> from middle.block)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<scalar.ph>:
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%3>, middle.block ], [ ir<%start>, ir-bb<entry> ]
+; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%5>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %scalar.ph ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %red = phi i32 [ 0, %scalar.ph ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from ir-bb<scalar.ph>)
+; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
+; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
+; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
+; CHECK-NEXT: IR %red.next = add i32 %red, %mul
+; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+entry:
+  br label %loop
+
+loop:
+  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
+  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
+  %l = load i8, ptr %ptr.iv, align 1
+  %l.ext = sext i8 %l to i32
+  %mul = mul i32 %l.ext, 128
+  %red.next = add i32 %red, %mul
+  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+  %ec = icmp eq ptr %ptr.iv, %end
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  %red.next.lcssa = phi i32 [ %red.next, %loop ]
+  ret i32 %red.next.lcssa
+}
Collaborator

@huntergr-arm huntergr-arm left a comment

I may have missed it, but I don't see anything in the cost model work related to the trunc/extend feature, just the addition of asserts.

Could you please make that an independent PR?

Collaborator Author

I may have missed it, but I don't see anything in the cost model work related to the trunc/extend feature, just the addition of asserts.

Could you please make that an independent PR?

In #147302 I was asked to make IsMulAccValidAndClampRange assert that the partial reduction cost is <= the base cost of the add + mul + extends, but I had to change that to a return because the add(mul(ext, const)) case, which follows the same code path, was failing the assertion. This PR adds support for that add(mul(ext, const)) case, so it makes sense to restore the assertion here. The PR is still small as it is, and I was asked to re-add the assertion in this PR (#147302 (comment)).

Opcode, SrcTy, nullptr, RedTy, VF, ExtKind,
llvm::TargetTransformInfo::PR_None, std::nullopt,
Ctx.CostKind);
assert(PartialReductionCost <= BaseCost &&
Collaborator

It should only assert that PartialReductionCost.isValid(), because no code ensures that that cost will be lower than the BaseCost in practice.

Collaborator Author

Done.


// Match reduce.add(mul(ext, const)) and convert it to
// reduce.add(mul(ext, ext(const)))
if (RecipeA && !RecipeB && B->isLiveIn()) {
Collaborator

It would be nice if this could also work for the case handled on line 3657 (zext(mul(zext(a), zext(b))) where b is a constant)

Collaborator Author

Done, thanks for the idea.

Opcode, SrcTy, nullptr, RedTy, VF, ExtKind,
llvm::TargetTransformInfo::PR_None, std::nullopt,
Ctx.CostKind);
assert(PartialReductionCost.isValid() &&
Collaborator

A little bit to my surprise, it seems none of the tests change by doing this.
My suggestion would be to keep the code exactly the same as it was, but just add the assert that the partial reduction cost is valid. And possibly to do that in a separate NFCI PR from this one, because they're functionally unrelated.

Collaborator Author

Done, I'll raise the new PR soon 👍 I had the assertion in this PR since it had to be removed from the other PR because of the "multiply by constant" case, which this PR introduces support for.


return MulAccCost.isValid() &&
MulAccCost < ExtCost + MulCost + RedCost;
if (IsPartialReduction) {
Collaborator

Same as my comment above, this is now a big change and none of the tests change. My suggestion is to keep the code the same as it was, but to create a new PR to add the asserts that partial reduction cost is valid.

Collaborator Author

Done.

auto ExtendAndReplaceConstantOp = [&Ctx](VPWidenCastRecipe *ExtA,
VPWidenCastRecipe *&ExtB,
VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Collaborator

nit: maybe bail out early here and for the if (Const && llvm::canConstantBeExtended(..)) case, rather than having a multi-nested if-statement.

Collaborator Author

Done.

Collaborator

nit: Can you bail out early on line 3654 as well?

Collaborator Author

Ah sorry, I missed the "here" in your comment. Done!

VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Type *NarrowTy = Ctx.Types.inferScalarType(ExtA->getOperand(0));
Type *WideTy = Ctx.Types.inferScalarType(ExtA);
Collaborator

nit: move closer to use.

Collaborator Author

Done.

Comment on lines 3621 to 3630
auto *Trunc =
new VPWidenCastRecipe(Instruction::CastOps::Trunc, ValB, NarrowTy);
Trunc->insertBefore(*ExtA->getParent(), std::next(ExtA->getIterator()));

VPWidenCastRecipe *NewCast =
new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
NewCast->insertAfter(Trunc);
ExtB = NewCast;
ValB = NewCast;
Mul->setOperand(1, NewCast);
Collaborator

The insertion point can be simplified to be the Mul, because that's the point where we care about the extended input. You can also use VPBuilder, to avoid having to create the recipe and then insert it, i.e.

Suggested change
auto *Trunc =
new VPWidenCastRecipe(Instruction::CastOps::Trunc, ValB, NarrowTy);
Trunc->insertBefore(*ExtA->getParent(), std::next(ExtA->getIterator()));
VPWidenCastRecipe *NewCast =
new VPWidenCastRecipe(ExtOpc, Trunc, WideTy);
NewCast->insertAfter(Trunc);
ExtB = NewCast;
ValB = NewCast;
Mul->setOperand(1, NewCast);
VPBuilder Builder(Mul);
auto *Trunc =
Builder.createWidenCast(Instruction::CastOps::Trunc, ValB, NarrowTy);
Type *WideTy = Ctx.Types.inferScalarType(ExtA);
ValB = ExtB = Builder.createWidenCast(ExtOpc, Trunc, WideTy);
Mul->setOperand(1, ExtB);
Collaborator Author

Done.

SamTebbs33 added a commit that referenced this pull request Oct 23, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:

  1. #147026
  2. #147255
  3. #156976
  4. #160154
  5. -> #147302
  6. #162503
  7. #147513
Comment on lines 3616 to 3620
// The truncate ensures that the type of each extended operand is the
// same, and it's been proven that the constant can be extended from
// NarrowTy safely. Necessary since ExtA's extended operand would be
// e.g. an i8, while the const will likely be an i32. This will be
// elided by later optimisations.
Contributor

Could we avoid introducing the explicit cast recipe by just creating a new extended live-in, and then using pattern matching to get the extend from either side of the multiply if needed?

There are currently quite a few changes in the diff that make it difficult to see how they directly relate to supporting constants, but that may help to simplify things.

Collaborator

Could we avoid introcuding the explicit cast recipe by just creating a new extended live-in?

Is there any particular downside to creating the explicit cast recipe? (I'm asking in case handling this in pattern matching and VPExpression recipes would be tricky for some reason)

There currently are quite a bit of changes in the diff that make it a bit difficult to see how that directly relates to supporting constants

FWIW, a lot of the changes in this PR currently are unrelated to supporting constants, so I've asked those to be removed from this PR (#162503 (comment) and #162503 (comment))

Contributor

That would map more directly to what we generate down the line, but the current version looks fine after the recent cleanups. It would still be interesting to see whether that approach helps simplify things later.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 23, 2025
@SamTebbs33 SamTebbs33 changed the base branch from users/SamTebbs33/expression-recipe-pred to main October 23, 2025 13:36
Comment on lines 3712 to 3713
// Convert reduce.add(ext(mul(ext, const))) to reduce.add(ext(mul(ext,
// ext(const))))
Collaborator

nit: unfortunate formatting

Suggested change
// Convert reduce.add(ext(mul(ext, const))) to reduce.add(ext(mul(ext,
// ext(const))))
// reduce.add(ext(mul(ext, const)))
// -> reduce.add(ext(mul(ext, ext(trunc(const)))))
Collaborator Author

Done.

auto ExtendAndReplaceConstantOp = [&Ctx](VPWidenCastRecipe *ExtA,
VPWidenCastRecipe *&ExtB,
VPValue *&ValB, VPWidenRecipe *Mul) {
if (ExtA && !ExtB && ValB->isLiveIn()) {
Collaborator

nit: Can you bail out early on line 3654 as well?

Comment on lines 2874 to 2876
%gep.b = getelementptr i8, ptr %b, i64 %iv
%load.b = load i8, ptr %gep.b, align 1
%ext.b = zext i8 %load.b to i16
Collaborator

these are unused (and so is argument ptr %b)

Collaborator Author

Done.

sdesmalen-arm
sdesmalen-arm previously approved these changes Oct 24, 2025
Comment on lines 3717 to 3718
// All extend recipes must have same opcode or A == B
// which can be transformed to reduce.add(zext(mul(sext(A), sext(B)))).
Collaborator

I know you've just copied this from above, but this comment is not accurate.

The cases it tries to handle are:

reduce.add(zext(mul(zext(a), zext(b)))) -> reduce.add(mul(wider_zext(a), wider_zext(b)))
reduce.add(sext(mul(sext(a), sext(b)))) -> reduce.add(mul(wider_sext(a), wider_sext(b)))

and the other case (the reason for checking Ext0 == Ext1) is that when both operands are the same, the mul is non-negative, which means the final zero-extend can be folded away, i.e.

reduce.add(zext(mul(sext(a), sext(a)))) // result of mul is nneg
-> reduce.add(mul(wider_sext(a), wider_sext(a)))
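The non-negativity reasoning can be checked exhaustively for i8 inputs (a standalone sketch, not part of the patch): mul(sext(a), sext(a)) is a square, so the product is never negative and a following zext behaves the same as a sext.

```cpp
#include <cstdint>

// Exhaustively verify that the product of a value with itself, after
// sign-extending i8 -> i16, is never negative; a zero-extend of such a
// product therefore agrees with a sign-extend.
inline bool mulOfEqualSextsIsNonNegative() {
  for (int v = -128; v <= 127; ++v) {
    // mul(sext(a), sext(a)) computed in i16; max magnitude is (-128)^2 = 16384.
    int16_t p = static_cast<int16_t>(static_cast<int8_t>(v)) *
                static_cast<int16_t>(static_cast<int8_t>(v));
    if (p < 0)
      return false;
  }
  return true;
}
```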
Collaborator Author

Done, thank you. I've kept the comment quite generic since it doesn't check exactly which opcodes are sext and zext, but I hope the comment is clearer now.

@sdesmalen-arm sdesmalen-arm dismissed their stale review October 24, 2025 13:46

(I didn't mean to accept the PR yet)

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:

  1. llvm#147026
  2. llvm#147255
  3. llvm#156976
  4. llvm#160154
  5. -> llvm#147302
  6. llvm#162503
  7. llvm#147513
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

LGTM with the nit addressed. This approach seems functionally sound, and I believe any suggestions for a different approach can be handled in a follow-up. It would just be nice to get this improvement in.

// -> reduce.add(mul(wider_ext(A), wider_ext(B)))
// The inner extends must either have the same opcode as the outer extend or
// be the same, in which case the multiply can never result in a negative
// value and the outer extend opcode doesn't matter
Collaborator

nit:

Suggested change
// value and the outer extend opcode doesn't matter
// value and the outer extend can be folded away by doing wider extends for the operands of the mul.
Collaborator Author

Done, thank you!
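The folding described above can also be checked exhaustively for i8 operands (an illustrative sketch, not part of the patch): widening the product of two narrow zexts gives the same result as multiplying wider zexts directly, because an i8 x i8 product cannot wrap in 16 bits.

```cpp
#include <cstdint>

// Compare zext32(mul(zext16(a), zext16(b))) against mul(zext32(a), zext32(b))
// for all i8 pairs. They agree because the i16 multiply cannot overflow
// (255 * 255 = 65025 < 65536), which is why the outer extend can be folded
// away by doing wider extends for the operands of the mul.
inline bool widerZextFoldHolds() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      uint32_t narrow = static_cast<uint16_t>(static_cast<uint16_t>(a) *
                                              static_cast<uint16_t>(b));
      uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
      if (narrow != wide)
        return false;
    }
  return true;
}
```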

@SamTebbs33 SamTebbs33 enabled auto-merge (squash) October 28, 2025 16:25
Comment on lines +976 to +995
; CHECK-NEXT: ir-bb<exit>:
; CHECK-NEXT: IR %red.next.lcssa = phi i32 [ %red.next, %loop ] (extra operand: vp<%11> from middle.block)
; CHECK-NEXT: No successors
; CHECK-EMPTY:
; CHECK-NEXT: scalar.ph:
; CHECK-NEXT: EMIT-SCALAR vp<%bc.resume.val> = phi [ vp<%4>, middle.block ], [ ir<%start>, ir-bb<entry> ]
; CHECK-NEXT: EMIT-SCALAR vp<%bc.merge.rdx> = phi [ vp<%11>, middle.block ], [ ir<0>, ir-bb<entry> ]
; CHECK-NEXT: Successor(s): ir-bb<loop>
; CHECK-EMPTY:
; CHECK-NEXT: ir-bb<loop>:
; CHECK-NEXT: IR %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ] (extra operand: vp<%bc.resume.val> from scalar.ph)
; CHECK-NEXT: IR %red = phi i32 [ 0, %entry ], [ %red.next, %loop ] (extra operand: vp<%bc.merge.rdx> from scalar.ph)
; CHECK-NEXT: IR %l = load i8, ptr %ptr.iv, align 1
; CHECK-NEXT: IR %l.ext = sext i8 %l to i32
; CHECK-NEXT: IR %mul = mul i32 %l.ext, 128
; CHECK-NEXT: IR %red.next = add i32 %red, %mul
; CHECK-NEXT: IR %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
; CHECK-NEXT: IR %ec = icmp eq ptr %ptr.iv, %end
; CHECK-NEXT: No successors
; CHECK-NEXT: }
Contributor

I think you can drop those, as the main thing is checking forming the reductions

Comment on lines +996 to +1029
; CHECK: VPlan 'Final VPlan for VF={4},UF={1}' {
; CHECK-NEXT: Live-in ir<%1> = original trip-count
; CHECK-EMPTY:
; CHECK-NEXT: ir-bb<entry>:
; CHECK-NEXT: IR %start2 = ptrtoint ptr %start to i64
; CHECK-NEXT: IR %end1 = ptrtoint ptr %end to i64
; CHECK-NEXT: IR %0 = add i64 %end1, 1
; CHECK-NEXT: IR %1 = sub i64 %0, %start2
; CHECK-NEXT: EMIT vp<%min.iters.check> = icmp ult ir<%1>, ir<4>
; CHECK-NEXT: EMIT branch-on-cond vp<%min.iters.check>
; CHECK-NEXT: Successor(s): ir-bb<scalar.ph>, vector.ph
; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:
; CHECK-NEXT: EMIT vp<%n.mod.vf> = urem ir<%1>, ir<4>
; CHECK-NEXT: EMIT vp<%n.vec> = sub ir<%1>, vp<%n.mod.vf>
; CHECK-NEXT: vp<%3> = DERIVED-IV ir<%start> + vp<%n.vec> * ir<1>
; CHECK-NEXT: Successor(s): vector.body
; CHECK-EMPTY:
; CHECK-NEXT: vector.body:
; CHECK-NEXT: EMIT-SCALAR vp<%index> = phi [ ir<0>, vector.ph ], [ vp<%index.next>, vector.body ]
; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0>, ir<%red.next>
; CHECK-NEXT: EMIT vp<%next.gep> = ptradd ir<%start>, vp<%index>
; CHECK-NEXT: WIDEN ir<%l> = load vp<%next.gep>
; CHECK-NEXT: WIDEN-CAST ir<%l.ext> = sext ir<%l> to i32
; CHECK-NEXT: WIDEN ir<%mul> = mul ir<%l.ext>, ir<128>
; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + reduce.add (ir<%mul>)
; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<%index>, ir<4>
; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<%n.vec>
; CHECK-NEXT: Successor(s): middle.block, vector.body
; CHECK-EMPTY:
; CHECK-NEXT: middle.block:
; CHECK-NEXT: EMIT vp<%5> = compute-reduction-result ir<%red>, ir<%red.next>
; CHECK-NEXT: EMIT vp<%cmp.n> = icmp eq ir<%1>, vp<%n.vec>
; CHECK-NEXT: EMIT branch-on-cond vp<%cmp.n>
Contributor

I think you can also drop checking the final VPlan, the replacement of the VPExpressionRecipe should be tested by the existing tests checking the generated IR, same for the other tests

This will make life a bit easier when the tests need updating

%red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
%l = load i8, ptr %ptr.iv, align 1
%l.ext = zext i8 %l to i32
%mul = mul i32 %l.ext, 63
Contributor

for completeness, could you also add a test where the constant is on the left side? I think we won't form partial reductions for those yet (and there's nothing to bundle), but it would still be good to have the test case here to check this works as expected
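A hypothetical test of that shape (not part of this PR; it mirrors the zext/63 loop quoted above with the mul operands swapped, and the function name is illustrative) might look like:

```llvm
define i32 @red_const_lhs(ptr %start, ptr %end) {
entry:
  br label %loop

loop:
  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
  %l = load i8, ptr %ptr.iv, align 1
  %l.ext = zext i8 %l to i32
  %mul = mul i32 63, %l.ext          ; constant on the left-hand side
  %red.next = add i32 %red, %mul
  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
  %ec = icmp eq ptr %ptr.iv, %end
  br i1 %ec, label %exit, label %loop

exit:
  ret i32 %red.next
}
```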

Comment on lines 3616 to 3620
// The truncate ensures that the type of each extended operand is the
// same, and it's been proven that the constant can be extended from
// NarrowTy safely. Necessary since ExtA's extended operand would be
// e.g. an i8, while the const will likely be an i32. This will be
// elided by later optimisations.
Contributor

It would be closer to what we generate down the line, but the current version looks fine after the recent cleanups; it would still be interesting to see if it would help to simplify things later on.

Comment on lines +807 to +809
; CHECK-NEXT: Live-in vp<%0> = VF
; CHECK-NEXT: Live-in vp<%1> = VF * UF
; CHECK-NEXT: Live-in vp<%2> = vector-trip-count
Contributor

would be good to use variables here, same for other vp<> values, to make the test easier to maintain down the line

@SamTebbs33 SamTebbs33 merged commit 22f860a into llvm:main Oct 28, 2025
9 of 10 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 28, 2025
…(#162503)

A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant. This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations.

This is a stacked PR, 1a and 1b can be merged in any order:

1a. llvm/llvm-project#147302
1b. llvm/llvm-project#163175
2. -> llvm/llvm-project#162503
@llvm-ci
Collaborator

llvm-ci commented Oct 28, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/29893

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
********************
TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED
********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# | #0 0x0000000103450b74 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f30b74)
# | #1 0x000000010344e924 llvm::sys::RunSignalHandlers() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f2e924)
# | #2 0x0000000103451674 SignalHandler(int, __siginfo*, void*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f31674)
# | #3 0x00000001860c3584 (/usr/lib/system/libsystem_platform.dylib+0x18047b584)
# | #4 0x0000010102fa54e8
# | #5 0x0000000102faf244 llvm::orc::ExecutionSession::removeJITDylibs(std::__1::vector<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>, std::__1::allocator<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>>>) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a8f244)
# | #6 0x0000000102faeff4 llvm::orc::ExecutionSession::endSession() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a8eff4)
# | #7 0x000000010303bc6c llvm::orc::LLJIT::~LLJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b1bc6c)
# | #8 0x00000001030405f8 llvm::orc::LLLazyJIT::~LLLazyJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b205f8)
# | #9 0x00000001025289c4 runOrcJIT(char const*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x1000089c4)
# | #10 0x0000000102523eb0 main (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100003eb0)
# | #11 0x0000000185d07154
# `-----------------------------
# error: command failed with exit status: -11
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# `-----------------------------
# error: command failed with exit status: 2
--
********************
@SamTebbs33
Collaborator Author

Thanks for the review @fhahn, sorry I didn't address your review before the auto-squash did its thing. I'll submit a follow-up to address your comments.

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. -> llvm#147302
6. llvm#162503
7. llvm#147513
SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Oct 29, 2025