
Conversation

@fhahn
Contributor

@fhahn fhahn commented Oct 10, 2025

Check that all partial reductions in a chain are only used by other partial reductions with the same scale factor. Otherwise, we end up creating users of scaled reductions where the types of the other operands don't match.

A similar issue was addressed in #158603, but that fix misses the chained cases.

Fixes #162530.
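
For illustration, a minimal C++ sketch of the kind of loop involved (function and variable names are hypothetical, mirroring the new test added below): the extended add of the loaded byte is a partial-reduction candidate, but its result also feeds a plain 32-bit add of the offset, so the chain is incomplete and must not be turned into a partial reduction.

#include <cstdint>

// Hypothetical C++ analogue of @red_extended_add_incomplete_chain from the
// new test: the zero-extended byte feeds an add (a partial-reduction
// candidate), and that result is consumed by an ordinary i32 add.
int red_extended_add_incomplete_chain(const std::uint8_t *start,
                                      const std::uint8_t *end, int offset) {
  int red = 0;
  const std::uint8_t *p = start;
  while (true) {
    int add = red + *p; // zext i8 -> i32, then add: extended-add candidate
    red = add + offset; // plain i32 add uses the candidate's result
    if (p == end)       // like the IR test, the last element *end is processed
      break;
    ++p;
  }
  return red;
}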

@llvmbot
Member

llvmbot commented Oct 10, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Check that all partial reductions in a chain are only used by other partial reductions with the same scale factor. Otherwise, we end up creating users of scaled reductions where the types of the other operands don't match.

A similar issue was addressed in #158603, but that fix misses the chained cases.

Fixes #162530.


Full diff: https://github.com/llvm/llvm-project/pull/162822.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+28-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (-126)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-incomplete-chains.ll (+112)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e62d57e6920b7..8856857e06964 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7910,6 +7910,29 @@ void VPRecipeBuilder::collectScaledReductions(VFRange &Range) {
         (!Chain.ExtendB || ExtendIsOnlyUsedByPartialReductions(Chain.ExtendB)))
       ScaledReductionMap.try_emplace(Chain.Reduction, Pair.second);
   }
+
+  // Check that all partial reductions in a chain are only used by other partial
+  // reductions with the same scale factor. Otherwise we end up creating users
+  // of scaled reductions where the types of the other operands don't match.
+  auto AllUsersPartialRdx = [this](Instruction *I, unsigned Scale) {
+    return all_of(I->users(), [Scale, this](const User *U) {
+      auto *UI = cast<Instruction>(U);
+
+      if (isa<PHINode>(UI) && UI->getParent() == OrigLoop->getHeader()) {
+        return all_of(UI->users(), [Scale, this](const User *U) {
+          auto *UI = cast<Instruction>(U);
+          return ScaledReductionMap.lookup_or(UI, 0) == Scale;
+        });
+      }
+
+      return ScaledReductionMap.lookup_or(UI, 0) == Scale ||
+             !OrigLoop->contains(UI->getParent());
+    });
+  };
+  for (const auto &[Chain, Scale] : PartialReductionChains) {
+    if (!AllUsersPartialRdx(Chain.Reduction, Scale))
+      ScaledReductionMap.erase(Chain.Reduction);
+  }
 }
 
 bool VPRecipeBuilder::getScaledReductions(
@@ -8093,11 +8116,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
   if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr))
     return tryToWidenMemory(Instr, Operands, Range);
 
-  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr)) {
-    if (auto PartialRed =
-            tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()))
-      return PartialRed;
-  }
+  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr))
+    return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value());
 
   if (!shouldWiden(Instr, Range))
     return nullptr;
@@ -8131,9 +8151,9 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
       isa<VPPartialReductionRecipe>(BinOpRecipe))
     std::swap(BinOp, Accumulator);
 
-  if (ScaleFactor !=
-      vputils::getVFScaleFactor(Accumulator->getDefiningRecipe()))
-    return nullptr;
+  assert(ScaleFactor ==
+             vputils::getVFScaleFactor(Accumulator->getDefiningRecipe()) &&
+         "all accumulators in chain must have same scale factor");
 
   unsigned ReductionOpcode = Reduction->getOpcode();
   if (ReductionOpcode == Instruction::Sub) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
index 5ae08393a1804..3dfa6df3313a5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
@@ -1361,132 +1361,6 @@ for.body: ; preds = %for.body.preheader,
   br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !loop !1
 }
 
-define i32 @red_extended_add_incomplete_chain(ptr %start, ptr %end, i32 %offset) {
-; CHECK-NEON-LABEL: define i32 @red_extended_add_incomplete_chain(
-; CHECK-NEON-SAME: ptr [[START:%.*]], ptr [[END:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR1:[0-9]+]] {
-; CHECK-NEON-NEXT:  entry:
-; CHECK-NEON-NEXT:    [[START2:%.*]] = ptrtoint ptr [[START]] to i64
-; CHECK-NEON-NEXT:    [[END1:%.*]] = ptrtoint ptr [[END]] to i64
-; CHECK-NEON-NEXT:    [[TMP0:%.*]] = add i64 [[END1]], 1
-; CHECK-NEON-NEXT:    [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
-; CHECK-NEON-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 16
-; CHECK-NEON-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; CHECK-NEON:       vector.ph:
-; CHECK-NEON-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], 16
-; CHECK-NEON-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
-; CHECK-NEON-NEXT:    [[TMP2:%.*]] = getelementptr i8, ptr [[START]], i64 [[N_VEC]]
-; CHECK-NEON-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[OFFSET]], i64 0
-; CHECK-NEON-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
-; CHECK-NEON-NEXT:    br label [[VECTOR_BODY:%.*]]
-; CHECK-NEON:       vector.body:
-; CHECK-NEON-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEON-NEXT:    [[VEC_PHI:%.*]] = phi <16 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEON-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
-; CHECK-NEON-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[NEXT_GEP]], align 1
-; CHECK-NEON-NEXT:    [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
-; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <16 x i32> [[VEC_PHI]], [[TMP3]]
-; CHECK-NEON-NEXT:    [[TMP4]] = add <16 x i32> [[PARTIAL_REDUCE]], [[BROADCAST_SPLAT]]
-; CHECK-NEON-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
-; CHECK-NEON-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEON-NEXT:    br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
-; CHECK-NEON:       middle.block:
-; CHECK-NEON-NEXT:    [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP4]])
-; CHECK-NEON-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
-; CHECK-NEON-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
-; CHECK-NEON:       scalar.ph:
-;
-; CHECK-SVE-LABEL: define i32 @red_extended_add_incomplete_chain(
-; CHECK-SVE-SAME: ptr [[START:%.*]], ptr [[END:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR1:[0-9]+]] {
-; CHECK-SVE-NEXT:  entry:
-; CHECK-SVE-NEXT:    [[START2:%.*]] = ptrtoint ptr [[START]] to i64
-; CHECK-SVE-NEXT:    [[END1:%.*]] = ptrtoint ptr [[END]] to i64
-; CHECK-SVE-NEXT:    [[TMP0:%.*]] = add i64 [[END1]], 1
-; CHECK-SVE-NEXT:    [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
-; CHECK-SVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-SVE-NEXT:    [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
-; CHECK-SVE-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
-; CHECK-SVE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; CHECK-SVE:       vector.ph:
-; CHECK-SVE-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-SVE-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 4
-; CHECK-SVE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], [[TMP5]]
-; CHECK-SVE-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
-; CHECK-SVE-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[START]], i64 [[N_VEC]]
-; CHECK-SVE-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[OFFSET]], i64 0
-; CHECK-SVE-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-SVE-NEXT:    br label [[VECTOR_BODY:%.*]]
-; CHECK-SVE:       vector.body:
-; CHECK-SVE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-SVE-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
-; CHECK-SVE-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
-; CHECK-SVE-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i8>, ptr [[NEXT_GEP]], align 1
-; CHECK-SVE-NEXT:    [[TMP7:%.*]] = zext <vscale x 4 x i8> [[WIDE_LOAD]] to <vscale x 4 x i32>
-; CHECK-SVE-NEXT:    [[TMP8:%.*]] = add <vscale x 4 x i32> [[VEC_PHI]], [[TMP7]]
-; CHECK-SVE-NEXT:    [[TMP9]] = add <vscale x 4 x i32> [[TMP8]], [[BROADCAST_SPLAT]]
-; CHECK-SVE-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
-; CHECK-SVE-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-SVE-NEXT:    br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
-; CHECK-SVE:       middle.block:
-; CHECK-SVE-NEXT:    [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP9]])
-; CHECK-SVE-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
-; CHECK-SVE-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
-; CHECK-SVE:       scalar.ph:
-;
-; CHECK-SVE-MAXBW-LABEL: define i32 @red_extended_add_incomplete_chain(
-; CHECK-SVE-MAXBW-SAME: ptr [[START:%.*]], ptr [[END:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR1:[0-9]+]] {
-; CHECK-SVE-MAXBW-NEXT:  entry:
-; CHECK-SVE-MAXBW-NEXT:    [[START2:%.*]] = ptrtoint ptr [[START]] to i64
-; CHECK-SVE-MAXBW-NEXT:    [[END1:%.*]] = ptrtoint ptr [[END]] to i64
-; CHECK-SVE-MAXBW-NEXT:    [[TMP0:%.*]] = add i64 [[END1]], 1
-; CHECK-SVE-MAXBW-NEXT:    [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
-; CHECK-SVE-MAXBW-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-SVE-MAXBW-NEXT:    [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 3
-; CHECK-SVE-MAXBW-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
-; CHECK-SVE-MAXBW-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; CHECK-SVE-MAXBW:       vector.ph:
-; CHECK-SVE-MAXBW-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-SVE-MAXBW-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 8
-; CHECK-SVE-MAXBW-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], [[TMP5]]
-; CHECK-SVE-MAXBW-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
-; CHECK-SVE-MAXBW-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[START]], i64 [[N_VEC]]
-; CHECK-SVE-MAXBW-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i32> poison, i32 [[OFFSET]], i64 0
-; CHECK-SVE-MAXBW-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer
-; CHECK-SVE-MAXBW-NEXT:    br label [[VECTOR_BODY:%.*]]
-; CHECK-SVE-MAXBW:       vector.body:
-; CHECK-SVE-MAXBW-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-SVE-MAXBW-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 8 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
-; CHECK-SVE-MAXBW-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
-; CHECK-SVE-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, ptr [[NEXT_GEP]], align 1
-; CHECK-SVE-MAXBW-NEXT:    [[TMP7:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD]] to <vscale x 8 x i32>
-; CHECK-SVE-MAXBW-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <vscale x 8 x i32> [[VEC_PHI]], [[TMP7]]
-; CHECK-SVE-MAXBW-NEXT:    [[TMP8]] = add <vscale x 8 x i32> [[PARTIAL_REDUCE]], [[BROADCAST_SPLAT]]
-; CHECK-SVE-MAXBW-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
-; CHECK-SVE-MAXBW-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-SVE-MAXBW-NEXT:    br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
-; CHECK-SVE-MAXBW:       middle.block:
-; CHECK-SVE-MAXBW-NEXT:    [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.nxv8i32(<vscale x 8 x i32> [[TMP8]])
-; CHECK-SVE-MAXBW-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
-; CHECK-SVE-MAXBW-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
-; CHECK-SVE-MAXBW:       scalar.ph:
-;
-entry:
-  br label %loop
-
-loop:
-  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
-  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
-  %l = load i8, ptr %ptr.iv, align 1
-  %l.ext = zext i8 %l to i32
-  %add = add i32 %red, %l.ext
-  %red.next = add i32 %add, %offset
-  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
-  %ec = icmp eq ptr %ptr.iv, %end
-  br i1 %ec, label %exit, label %loop
-
-exit:
-  ret i32 %red.next
-}
-
 attributes #0 = { vscale_range(1,16) }
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-incomplete-chains.ll b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-incomplete-chains.ll
new file mode 100644
index 0000000000000..5b5c40a83e11e
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-incomplete-chains.ll
@@ -0,0 +1,112 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph:" --version 4
+; RUN: opt --mattr=+neon,+dotprod -passes=loop-vectorize -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s | FileCheck %s --check-prefixes=CHECK-NEON
+
+target triple = "arm64-apple-macosx"
+
+define i32 @red_extended_add_incomplete_chain(ptr %start, ptr %end, i32 %offset) {
+; CHECK-NEON-LABEL: define i32 @red_extended_add_incomplete_chain(
+; CHECK-NEON-SAME: ptr [[START:%.*]], ptr [[END:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEON-NEXT:  entry:
+; CHECK-NEON-NEXT:    [[START2:%.*]] = ptrtoint ptr [[START]] to i64
+; CHECK-NEON-NEXT:    [[END1:%.*]] = ptrtoint ptr [[END]] to i64
+; CHECK-NEON-NEXT:    [[TMP0:%.*]] = add i64 [[END1]], 1
+; CHECK-NEON-NEXT:    [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
+; CHECK-NEON-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 16
+; CHECK-NEON-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEON:       vector.ph:
+; CHECK-NEON-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], 16
+; CHECK-NEON-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
+; CHECK-NEON-NEXT:    [[TMP2:%.*]] = getelementptr i8, ptr [[START]], i64 [[N_VEC]]
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[OFFSET]], i64 0
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
+; CHECK-NEON-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK-NEON:       vector.body:
+; CHECK-NEON-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEON-NEXT:    [[VEC_PHI:%.*]] = phi <16 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEON-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
+; CHECK-NEON-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[NEXT_GEP]], align 1
+; CHECK-NEON-NEXT:    [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
+; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <16 x i32> [[VEC_PHI]], [[TMP3]]
+; CHECK-NEON-NEXT:    [[TMP4]] = add <16 x i32> [[PARTIAL_REDUCE]], [[BROADCAST_SPLAT]]
+; CHECK-NEON-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; CHECK-NEON-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEON-NEXT:    br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
+; CHECK-NEON:       middle.block:
+; CHECK-NEON-NEXT:    [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP4]])
+; CHECK-NEON-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
+; CHECK-NEON-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEON:       scalar.ph:
+;
+entry:
+  br label %loop
+
+loop:
+  %ptr.iv = phi ptr [ %start, %entry ], [ %gep.iv.next, %loop ]
+  %red = phi i32 [ 0, %entry ], [ %red.next, %loop ]
+  %l = load i8, ptr %ptr.iv, align 1
+  %l.ext = zext i8 %l to i32
+  %add = add i32 %red, %l.ext
+  %red.next = add i32 %add, %offset
+  %gep.iv.next = getelementptr i8, ptr %ptr.iv, i64 1
+  %ec = icmp eq ptr %ptr.iv, %end
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %red.next
+}
+
+
+define i16 @test_incomplete_chain_without_mul(ptr noalias %dst, ptr %A, ptr %B) #0 {
+; CHECK-NEON-LABEL: define i16 @test_incomplete_chain_without_mul(
+; CHECK-NEON-SAME: ptr noalias [[DST:%.*]], ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEON-NEXT:  entry:
+; CHECK-NEON-NEXT:    br label [[VECTOR_MEMCHECK:%.*]]
+; CHECK-NEON:       vector.ph:
+; CHECK-NEON-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK-NEON:       vector.body:
+; CHECK-NEON-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEON-NEXT:    [[VEC_PHI:%.*]] = phi <16 x i16> [ zeroinitializer, [[VECTOR_MEMCHECK]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEON-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[TMP0]], i64 0
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+; CHECK-NEON-NEXT:    [[TMP1:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT]] to <16 x i16>
+; CHECK-NEON-NEXT:    [[TMP2:%.*]] = extractelement <16 x i16> [[TMP1]], i32 15
+; CHECK-NEON-NEXT:    store i16 [[TMP2]], ptr [[DST]], align 2
+; CHECK-NEON-NEXT:    [[TMP3:%.*]] = load i8, ptr [[B]], align 1
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <16 x i8> poison, i8 [[TMP3]], i64 0
+; CHECK-NEON-NEXT:    [[BROADCAST_SPLAT7:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT6]], <16 x i8> poison, <16 x i32> zeroinitializer
+; CHECK-NEON-NEXT:    [[TMP4:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT7]] to <16 x i16>
+; CHECK-NEON-NEXT:    [[TMP5:%.*]] = add <16 x i16> [[VEC_PHI]], [[TMP4]]
+; CHECK-NEON-NEXT:    [[TMP6:%.*]] = add <16 x i16> [[TMP5]], [[TMP1]]
+; CHECK-NEON-NEXT:    [[TMP7]] = add <16 x i16> [[TMP6]], [[TMP4]]
+; CHECK-NEON-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; CHECK-NEON-NEXT:    [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
+; CHECK-NEON-NEXT:    br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEON:       middle.block:
+; CHECK-NEON-NEXT:    [[TMP9:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[TMP7]])
+; CHECK-NEON-NEXT:    br label [[SCALAR_PH:%.*]]
+; CHECK-NEON:       scalar.ph:
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
+  %red = phi i16 [ 0, %entry ], [ %red.next, %loop ]
+  %l.a = load i8, ptr %A, align 1
+  %a.ext = zext i8 %l.a to i16
+  store i16 %a.ext, ptr %dst, align 2
+  %l.b = load i8, ptr %B, align 1
+  %b.ext = zext i8 %l.b to i16
+  %add = add i16 %red, %b.ext
+  %add.1 = add i16 %add, %a.ext
+  %red.next = add i16 %add.1, %b.ext
+  %iv.next = add i64 %iv, 1
+  %ec = icmp ult i64 %iv, 1024
+  br i1 %ec, label %loop, label %exit
+
+exit:
+  ret i16 %red.next
+}
+
+attributes #0 = { "target-cpu"="grace" }
@@ -0,0 +1,112 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph:" --version 4
Contributor

Suggested change
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph:" --version 4
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph:" --version 6
Contributor

I think it's just a case of deleting the NOTE line and regenerating the CHECKs.

Contributor Author

Ah yes, that was copied over. Should use the latest version now, thanks


// Check that all partial reductions in a chain are only used by other partial
// reductions with the same scale factor. Otherwise we end up creating users
// of scaled reductions where the types of the other operands don't match.
auto AllUsersPartialRdx = [this](Instruction *I, unsigned Scale) {
Contributor

Whilst this does seem to work, would it be better to fix getScaledReductions to avoid creating the chain in the first place? Or does getScaledReductions simply not have enough information to determine that?

Contributor Author

Unfortunately it does not have enough information, as there are cases that only get rejected later, e.g. by ExtendIsOnlyUsedByPartialReductions.
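
For reference, the guard being referred to is visible at the top of the first hunk above; a trimmed excerpt (the ExtendA half of the condition is not shown in the hunk and is only hinted at here):

// In collectScaledReductions: a chain only enters ScaledReductionMap once its
// extends are known to be used solely by partial reductions.
if (/* ...ExtendA check... */
    (!Chain.ExtendB || ExtendIsOnlyUsedByPartialReductions(Chain.ExtendB)))
  ScaledReductionMap.try_emplace(Chain.Reduction, Pair.second);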

Comment on lines 7917 to 7918
  auto AllUsersPartialRdx = [this](Instruction *I, unsigned Scale) {
    return all_of(I->users(), [Scale, this](const User *U) {
Collaborator

You can avoid nesting a lambda if you write: if (!all_of(Chain.Reduction->users(), IsPartialRdx(Scale))).

Contributor Author

Thanks. I had to move the lambda definition into the loop so we can capture Scale; to pass the function directly to all_of(), it needs to take a single User* argument, I think.
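
A rough sketch of that shape (illustrative only; the follow-up revision is not shown here, the predicate name is made up, and the header-phi handling from the actual patch is omitted):

for (const auto &[Chain, Scale] : PartialReductionChains) {
  // The predicate takes a single User*, so it can be passed to all_of()
  // directly; Scale is captured per chain.
  auto IsPartialRdxOrOutsideUser = [Scale, this](const User *U) {
    auto *UI = cast<Instruction>(U);
    return ScaledReductionMap.lookup_or(UI, 0) == Scale ||
           !OrigLoop->contains(UI->getParent());
  };
  if (!all_of(Chain.Reduction->users(), IsPartialRdxOrOutsideUser))
    ScaledReductionMap.erase(Chain.Reduction);
}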


loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %red = phi i16 [ 0, %entry ], [ %red.next, %loop ]
Collaborator

What are the two scale factors in this test?

  };
  for (const auto &[Chain, Scale] : PartialReductionChains) {
    if (!AllUsersPartialRdx(Chain.Reduction, Scale))
      ScaledReductionMap.erase(Chain.Reduction);
Collaborator

Can this be done in ExtendIsOnlyUsedByPartialReductions, rather than a loop that removes these? (from what I can see, all the information to make this decision is available in PartialReductionChains)

Contributor Author

We could use the information from PartialReductionChains, but this would include entries that get rejected later by the ExtendIsOnlyUsedByPartialReductions check. I kept it as two separate loops for now.
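
A condensed sketch of the resulting two-pass structure in collectScaledReductions (simplified from the diff; the pruning pass in the real patch also looks through the header phi and accepts out-of-loop users):

// Pass 1: record chains whose extends are only used by partial reductions.
for (const auto &Pair : PartialReductionChains) {
  // ...ExtendIsOnlyUsedByPartialReductions checks on the chain's extends...
  ScaledReductionMap.try_emplace(Pair.first.Reduction, Pair.second);
}

// Pass 2: drop chains whose reduction has a user with a different scale factor.
for (const auto &[Chain, Scale] : PartialReductionChains)
  if (!AllUsersPartialRdx(Chain.Reduction, Scale))
    ScaledReductionMap.erase(Chain.Reduction);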

}

      return ScaledReductionMap.lookup_or(UI, 0) == Scale ||
             !OrigLoop->contains(UI->getParent());
Collaborator

|| !OrigLoop->contains(UI->getParent())

Is this part of the condition covered by a test case?

Contributor Author

Yep, this is to cover the exit user of the reduction chain.


target triple = "arm64-apple-macosx"

define i32 @red_extended_add_incomplete_chain(ptr %start, ptr %end, i32 %offset) {
Collaborator

This test wasn't producing a partial reduction before, so I'm not sure it demonstrates that something which previously produced a partial reduction no longer does. Could you precommit a test that does produce a partial reduction in an invalid situation, to show that this PR stops that?

Contributor Author

The behavior for this function didn't change; it was already not producing partial reductions before this patch. I stripped those changes from the diff; only the previously crashing case has been added now, thanks.

fhahn added a commit that referenced this pull request Oct 19, 2025
Move test to new file, to prepare for adding similar tests in #162822.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 19, 2025
…separate file. Move test to new file, to prepare for adding similar tests in llvm/llvm-project#162822.
@github-actions

github-actions bot commented Oct 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
Move test to new file, to prepare for adding similar tests in llvm#162822.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
Move test to new file, to prepare for adding similar tests in llvm#162822.
Contributor Author

@fhahn fhahn left a comment

ping
