
Conversation

@joker-eph
Collaborator

@joker-eph joker-eph commented Sep 24, 2025

When executed in the context of canonicalization, folders are invoked in a fixed-point iterative process. In the context of an API like createOrFold(), or in DialectConversion, however, we expect a "one-shot" call to fold to produce a result that is as folded as possible. Yet even when the folders themselves are idempotent, the folders on a given operation interact with each other. For example:

// X = 0 + Y
%X = arith.addi %c_0, %Y : i32

should fold to %Y, but getting there requires the folder provided by the IsCommutative trait to first move the constant to the right-hand side. That trait folder only runs after the operation's own folder has been attempted, and the operation folder isn't attempted again after the trait folder applies.

This commit makes sure we iterate folder applications until a fixed point is reached.

Fixes #159844
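
For reference, here is a minimal sketch of the fixed-point idea (simplified from the diff below, which additionally emits a debug log per iteration): keep re-invoking fold() as long as it succeeds in place, i.e. produces no replacement values, so that an in-place trait folder can enable the operation's own folder on the next pass.

#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Fold `op` repeatedly until a fixed point. An in-place fold leaves
// `foldResults` empty, so retry; stop once fold() fails (nothing left
// to do) or produces replacement values for the op's results.
static LogicalResult foldToFixedPoint(Operation *op,
                                      SmallVectorImpl<OpFoldResult> &foldResults) {
  if (failed(op->fold(foldResults)))
    return failure();
  while (foldResults.empty() && succeeded(op->fold(foldResults))) {
    // Each successful in-place fold may enable the next one, e.g. the
    // IsCommutative trait folder moving the constant to the right.
  }
  return success();
}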

@llvmbot
Member

llvmbot commented Sep 24, 2025

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-mlir-core
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-gpu

Author: Mehdi Amini (joker-eph)

Changes

When executed in the context of canonicalization, folders are invoked in a fixed-point iterative process. In the context of an API like createOrFold(), or in DialectConversion, however, we expect a "one-shot" call to fold to produce a result that is as folded as possible. Yet even when the folders themselves are idempotent, the folders on a given operation interact with each other. For example:

// X = 0 + Y
%X = arith.addi %c_0, %Y : i32

should fold to %Y, but getting there requires the folder provided by the IsCommutative trait to first move the constant to the right-hand side. That trait folder only runs after the operation's own folder has been attempted, and the operation folder isn't attempted again after the trait folder applies.

This commit makes sure we iterate folder applications until a fixed point is reached.


Full diff: https://github.com/llvm/llvm-project/pull/160615.diff

6 Files Affected:

  • (modified) mlir/lib/IR/Builders.cpp (+7)
  • (modified) mlir/lib/Transforms/Utils/FoldUtils.cpp (+10-2)
  • (modified) mlir/test/Dialect/XeGPU/xegpu-attr-interface.mlir (+2-6)
  • (modified) mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir (+2-4)
  • (modified) mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir (+7-13)
  • (modified) mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir (+5-11)
diff --git a/mlir/lib/IR/Builders.cpp b/mlir/lib/IR/Builders.cpp
index 3d366276b4375..6373b1ead94d1 100644
--- a/mlir/lib/IR/Builders.cpp
+++ b/mlir/lib/IR/Builders.cpp
@@ -14,6 +14,7 @@
 #include "mlir/IR/IRMapping.h"
 #include "mlir/IR/Matchers.h"
 #include "llvm/ADT/SmallVectorExtras.h"
+#include "llvm/Support/DebugLog.h"

 using namespace mlir;

@@ -489,6 +490,12 @@ OpBuilder::tryFold(Operation *op, SmallVectorImpl<Value> &results,
   if (failed(op->fold(foldResults)))
     return cleanupFailure();

+  int count = 0;
+  do {
+    LDBG() << "Folded in place #" << count++
+           << " times: " << OpWithFlags(op, OpPrintingFlags().skipRegions());
+  } while (foldResults.empty() && succeeded(op->fold(foldResults)));
+
   // An in-place fold does not require generation of any constants.
   if (foldResults.empty())
     return success();
diff --git a/mlir/lib/Transforms/Utils/FoldUtils.cpp b/mlir/lib/Transforms/Utils/FoldUtils.cpp
index 5e07509871ea2..70310685315c4 100644
--- a/mlir/lib/Transforms/Utils/FoldUtils.cpp
+++ b/mlir/lib/Transforms/Utils/FoldUtils.cpp
@@ -16,6 +16,7 @@
 #include "mlir/IR/Builders.h"
 #include "mlir/IR/Matchers.h"
 #include "mlir/IR/Operation.h"
+#include "llvm/Support/DebugLog.h"

 using namespace mlir;

@@ -226,8 +227,15 @@ bool OperationFolder::isFolderOwnedConstant(Operation *op) const {
 LogicalResult OperationFolder::tryToFold(Operation *op,
                                          SmallVectorImpl<Value> &results) {
   SmallVector<OpFoldResult, 8> foldResults;
-  if (failed(op->fold(foldResults)) ||
-      failed(processFoldResults(op, results, foldResults)))
+  if (failed(op->fold(foldResults)))
+    return failure();
+  int count = 1;
+  do {
+    LDBG() << "Folded in place #" << count++
+           << " times: " << OpWithFlags(op, OpPrintingFlags().skipRegions());
+  } while (foldResults.empty() && succeeded(op->fold(foldResults)));
+
+  if (failed(processFoldResults(op, results, foldResults)))
     return failure();
   return success();
 }
diff --git a/mlir/test/Dialect/XeGPU/xegpu-attr-interface.mlir b/mlir/test/Dialect/XeGPU/xegpu-attr-interface.mlir
index 547c7355e00c6..b73bc69393dab 100644
--- a/mlir/test/Dialect/XeGPU/xegpu-attr-interface.mlir
+++ b/mlir/test/Dialect/XeGPU/xegpu-attr-interface.mlir
@@ -7,10 +7,8 @@ gpu.module @test {
     //CHECK: [[IDY:%.+]] = affine.apply #map()[[[sgId]]]
     //CHECK: [[c32:%.+]] = arith.constant 32 : index
     //CHECK: [[LOCALY:%.+]] = index.mul [[IDY]], [[c32]]
-    //CHECK: [[c0:%.+]] = arith.constant 0 : index
-    //CHECK: [[Y:%.+]] = arith.addi [[LOCALY]], [[c0]] : index
     //CHECK: [[c128:%.+]] = arith.constant 128 : index
-    //CHECK: [[MODY:%.+]] = index.remu [[Y]], [[c128]]
+    //CHECK: [[MODY:%.+]] = index.remu [[LOCALY]], [[c128]]
     //CHECK: [[BASE:%.+]] = vector.step : vector<32xindex>
     //CHECK: [[CAST:%.+]] = vector.broadcast [[MODY]] : index to vector<32xindex>
     //CHECK: [[ADD:%.+]] = arith.addi [[BASE]], [[CAST]] : vector<32xindex>
@@ -23,10 +21,8 @@ gpu.module @test {
     //CHECK: [[IDY:%.+]] = affine.apply #map()[[[sgId]]]
     //CHECK: [[c32:%.+]] = arith.constant 32 : index
     //CHECK: [[LOCALY:%.+]] = index.mul [[IDY]], [[c32]]
-    //CHECK: [[c0:%.+]] = arith.constant 0 : index
-    //CHECK: [[Y:%.+]] = arith.addi [[LOCALY]], [[c0]] : index
     //CHECK: [[c128:%.+]] = arith.constant 128 : index
-    //CHECK: [[MODY:%.+]] = index.remu [[Y]], [[c128]]
+    //CHECK: [[MODY:%.+]] = index.remu [[LOCALY]], [[c128]]
     //CHECK: [[BASE:%.+]] = vector.step : vector<32xindex>
     //CHECK: [[CAST:%.+]] = vector.broadcast [[MODY]] : index to vector<32xindex>
     //CHECK: [[ADD:%.+]] = arith.addi [[BASE]], [[CAST]] : vector<32xindex>
diff --git a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
index e5cc65e6bd3d7..d2d250cbe0f66 100644
--- a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
+++ b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
@@ -27,12 +27,10 @@ gpu.module @test_round_robin_assignment {
    //CHECK: [[LX:%.+]] = index.mul [[IdX]], [[C64]]
    //CHECK: [[C0:%.+]] = arith.constant 0 : index
    //CHECK: [[C0_1:%.+]] = arith.constant 0 : index
-   //CHECK: [[ADDY:%.+]] = arith.addi [[LY]], [[C0]] : index
-   //CHECK: [[ADDX:%.+]] = arith.addi [[LX]], [[C0_1]] : index
    //CHECK: [[C128:%.+]] = arith.constant 128 : index
-   //CHECK: [[offY:%.+]] = index.remu [[ADDY]], [[C128]]
+   //CHECK: [[offY:%.+]] = index.remu [[LY]], [[C128]]
    //CHECK: [[C64_2:%.+]] = arith.constant 64 : index
-   //CHECK: [[offX:%.+]] = index.remu [[ADDX]], [[C64_2]]
+   //CHECK: [[offX:%.+]] = index.remu [[LX]], [[C64_2]]
    //CHECK: xegpu.create_nd_tdesc [[ARG_0]][[[offY]], [[offX]]] : memref<256x128xf32> -> !xegpu.tensor_desc<16x64xf32>
    %tdesc = xegpu.create_nd_tdesc %src[0, 0] : memref<256x128xf32> -> !xegpu.tensor_desc<128x64xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [16, 64]>>
diff --git a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
index 3478a9b91da5f..31d015d89a2dc 100644
--- a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+++ b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
@@ -325,12 +325,10 @@ gpu.module @test_distribution {
    //CHECK: [[l_off_x:%.+]] = index.mul [[id_x]], [[c32_1]]
    //CHECK: [[c0:%.+]] = arith.constant 0 : index
    //CHECK: [[c0_1:%.+]] = arith.constant 0 : index
-   //CHECK: [[l_off_y_0:%.+]] = arith.addi [[l_off_y]], [[c0]] : index
-   //CHECK: [[l_off_x_0:%.+]] = arith.addi [[l_off_x]], [[c0_1]] : index
    //CHECK: [[c64:%.+]] = arith.constant 64 : index
-   //CHECK: [[off_y:%.+]] = index.remu [[l_off_y_0]], [[c64]]
+   //CHECK: [[off_y:%.+]] = index.remu [[l_off_y]], [[c64]]
    //CHECK: [[c128:%.+]] = arith.constant 128 : index
-   //CHECK: [[off_x:%.+]] = index.remu [[l_off_x_0]], [[c128]]
+   //CHECK: [[off_x:%.+]] = index.remu [[l_off_x]], [[c128]]
    //CHECK: xegpu.load_matrix [[mdesc]][[[off_y]], [[off_x]]] <{layout = #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>}>: !xegpu.mem_desc<64x128xf32>, index, index -> vector<32x32xf32>
    %0 = xegpu.create_mem_desc %arg0 : memref<32768xi8, 3> -> !xegpu.mem_desc<64x128xf32>
    %1 = xegpu.load_matrix %0[0, 0] <{layout = #xegpu.layout<sg_layout = [2, 4], sg_data = [32, 32], lane_layout = [2, 8], lane_data = [1, 1]>}>: !xegpu.mem_desc<64x128xf32> -> vector<64x128xf32>
@@ -349,13 +347,11 @@ gpu.module @test_distribution {
    //CHECK: [[id_y:%.+]] = affine.apply #map()[[[sgid]]]
    //CHECK: [[id_x:%.+]] = affine.apply #map1()[[[sgid]]]
    //CHECK: [[c32:%.+]] = arith.constant 32 : index
-   //CHECK: [[l_off_y_0:%.+]] = index.mul [[id_y]], [[c32]]
+   //CHECK: [[l_off_y:%.+]] = index.mul [[id_y]], [[c32]]
    //CHECK: [[c32_1:%.+]] = arith.constant 32 : index
-   //CHECK: [[l_off_x_0:%.+]] = index.mul [[id_x]], [[c32_1]]
+   //CHECK: [[l_off_x:%.+]] = index.mul [[id_x]], [[c32_1]]
    //CHECK: [[c0:%.+]] = arith.constant 0 : index
    //CHECK: [[c0_2:%.+]] = arith.constant 0 : index
-   //CHECK: [[l_off_y:%.+]] = arith.addi [[l_off_y_0]], [[c0]] : index
-   //CHECK: [[l_off_x:%.+]] = arith.addi [[l_off_x_0]], [[c0_2]] : index
    //CHECK: [[c64:%.+]] = arith.constant 64 : index
    //CHECK: [[off_y:%.+]] = index.remu [[l_off_y]], [[c64]]
    //CHECK: [[c128:%.+]] = arith.constant 128 : index
@@ -372,11 +368,10 @@ gpu.module @test_distribution {
    //CHECK: [[sgId:%.+]] = gpu.subgroup_id : index
    //CHECK-DAG: [[IDY:%.+]] = affine.apply #map2()[[[sgId]]]
    //CHECK-DAG: [[c32:%.+]] = arith.constant 32 : index
-   //CHECK-DAG: [[LOCALY:%.+]] = index.mul [[IDY]], [[c32]]
+   //CHECK-DAG: [[LY:%.+]] = index.mul [[IDY]], [[c32]]
    //CHECK-DAG: [[c0:%.+]] = arith.constant 0 : index
-   //CHECK-DAG: [[Y:%.+]] = arith.addi [[LOCALY]], [[c0]] : index
    //CHECK-DAG: [[c128:%.+]] = arith.constant 128 : index
-   //CHECK-DAG: [[MODY:%.+]] = index.remu [[Y]], [[c128]]
+   //CHECK-DAG: [[MODY:%.+]] = index.remu [[LY]], [[c128]]
    //CHECK-DAG: [[BASE:%.+]] = vector.step : vector<32xindex>
    //CHECK-DAG: [[CAST:%.+]] = vector.broadcast [[MODY]] : index to vector<32xindex>
    //CHECK: [[ADD:%.+]] = arith.addi [[BASE]], [[CAST]] : vector<32xindex>
@@ -390,9 +385,8 @@ gpu.module @test_distribution {
    //CHECK-DAG: [[c8:%.+]] = arith.constant 8 : index
    //CHECK-DAG: [[LOCALY:%.+]] = index.mul [[sgId]], [[c8]]
    //CHECK-DAG: [[c0:%.+]] = arith.constant 0 : index
-   //CHECK-DAG: [[Y:%.+]] = arith.addi [[LOCALY]], [[c0]] : index
    //CHECK-DAG: [[c128:%.+]] = arith.constant 128 : index
-   //CHECK-DAG: [[MODY:%.+]] = index.remu [[Y]], [[c128]]
+   //CHECK-DAG: [[MODY:%.+]] = index.remu [[LOCALY]], [[c128]]
    //CHECK-DAG: [[BASE:%.+]] = vector.step : vector<8xindex>
    //CHECK-DAG: [[CAST:%.+]] = vector.broadcast [[MODY]] : index to vector<8xindex>
    //CHECK: [[ADD:%.+]] = arith.addi [[BASE]], [[CAST]] : vector<8xindex>
diff --git a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
index c0fb373835e3d..e83229e3a3995 100644
--- a/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+++ b/mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
@@ -14,12 +14,10 @@ gpu.module @test_1_1_assignment {
    //CHECK: [[LX:%.+]] = index.mul [[SGIDX]], [[C32]]
    //CHECK: [[C0:%.+]] = arith.constant 0 : index
    //CHECK: [[C0_1:%.+]] = arith.constant 0 : index
-   //CHECK: [[UY:%.+]] = arith.addi [[LY]], [[C0]] : index
-   //CHECK: [[UX:%.+]] = arith.addi [[LX]], [[C0_1]] : index
    //CHECK: [[C256:%.+]] = arith.constant 256 : index
-   //CHECK: [[Y:%.+]] = index.remu [[UY]], [[C256]]
+   //CHECK: [[Y:%.+]] = index.remu [[LY]], [[C256]]
    //CHECK: [[C128:%.+]] = arith.constant 128 : index
-   //CHECK: [[X:%.+]] = index.remu [[UX]], [[C128]]
+   //CHECK: [[X:%.+]] = index.remu [[LX]], [[C128]]
    //CHECK: [[TDESC:%.+]] = xegpu.create_nd_tdesc [[ARG_0]][[[Y]], [[X]]] : memref<256x128xf32> -> !xegpu.tensor_desc<32x32xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
    %tdesc = xegpu.create_nd_tdesc %src[0, 0] : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32], lane_layout = [1, 16], lane_data = [1, 1]>>
@@ -37,17 +35,13 @@ gpu.module @test_1_1_assignment {
    //CHECK: [[LX:%.+]] = index.mul [[SGIDX]], [[C32]]
    //CHECK: [[C0:%.+]] = arith.constant 0 : index
    //CHECK: [[C0_2:%.+]] = arith.constant 0 : index
-   //CHECK: [[UY:%.+]] = arith.addi [[LY]], [[C0]] : index
-   //CHECK: [[UX:%.+]] = arith.addi [[LX]], [[C0_2]] : index
    //CHECK: [[C256:%.+]] = arith.constant 256 : index
-   //CHECK: [[MODY:%.+]] = index.remu [[UY]], [[C256]]
+   //CHECK: [[MODY:%.+]] = index.remu [[LY]], [[C256]]
    //CHECK: [[C128:%.+]] = arith.constant 128 : index
-   //CHECK: [[MODX:%.+]] = index.remu [[UX]], [[C128]]
+   //CHECK: [[MODX:%.+]] = index.remu [[LX]], [[C128]]
    //CHECK: [[C0_3:%.+]] = arith.constant 0 : index
-   //CHECK: [[Y:%.+]] = index.add [[MODY]], [[C0_3]]
    //CHECK: [[C0_4:%.+]] = arith.constant 0 : index
-   //CHECK: [[X:%.+]] = index.add [[MODX]], [[C0_4]]
-   //CHECK: [[TDESC:%.+]] = xegpu.create_nd_tdesc [[ARG_0]][1, [[Y]], [[X]]] : memref<3x256x128xf32> -> !xegpu.tensor_desc<32x32xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
+   //CHECK: [[TDESC:%.+]] = xegpu.create_nd_tdesc [[ARG_0]][1, [[MODY]], [[MODX]]] : memref<3x256x128xf32> -> !xegpu.tensor_desc<32x32xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
    %tdesc = xegpu.create_nd_tdesc %src[1, 0, 0] : memref<3x256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32], lane_layout = [1, 16], lane_data = [1, 1]>>
    gpu.return
Member

@kuhar kuhar left a comment


This makes sense to me. Do you have some statistics on the usual number of iterations and if this adds measurable overhead?

@joker-eph
Collaborator Author

joker-eph commented Sep 24, 2025

This makes sense to me. Do you have some statistics on the usual number of iterations and if this adds measurable overhead?

This necessarily adds some overhead; I don't have stats though (it's going to be heavily dependent on the setup: how the folders are structured and what the input IR looks like).

For the number of iterations: it should be 1 or 2 in general, but that assumes the folders are well written (which we'd like them to be anyway, since folding is already iterative in canonicalization).

@kuhar
Member

kuhar commented Sep 24, 2025

This necessarily adds some overhead; I don't have stats though (it's going to be heavily dependent on the setup: how the folders are structured and what the input IR looks like).

I can cherry-pick this and check by compiling something like llama/SDXL in IREE. I should be able to find time for this in the next couple of days.

@joker-eph
Collaborator Author

joker-eph commented Sep 24, 2025

FYI, I added an assert(count < 3); in the iteration, and it fires only once upstream, apparently from a folding pattern on spirv.func.

@joker-eph
Collaborator Author

Actually it was vector.extract, but in the context of legalizing to spirv:

[dialect-conversion:1] Legalizing operation : 'vector.extract' (0x560f502f5100) {
[dialect-conversion:1]   %14 = "vector.extract"(%13, %1) <{static_position = array<i64: -9223372036854775808>}> : (vector<4xi32>, index) -> i32
[dialect-conversion:1]   * Fold {
[Builders.cpp:496 1] Folded in place #1 times: %14 = "vector.extract"(%11) <{static_position = array<i64: 1>}> : (vector<2xi32>) -> i32
[Builders.cpp:496 1] Folded in place #2 times: %14 = "vector.extract"(%3) <{static_position = array<i64: 1>}> : (vector<2xi32>) -> i32
[Builders.cpp:496 1] Folded in place #3 times: %14 = "vector.extract"(%3) <{static_position = array<i64: 1>}> : (vector<2xi32>) -> i32

Seems like it first folds the dynamic position to a static one, then folds again by looking through the operand somehow?

Member

@matthias-springer matthias-springer left a comment


This change makes sense to me.

The same strategy could potentially be used in the greedy pattern rewrite driver to improve efficiency: we could try to fold the op in a loop until no more folding is possible. This may be more efficient than pushing/popping the op onto the worklist. (We still have to put the op on the worklist after the last successful folding, so it's unclear if this actually improves compilation time.)
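
For illustration, here is a hedged sketch of that idea (not code from this PR; the worklist and its handling are hypothetical, and replacement of results when fold() produces values is elided): drain successive in-place folds when an op is popped, instead of one fold per worklist round.

#include <deque>

#include "mlir/IR/OpDefinition.h"

using namespace mlir;

// Hypothetical greedy-driver fragment: fold each popped op to a fixed
// point before deciding what to re-enqueue.
static void processWorklist(std::deque<Operation *> &worklist) {
  while (!worklist.empty()) {
    Operation *op = worklist.front();
    worklist.pop_front();
    SmallVector<OpFoldResult> foldResults;
    if (failed(op->fold(foldResults)))
      continue; // nothing folded
    // Drain in-place folds in one go rather than one per round.
    while (foldResults.empty() && succeeded(op->fold(foldResults))) {
    }
    if (foldResults.empty())
      worklist.push_back(op); // folded in place; visit once more
    // else: the op's results would be replaced here (elided).
  }
}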

Member

@Groverkss Groverkss left a comment


Nice! I always just assumed this happened for some reason.

@dcaballe
Contributor

dcaballe commented Sep 26, 2025

Nice!

we expect a "one-shot" call to fold to be as "folded" as possible.

I'm curious, how could we verify now that a particular folder fully applies in one iteration? As you know, we previously had ordering issues in the vector.extract folder that prevented it from folding completely in one iteration, even though it should have. Would it make sense to pass the maximum number of iterations as a parameter somehow so that test passes could explicitly set it to one, or something like that?

@joker-eph joker-eph force-pushed the fold_iterative branch 2 times, most recently from 63ded52 to 4232f1c on September 26, 2025 at 17:27
@joker-eph
Collaborator Author

@dcaballe : I added a "maxIterations" to the API, and an option to the test pass.
That way the pass has the same default behavior as before (it checks a single iteration of the folder), as shown in the newly added test. PTAL!
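
For reference, a hedged sketch of what such a capped variant could look like (the exact API signature and option name are the ones in the PR, not reproduced here):

#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Same fixed-point loop, but bounded: with maxIterations = 1 a test
// pass can verify that a folder fully applies in a single iteration.
static LogicalResult foldUpTo(Operation *op,
                              SmallVectorImpl<OpFoldResult> &foldResults,
                              int maxIterations) {
  if (failed(op->fold(foldResults)))
    return failure();
  int iterations = 1;
  while (foldResults.empty() && iterations < maxIterations &&
         succeeded(op->fold(foldResults)))
    ++iterations;
  return success();
}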

@joker-eph
Collaborator Author

This necessarily adds some overhead; I don't have stats though (it's going to be heavily dependent on the setup: how the folders are structured and what the input IR looks like).

I can cherry-pick this and check by compiling something like llama/SDXL in IREE. I should be able to find time for this in the next couple of days.

@kuhar just a reminder on this; it isn't urgent, but I wouldn't want it to get forgotten either. Thanks!

@efric
Contributor

efric commented Sep 26, 2025

@joker-eph

Cherry-picked this and tested on SDXL in IREE.

Some stats:

124054 mlir-builder-try-to-fold - Number of OpBuilder::tryFold loop total calls
     3 mlir-builder-try-to-fold - Number of OpBuilder::tryFold loop max iterations
124247 mlir-builder-try-to-fold - Number of OpBuilder::tryFold loop total iterations
    86 mlir-builder-try-to-fold - OpBuilder::tryFold total loop time (us)

So the average number of iterations is roughly 1 (124247 total iterations over 124054 calls ≈ 1.002).

No calls to the one in FoldUtils.cpp.

No meaningful increase in total compilation time 😄.
