[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix #162780
Conversation
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir Author: Jianhui Li (Jianhui-Li) Changes: This PR adds lowering of xegpu.load_matrix/store_matrix to xevm.blockload/blockstore or llvm.load/store, depending on work-item (WI) level attributes.
Patch is 53.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162780.diff 13 Files Affected:
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td index 5695d5d515d7f..601e966b49890 100644 --- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td +++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td @@ -716,8 +716,30 @@ def XeGPU_MemLayoutAttr : XeGPUAttr<"MemLayout", "mem_layout"> { return getAttrs().getAs<ArrayAttr>("stride"); } + ArrayAttr getBlockAttr() { + return getAttrs().getAs<ArrayAttr>("block"); + } + }]; } +def RowOriented : I32EnumAttrCase<"ROW", 0, "row">; +def ColOriented : I32EnumAttrCase<"COL", 1, "col">; +def MatrixAccessDirection : + I32EnumAttr<"MatrixAccessDirection", + "Matrix elements/vectors can have row or column direction", [ + RowOriented, ColOriented +]> { + let genSpecializedAttr = 0; + let cppNamespace = "::mlir::xegpu"; +} +def MatrixAccessDirectionAttr : + EnumAttr<XeGPU_Dialect, + MatrixAccessDirection, + "matrix_access_direction">{ + let summary = [{Describe the direction of memory access for load_matrix and store_matrix.}]; + let assemblyFormat = "`<` $value `>`"; +} + #endif // MLIR_DIALECT_XEGPU_IR_XEGPUATTRS_TD diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td index 73f9061f5debe..044a8ef22d891 100644 --- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td +++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td @@ -1298,14 +1298,16 @@ def XeGPU_CreateMemDescOp: XeGPU_Op<"create_mem_desc", [Pure, } def XeGPU_LoadMatrixOp: XeGPU_Op<"load_matrix", [MemoryEffects<[MemRead]>, - AllElementTypesMatch<["mem_desc", "res"]>, - AllRanksMatch<["mem_desc", "res"]>]> { + AllElementTypesMatch<["mem_desc", "res"]>]> { let arguments = (ins XeGPU_MemDesc:$mem_desc, Variadic<Index>: $offsets, DenseI64ArrayAttr: $const_offsets, + OptionalAttr<I32Attr>:$vec_length, + OptionalAttr<MatrixAccessDirectionAttr>:$vec_direction, + OptionalAttr<UnitAttr>:$subgroup_block_io, OptionalAttr<DistributeLayoutAttr>:$layout ); - let results = (outs XeGPU_ValueType:$res); + let results = (outs AnyTypeOf<[XeGPU_ValueType, XeGPU_ScalarType]>:$res); let assemblyFormat = [{ $mem_desc `` custom<DynamicIndexList>($offsets, $const_offsets) prop-dict attr-dict `` `:` type(operands) `->` type(results) @@ -1336,7 +1338,10 @@ def XeGPU_LoadMatrixOp: XeGPU_Op<"load_matrix", [MemoryEffects<[MemRead]>, } ArrayRef<int64_t> getDataShape() { - return getRes().getType().getShape(); + auto resTy = getRes().getType(); + if (auto vecTy = llvm::dyn_cast<VectorType>(resTy)) + return vecTy.getShape(); + return {}; } }]; @@ -1344,13 +1349,15 @@ def XeGPU_LoadMatrixOp: XeGPU_Op<"load_matrix", [MemoryEffects<[MemRead]>, } def XeGPU_StoreMatrixOp: XeGPU_Op<"store_matrix", [MemoryEffects<[MemWrite]>, - AllElementTypesMatch<["mem_desc", "data"]>, - AllRanksMatch<["mem_desc", "data"]>]> { + AllElementTypesMatch<["mem_desc", "data"]>]> { let arguments = (ins - XeGPU_ValueType:$data, + AnyTypeOf<[XeGPU_ValueType, XeGPU_ScalarType]>:$data, XeGPU_MemDesc:$mem_desc, Variadic<Index>: $offsets, DenseI64ArrayAttr: $const_offsets, + OptionalAttr<I32Attr>:$vec_length, + OptionalAttr<MatrixAccessDirectionAttr>:$vec_direction, + OptionalAttr<UnitAttr>:$subgroup_block_io, OptionalAttr<DistributeLayoutAttr>:$layout ); let assemblyFormat = [{ $data `,` $mem_desc `` custom<DynamicIndexList>($offsets, $const_offsets) @@ -1378,7 +1385,10 @@ def XeGPU_StoreMatrixOp: XeGPU_Op<"store_matrix", [MemoryEffects<[MemWrite]>, } ArrayRef<int64_t> getDataShape() { - return getData().getType().getShape(); + auto DataTy = 
getData().getType(); + if (auto vecTy = llvm::dyn_cast<VectorType>(DataTy)) + return vecTy.getShape(); + return {}; } }]; diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUTypes.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUTypes.td index 84902b2039643..c261fbb576642 100644 --- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUTypes.td +++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUTypes.td @@ -237,7 +237,7 @@ def XeGPU_MemDesc: XeGPUTypeDef<"MemDesc", "mem_desc", [ShapedTypeInterface], "m return MemDescType::get(getContext(), shape.value_or(getShape()), elementType, getMemLayout()); } - ArrayAttr getStrides() { + ArrayAttr getStridesAttr() { auto layout = getMemLayout(); if (layout && layout.hasAttr("stride")) { return layout.getStrides(); @@ -250,6 +250,54 @@ def XeGPU_MemDesc: XeGPUTypeDef<"MemDesc", "mem_desc", [ShapedTypeInterface], "m Builder builder(getContext()); return builder.getI64ArrayAttr(defaultStrides); } + + /// Heuristic to determine if the MemDesc uses column-major layout, + /// based on the rank and the value of the first stride dimension. + bool isColMajor() { + auto dim0 = dyn_cast<IntegerAttr>(getStridesAttr()[0]); + return getRank() == 2 && dim0 && dim0.getInt() == 1; + } + + // get the Blocking shape for a MemDescType, Which is represented + // as an attribute in MemDescType. By default it is the shape + // of the mdescTy + SmallVector<int64_t> getBlockSize() { + SmallVector<int64_t> size(getShape()); + MemLayoutAttr layout = getMemLayout(); + if (layout && layout.hasAttr("block")) { + ArrayAttr attr = layout.getBlockAttr(); + size.clear(); + llvm::for_each(attr, [&](Attribute elem) { + if (auto intElem = dyn_cast<IntegerAttr>(elem)) + size.push_back(intElem.getInt()); + }); + } + return size; + } + + // Get strides as vector of integer. + // If it contains block attribute, the strides are blocked strides. + // + // The blocking is applied against the original matrix shape + // so that the linear offset is not impacted by the subview. + // + // It first computes the original matrix shape using the stride info, + // then computes the number of blocks in each dimension of original shape, + // then compute the outer block shape and stride, + // then combines the inner and outer block shape and stride + // e.g. 
for mem_desc<32x256xf16, @block=[16, 8], @strides=[1, 32]> + // its memory layout tuple is ([2,32,16,8],[128,256,1,16]) + // for mem_desc<256x32xf16, @block=[8, 16]> with default @stride[32, 1] + // its memory layout tuple is ([32,2,8,16],[256,128,16,1]) + SmallVector<int64_t> getStrides(); + + /// Generates instructions to compute the linearize offset + // if the memory descriptor is blocked, it returns linearize offset based on the blocked layout + // the strides of memory descriptor is always considered regardless of blocked or not + Value getLinearOffsets(OpBuilder &builder, + Location loc, ArrayRef<OpFoldResult> offsets); + + }]; let hasCustomAssemblyFormat = true; diff --git a/mlir/lib/Conversion/XeGPUToXeVM/CMakeLists.txt b/mlir/lib/Conversion/XeGPUToXeVM/CMakeLists.txt index 84b25809f1ed0..dd9edc43a1657 100644 --- a/mlir/lib/Conversion/XeGPUToXeVM/CMakeLists.txt +++ b/mlir/lib/Conversion/XeGPUToXeVM/CMakeLists.txt @@ -21,6 +21,7 @@ add_mlir_conversion_library(MLIRXeGPUToXeVM MLIRIndexDialect MLIRSCFDialect MLIRXeGPUDialect + MLIRXeGPUUtils MLIRPass MLIRTransforms MLIRSCFTransforms diff --git a/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp b/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp index 9ead1d89069d6..67e8246e5536a 100644 --- a/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp +++ b/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp @@ -21,6 +21,7 @@ #include "mlir/Dialect/SCF/IR/SCF.h" #include "mlir/Dialect/SCF/Transforms/Patterns.h" #include "mlir/Dialect/XeGPU/IR/XeGPU.h" +#include "mlir/Dialect/XeGPU/Utils/XeGPUUtils.h" #include "mlir/Pass/Pass.h" #include "mlir/Support/LLVM.h" #include "llvm/Support/FormatVariadic.h" @@ -60,6 +61,9 @@ static int32_t getNumericXeVMAddrSpace(xegpu::MemorySpace xeGpuMemspace) { return static_cast<int>(xevm::AddrSpace::GLOBAL); case xegpu::MemorySpace::SLM: return static_cast<int>(xevm::AddrSpace::SHARED); + default: + llvm_unreachable("Unknown XeGPU memory space"); + return static_cast<int>(xevm::AddrSpace::GLOBAL); } } @@ -503,6 +507,189 @@ class LoadStoreToXeVMPattern : public OpConversionPattern<OpType> { } }; +// Lower xegpu::CreateMemDescOp to memref::ViewOp. Since SLM access instructions +// on Xe2 and Xe3 operate on 32-bit or 64-bit units, all data types smaller than +// 32 bits will be converted to 32 bits. 
+class CreateMemDescOpPattern final + : public OpConversionPattern<xegpu::CreateMemDescOp> { +public: + using OpConversionPattern<xegpu::CreateMemDescOp>::OpConversionPattern; + LogicalResult + matchAndRewrite(xegpu::CreateMemDescOp op, OpAdaptor adaptor, + ConversionPatternRewriter &rewriter) const override { + TypedValue<MemRefType> src = op.getSource(); + auto resTy = cast<xegpu::MemDescType>(op.getResult().getType()); + + // Create the result MemRefType with the same shape, element type, and + // memory space + auto newResTy = getTypeConverter()->convertType<MemRefType>(resTy); + + Value zero = arith::ConstantIndexOp::create(rewriter, op.getLoc(), 0); + auto viewOp = memref::ViewOp::create(rewriter, op.getLoc(), newResTy, + Value(src), zero, ValueRange()); + rewriter.replaceOp(op, viewOp); + return success(); + } +}; + +class MemDescSubviewOpPattern final + : public OpConversionPattern<xegpu::MemDescSubviewOp> { +public: + using OpConversionPattern<xegpu::MemDescSubviewOp>::OpConversionPattern; + LogicalResult + matchAndRewrite(xegpu::MemDescSubviewOp op, OpAdaptor adaptor, + ConversionPatternRewriter &rewriter) const override { + return rewriter.notifyMatchFailure( + op, "MemDescSubviewOp are not supported on Xe2/Xe3 architecture."); + } +}; + +template <typename OpType, + typename = std::enable_if_t<llvm::is_one_of< + OpType, xegpu::LoadMatrixOp, xegpu::StoreMatrixOp>::value>> +class LoadStoreMatrixToXeVMPattern : public OpConversionPattern<OpType> { + using OpConversionPattern<OpType>::OpConversionPattern; + LogicalResult + matchAndRewrite(OpType op, typename OpType::Adaptor adaptor, + ConversionPatternRewriter &rewriter) const override { + + SmallVector<OpFoldResult> offsets = op.getMixedOffsets(); + if (offsets.empty()) + return rewriter.notifyMatchFailure(op, "Expected offset to be provided."); + + auto loc = op.getLoc(); + auto ctxt = rewriter.getContext(); + Value basePtrStruct = adaptor.getMemDesc(); + Value mdescVal = op.getMemDesc(); + // Load result or Store value Type can be vector or scalar. + Value data; + if constexpr (std::is_same_v<OpType, xegpu::LoadMatrixOp>) + data = op.getResult(); + else + data = adaptor.getData(); + VectorType valOrResVecTy = dyn_cast<VectorType>(data.getType()); + if (!valOrResVecTy) + valOrResVecTy = VectorType::get(1, data.getType()); + + int64_t elemBitWidth = + valOrResVecTy.getElementType().getIntOrFloatBitWidth(); + // Element type must be multiple of 8 bits. + if (elemBitWidth % 8 != 0) + return rewriter.notifyMatchFailure( + op, "Expected element type bit width to be multiple of 8."); + int64_t elemByteSize = elemBitWidth / 8; + + // Default memory space is SLM. 
+ LLVM::LLVMPointerType ptrTypeLLVM = LLVM::LLVMPointerType::get( + ctxt, getNumericXeVMAddrSpace(xegpu::MemorySpace::SLM)); + + auto mdescTy = cast<xegpu::MemDescType>(mdescVal.getType()); + + Value basePtrLLVM = memref::ExtractAlignedPointerAsIndexOp::create( + rewriter, loc, basePtrStruct); + + // Convert base pointer (ptr) to i64 + Value basePtrI64 = arith::IndexCastUIOp::create( + rewriter, loc, rewriter.getI64Type(), basePtrLLVM); + + Value linearOffset = mdescTy.getLinearOffsets(rewriter, loc, offsets); + linearOffset = arith::IndexCastUIOp::create( + rewriter, loc, rewriter.getI64Type(), linearOffset); + basePtrI64 = + addOffset(rewriter, loc, basePtrI64, linearOffset, elemByteSize); + + // convert base pointer (i64) to LLVM pointer type + basePtrLLVM = + LLVM::IntToPtrOp::create(rewriter, loc, ptrTypeLLVM, basePtrI64); + + // if the size of valOrResVecTy is 1, it lowers to a scalar load/store + // operation. LLVM load/store does not support vector of size 1, so we need + // to handle this case separately. + if (valOrResVecTy.getNumElements() == 1) { + Type scalarTy = valOrResVecTy.getElementType(); + if constexpr (std::is_same_v<OpType, xegpu::LoadMatrixOp>) { + Value loadOp = + LLVM::LoadOp::create(rewriter, loc, scalarTy, basePtrLLVM); + rewriter.replaceOp(op, loadOp); + } else { + auto storeOp = LLVM::StoreOp::create(rewriter, loc, adaptor.getData(), + basePtrLLVM); + rewriter.eraseOp(op); + } + return success(); + } else { + // if the attribute 'subgroup_block_io' is set to true, it lowers to + // xevm.blockload + auto subgroupBlockIoAttr = op.getSubgroupBlockIoAttr(); + bool subgroup_block_io = static_cast<bool>(subgroupBlockIoAttr); + + // BlockLoadOp only supports integer types, so we need to bitcast + // Get integer type with matching bit width + Type elemTy = valOrResVecTy.getElementType(); + int64_t bitWidth = elemTy.getIntOrFloatBitWidth(); + Type intElemTy = rewriter.getIntegerType(bitWidth); + VectorType intVecTy = + VectorType::get(valOrResVecTy.getShape(), intElemTy); + + if (subgroup_block_io) { + if constexpr (std::is_same_v<OpType, xegpu::LoadMatrixOp>) { + Value loadOp = + xevm::BlockLoadOp::create(rewriter, loc, intVecTy, basePtrLLVM); + if (intVecTy != valOrResVecTy) { + loadOp = + vector::BitCastOp::create(rewriter, loc, valOrResVecTy, loadOp); + } + rewriter.replaceOp(op, loadOp); + } else { + Value dataToStore = adaptor.getData(); + if (valOrResVecTy != intVecTy) { + dataToStore = + vector::BitCastOp::create(rewriter, loc, intVecTy, dataToStore); + } + xevm::BlockStoreOp::create(rewriter, loc, basePtrLLVM, dataToStore, + nullptr); + rewriter.eraseOp(op); + } + } else { + // if the result is 1D vector, if the vector direction is Column, then + // the + // memory descriptor should be treated as column major + auto chipOpt = xegpu::getChipStr(op); + if (!chipOpt || (*chipOpt != "pvc" && *chipOpt != "bmg")) { + // the lowering only works for pvc and bmg + return rewriter.notifyMatchFailure( + op, "The lowering is specific to pvc or bmg."); + } + xegpu::MatrixAccessDirectionAttr vecDirection = + op.getVecDirectionAttr(); + if (vecDirection && + vecDirection.getValue() == xegpu::MatrixAccessDirection::COL && + !mdescTy.isColMajor()) + return rewriter.notifyMatchFailure( + op, "mem_desc should be column major when " + "vec_direction is COLUMN for 1D result."); + if (vecDirection && + vecDirection.getValue() == xegpu::MatrixAccessDirection::ROW && + mdescTy.isColMajor()) + return rewriter.notifyMatchFailure( + op, "mem_desc should be row major when " + 
"vec_direction is ROW for 1D result."); + + if constexpr (std::is_same_v<OpType, xegpu::LoadMatrixOp>) { + Value loadOp = + LLVM::LoadOp::create(rewriter, loc, valOrResVecTy, basePtrLLVM); + rewriter.replaceOp(op, loadOp); + } else { + auto storeOp = LLVM::StoreOp::create(rewriter, loc, adaptor.getData(), + basePtrLLVM); + rewriter.eraseOp(op); + } + } + } + return success(); + } +}; + class PrefetchToXeVMPattern : public OpConversionPattern<xegpu::PrefetchOp> { using OpConversionPattern::OpConversionPattern; LogicalResult @@ -785,6 +972,13 @@ struct ConvertXeGPUToXeVMPass auto i32Type = IntegerType::get(&getContext(), 32); return VectorType::get(8, i32Type); }); + // Convert MemDescType into flattened MemRefType for SLM + typeConverter.addConversion([&](xegpu::MemDescType type) -> Type { + Type elemTy = type.getElementType(); + int numElems = type.getNumElements(); + return MemRefType::get(numElems, elemTy, AffineMap(), 3); + }); + typeConverter.addConversion([&](MemRefType type) -> Type { // Convert MemRefType to i64 type. return IntegerType::get(&getContext(), 64); @@ -919,6 +1113,10 @@ void mlir::populateXeGPUToXeVMConversionPatterns( LoadStoreToXeVMPattern<xegpu::LoadGatherOp>, LoadStoreToXeVMPattern<xegpu::StoreScatterOp>>( typeConverter, patterns.getContext()); + patterns.add<LoadStoreMatrixToXeVMPattern<xegpu::LoadMatrixOp>, + LoadStoreMatrixToXeVMPattern<xegpu::StoreMatrixOp>, + CreateMemDescOpPattern, MemDescSubviewOpPattern>( + typeConverter, patterns.getContext()); patterns.add<FenceToXeVMPattern, DpasToXeVMPattern>(typeConverter, patterns.getContext()); } diff --git a/mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp b/mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp index 94c5509fd7c29..cccc8fab4adbc 100644 --- a/mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp +++ b/mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp @@ -726,6 +726,152 @@ void MemLayoutAttr::print(AsmPrinter &printer) const { } printer << ">"; } +// a helper utility to perform binary operation on OpFoldResult. +// If both a and b are attributes, it will simply return the result. +// Otherwise, the corresponding arith op will be generated, and an +// contant op will be created if one of them is an attribute. +template <typename ArithOp> +OpFoldResult genBinOp(OpFoldResult a, OpFoldResult b, Location loc, + OpBuilder &builder) { + auto aVal = getValueOrCreateConstantIndexOp(builder, loc, a); + auto bVal = getValueOrCreateConstantIndexOp(builder, loc, b); + return builder.create<ArithOp>(loc, aVal, bVal).getResult(); +} + +// a helper utility to perform division operation on OpFoldResult and int64_t. +#define div(a, b) \ + genBinOp<arith::DivSIOp>(a, builder.getIndexAttr(b), loc, builder) + +// a helper utility to perform reminder operation on OpFoldResult and int64_t. +#define rem(a, b) \ + genBinOp<arith::RemSIOp>(a, builder.getIndexAttr(b), loc, builder) + +// a helper utility to perform multiply operation on OpFoldResult and int64_t. +#define mul(a, b) \ + genBinOp<arith::MulIOp>(a, builder.getIndexAttr(b), loc, builder) + +// a helper utility to perform addition operation on two OpFoldResult. 
+#define add(a, b) genBinOp<arith::AddIOp>(a, b, loc, builder) + +// block the given offsets according to the block shape +// say the original offset is [y, x], and the block shape is [By, Bx], +// then the blocked offset is [y/By, x/Bx, y%By, x%Bx] +SmallVector<OpFoldResult> getBlockedOffsets(OpBuilder &builder, Location loc, + ArrayRef<OpFoldResult> offsets, + ArrayRef<int64_t> blockShape) { + + assert(offsets.size() == blockShape.size() && + "offsets and blockShape must have the same size"); + SmallVector<OpFoldResult> blockedOffsets; + SmallVector<OpFoldResult> divs, rems; + + for (auto [offset, block] : llvm::zip(offsets, blockShape)) { + divs.push_back(div(offset, block)); + rems.push_back(... [truncated] |
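To make the blocked-layout arithmetic in the MemDescType changes above concrete, here is a small standalone C++ sketch (illustrative only, not part of the patch). It reproduces the mem_desc<32x256xf16, @block=[16, 8], @strides=[1, 32]> example from the patch comments, whose blocked layout tuple is ([2,32,16,8], [128,256,1,16]): an offset [y, x] is split into [y/By, x/Bx, y%By, x%Bx] and dotted with the blocked strides.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Linearize a 2D offset [y, x] under a blocked layout: split it into
// [y/By, x/Bx, y%By, x%Bx] and take the dot product with the blocked strides.
static int64_t linearOffset(const std::vector<int64_t> &offsets,        // [y, x]
                            const std::vector<int64_t> &blockShape,     // [By, Bx]
                            const std::vector<int64_t> &blockedStrides) {
  std::vector<int64_t> blocked = {
      offsets[0] / blockShape[0], offsets[1] / blockShape[1],
      offsets[0] % blockShape[0], offsets[1] % blockShape[1]};
  int64_t linear = 0;
  for (size_t i = 0; i < blocked.size(); ++i)
    linear += blocked[i] * blockedStrides[i];
  return linear;
}

int main() {
  // mem_desc<32x256xf16, @block=[16, 8], @strides=[1, 32]>
  // -> blocked shape [2, 32, 16, 8], blocked strides [128, 256, 1, 16].
  std::vector<int64_t> blockShape = {16, 8};
  std::vector<int64_t> blockedStrides = {128, 256, 1, 16};
  // Offset [17, 9] falls in block [1, 1] at in-block position [1, 1]:
  // 1*128 + 1*256 + 1*1 + 1*16 = 401.
  std::cout << linearOffset({17, 9}, blockShape, blockedStrides) << "\n"; // 401
}
```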
General design questions + minor side-notes
matchAndRewrite(xegpu::MemDescSubviewOp op, OpAdaptor adaptor,
                ConversionPatternRewriter &rewriter) const override {
  return rewriter.notifyMatchFailure(
      op, "MemDescSubviewOp are not supported on Xe2/Xe3 architecture.");
What exactly prevents it, and why should the pattern exist then? Such limitations should be clarified in the op description to not surprise users only after the xegpu code is ready for lowering.
To me, it's still an open question how we'll help users across different generations.
A separate arch specific verifier pass might be needed.
For now, I'd be leaning toward leaving this as is just to slightly improve discoverability when ppl start wondering why this is missing or not working.
if (subgroup_block_io)
  return emitError() << "subgroup_block_io "
                        "are only allowed when result is a 1D VectorType.";
else
nit: else can be removed
if (subgroup_block_io && mdescTy.isColMajor())
  return emitError() << "mem_desc should be row major when "
                        "subgroup_block_io is set.";
} else if (dataShape.size() == 0) {
nit: maybe just else?
partial review.
Variadic<Index>: $offsets,
DenseI64ArrayAttr: $const_offsets,
OptionalAttr<UnitAttr>:$subgroup_block_io,
OptionalAttr<DistributeLayoutAttr>:$layout
please update the description of the op with the meaning of block_io
XeGPU_MemDesc:$mem_desc,
Variadic<Index>: $offsets,
DenseI64ArrayAttr: $const_offsets,
OptionalAttr<UnitAttr>:$subgroup_block_io,
update description.
// It first computes the original matrix shape using the stride info,
// then computes the number of blocks in each dimension of original shape,
// then compute the outer block shape and stride,
// then combines the inner and outer block shape and stride
nit: use code quotes for (mem_desc) for code examples. That way doxygen will generate more readable docs.
LGTM
/// based on the rank and the value of the first stride dimension.
bool isColMajor() {
  auto dim0 = dyn_cast<IntegerAttr>(getStrideAttr()[0]);
  return getRank() == 2 && dim0 && dim0.getInt() == 1;
Why check that dim0 exists? It should always exist, right?
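For context, the heuristic under discussion reduces to checking whether a rank-2 descriptor has a leading stride of 1. A minimal standalone illustration (not part of the patch; plain integers stand in for the IntegerAttr values):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the isColMajor() heuristic on plain integer strides: a rank-2
// mem_desc is treated as column-major when the stride of dimension 0 is 1.
static bool isColMajor(const std::vector<int64_t> &strides) {
  return strides.size() == 2 && strides[0] == 1;
}

int main() {
  assert(isColMajor({1, 32}));   // @strides=[1, 32] -> column-major
  assert(!isColMajor({32, 1}));  // default strides [32, 1] -> row-major
  return 0;
}
```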
  return getRank() == 2 && dim0 && dim0.getInt() == 1;
}

// get the Blocking shape for a MemDescType, Which is represented
nit: Capitalize the first letter of all comment sentences per coding standards.
// If it contains block attribute, the strides are blocked strides.
//
// The blocking is applied against the original matrix shape
// so that the linear offset is not impacted by the subview.
What is the subview you refer to here? Is it the subview of a specific block?
matchAndRewrite(xegpu::CreateMemDescOp op, OpAdaptor adaptor,
                ConversionPatternRewriter &rewriter) const override {

  auto resTy = cast<xegpu::MemDescType>(op.getResult().getType());
Why use getType directly?
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/18692. Here is the relevant piece of the build log for reference.
Reverting here: #163684
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/16040. Here is the relevant piece of the build log for reference.
…x/store_matrix" (#163684): Reverts llvm/llvm-project#162780. Breaks build bots; see #162780.
…matrix + fix sanitizer issue (#163858): This PR fixes the sanitizer issue reported post-merge for llvm/llvm-project#162780.
This PR adds lowering of xegpu.load_matrix/store_matrix to xevm.blockload/blockstore or llvm.load/store, depending on work-item (WI) level attributes.
It includes a few components:
a) if the subgroup_block_io attribute is present, lower to xevm.blockload/blockstore
c) else, lower to llvm.load/store; if the result is a vector, lower to llvm.load/store with a vector operand (see the decision sketch below).
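The following is a minimal standalone sketch (illustrative only, not the actual pattern code) of the dispatch described above, based on the matchAndRewrite logic quoted in the patch; the enum and function names are placeholders.

```cpp
#include <cstdint>
#include <iostream>

// Which lowering the matrix load/store pattern picks, per the quoted logic:
// a single element lowers to a scalar llvm.load/store, subgroup_block_io
// lowers to xevm.blockload/blockstore, and everything else lowers to a
// vector llvm.load/store.
enum class Lowering { XeVMBlockLoadStore, LLVMScalarLoadStore, LLVMVectorLoadStore };

static Lowering pickLowering(bool hasSubgroupBlockIo, int64_t numElements) {
  if (numElements == 1)
    return Lowering::LLVMScalarLoadStore;
  if (hasSubgroupBlockIo)
    return Lowering::XeVMBlockLoadStore;
  return Lowering::LLVMVectorLoadStore;
}

int main() {
  std::cout << (pickLowering(true, 8) == Lowering::XeVMBlockLoadStore) << "\n";   // 1
  std::cout << (pickLowering(false, 1) == Lowering::LLVMScalarLoadStore) << "\n"; // 1
  std::cout << (pickLowering(false, 8) == Lowering::LLVMVectorLoadStore) << "\n"; // 1
}
```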