Use MLIR/IREE for GPU CodeGen

With this patch (and its dependent patches), the following module

$ cat test.mlir
module attributes {
  spv.target_env = #spv.target_env<
    #spv.vce<v1.3, [Shader], [SPV_KHR_storage_buffer_storage_class]>,
    {max_compute_workgroup_invocations = 128 : i32,
     max_compute_workgroup_size = dense<[128, 128, 64]> : vector<3xi32>}>
} {
  func @matmul(%arg0: memref<16x16xf32>, %arg1: memref<16x16xf32>, %ret0: memref<16x16xf32>) {
    linalg.matmul %arg0, %arg1, %ret0 : (memref<16x16xf32>, memref<16x16xf32>, memref<16x16xf32>) -> ()
    return
  }
}

can be lowered to SPIR-V with the following command line:

$ mlir-opt -linalg-tile-to-parallel-loops="linalg-tile-sizes=8,4" \
    -convert-linalg-to-parallel-loops \
    -test-gpu-greedy-parallel-loop-mapping \
    -convert-parallel-loops-to-gpu \
    -gpu-kernel-outlining \
    -lower-affine \
    -canonicalize \
    -legalize-std-for-spirv \
    -test-spirv-entry-point-abi="workgroup-size=4,8" \
    -convert-gpu-to-spirv \
    test.mlir
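To see what each pass in this pipeline contributes, mlir-opt can also dump the IR after every pass. A small usage sketch, assuming MLIR of roughly the same vintage as this patch (the flag was spelled -print-ir-after-all at the time, newer releases use -mlir-print-ir-after-all; the dump goes to stderr, and the log file name is just an example):

$ mlir-opt <same passes and flags as above> -print-ir-after-all test.mlir 2> pipeline.log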

The additional flags are for:

  • Lowering the affine.apply operations inserted by the -convert-parallel-loops-to-gpu pass.
  • Canonicalization, to get around a missing lowering for the std.br operation.
  • Folding ops like subview, which cannot be lowered directly to SPIR-V, into their load/store uses.
  • Setting the workgroup size on the gpu.func kernel function (here [4, 8, 1], i.e. the workgroup-size=4,8 values padded to three dimensions); a sketch of the resulting attribute follows this list.
  • Lowering the gpu.func and its body to the SPIR-V dialect.
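
For reference, a rough sketch of what the kernel looks like after -gpu-kernel-outlining and -test-spirv-entry-point-abi, right before -convert-gpu-to-spirv runs. This is an illustration only: the @matmul_kernel names follow the outlining pass's usual naming convention, the argument order is assumed, and the exact spelling of the spv.entry_point_abi attribute may differ across MLIR versions:

gpu.module @matmul_kernel {
  gpu.func @matmul_kernel(%arg0: memref<16x16xf32>, %arg1: memref<16x16xf32>, %arg2: memref<16x16xf32>) kernel
      attributes {spv.entry_point_abi = {local_size = dense<[4, 8, 1]> : vector<3xi32>}} {
    // (tiled matmul body elided)
    gpu.return
  }
}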