@Pierre-vh
Contributor

Adds the "cu-stores" subtarget feature for GFX12.5. It determines whether
SCOPE_CU stores may be used (on by default), or whether all stores must be
done at SCOPE_SE minimum.
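
The scope rule this PR describes can be sketched as a small pure function. This is an illustrative reduction of the `finalizeStore` change in `SIMemoryLegalizer.cpp`, not the pass itself: the `Scope` enum, its numeric values, and the name `finalizeStoreScope` are assumptions made for the sketch, and the two booleans stand in for `ST.hasCUStores()` and `TII->mayAccessScratchThroughFlat(MI)`. The real function mutates the instruction and returns whether it changed; the sketch returns the resulting scope instead.

```cpp
#include <cstdint>

// Hypothetical stand-in for the CPol scope values used in the diff.
// The numeric values are chosen only so that wider scopes compare greater.
enum class Scope : std::uint8_t { CU = 0, SE = 1, DEV = 2, SYS = 3 };

// Returns the scope a store ends up with.
//   hasCUStores   : stand-in for ST.hasCUStores()
//   mayHitScratch : stand-in for TII->mayAccessScratchThroughFlat(MI)
constexpr Scope finalizeStoreScope(Scope requested, bool hasCUStores,
                                   bool mayHitScratch) {
  // A SCOPE_CU store is widened to SCOPE_SE when the feature is off, or
  // when the store may reach the scratch address space.
  if (requested == Scope::CU && (!hasCUStores || mayHitScratch))
    return Scope::SE;
  // Anything already at SCOPE_SE or wider is left alone.
  return requested;
}
```

Under these assumptions, a global store keeps SCOPE_CU only when the feature is enabled, while a flat store is widened either way because it may reach scratch. That matches the CU and NOCU check lines in the new `gfx1250-no-scope-cu-stores.ll` test.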

@llvmbot
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-llvm-support

Author: Pierre van Houtryve (Pierre-vh)

Changes

Determines whether we can use SCOPE_CU stores (on by default), or
whether all stores must be done at SCOPE_SE minimum.


Full diff: https://github.com/llvm/llvm-project/pull/150588.diff

11 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+8-1)
  • (modified) llvm/include/llvm/Support/AMDHSAKernelDescriptor.h (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+7)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+5-1)
  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+6)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+3)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+5)
  • (modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+3-1)
  • (added) llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll (+100)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test (+4-4)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e46437ae092c4..caaae1c3673a3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
                                                   performant than code generated for XNACK replay
                                                   disabled.
 
+     cu-stores       TODO                         On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+                                                  If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
      =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
 
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      and must be 0,
      >454    1 bit   ENABLE_SGPR_PRIVATE_SEGMENT
                      _SIZE
-     457:455 3 bits                                  Reserved, must be 0.
+     455     1 bit   USES_CU_STORES                  GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+                                                     If 0, then all stores are ``SCOPE_SE`` or higher.
+     457:456 2 bits                                  Reserved, must be 0.
      458     1 bit   ENABLE_WAVEFRONT_SIZE32         GFX6-GFX9
                                                        Reserved, must be 0.
                                                      GFX10-GFX11
@@ -18185,6 +18190,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                   GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                  GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                               :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+     ``.amdhsa_uses_cu_stores``                               0                  GFX12.5      Controls USES_CU_STORES in
+                                                                                              :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target             GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
                                                               Feature                         :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
                                                               Specific
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index a119b0724d677..8f367390c531c 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -223,7 +223,8 @@ enum : int32_t {
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
-  KERNEL_CODE_PROPERTY(RESERVED0, 7, 3),
+  KERNEL_CODE_PROPERTY(RESERVED0, 7, 2),
+  KERNEL_CODE_PROPERTY(USES_CU_STORES, 9, 1), // GFX12.5 +cu-stores
   KERNEL_CODE_PROPERTY(ENABLE_WAVEFRONT_SIZE32, 10, 1), // GFX10+
   KERNEL_CODE_PROPERTY(USES_DYNAMIC_STACK, 11, 1),
   KERNEL_CODE_PROPERTY(RESERVED1, 12, 4),
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 2a36f3dea34ce..ca53059680e12 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -268,6 +268,12 @@ def FeatureSafeSmemPrefetch : SubtargetFeature<"safe-smem-prefetch",
   "SMEM prefetches do not fail on illegal address"
 >;
 
+def FeatureCUStores : SubtargetFeature<"cu-stores",
+  "HasCUStores",
+  "true",
+  "Whether SCOPE_CU stores can be used on GFX12.5"
+>;
+
 def FeatureVcmpxExecWARHazard : SubtargetFeature<"vcmpx-exec-war-hazard",
   "HasVcmpxExecWARHazard",
   "true",
@@ -1970,6 +1976,7 @@ def FeatureISAVersion12 : FeatureSet<
 def FeatureISAVersion12_50 : FeatureSet<
   [FeatureGFX12,
    FeatureGFX1250Insts,
+   FeatureCUStores,
    FeatureCuMode,
    Feature64BitLiterals,
    FeatureLDSBankCount32,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 4b3dc371c65f0..668139383f56c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -552,6 +552,7 @@ const MCExpr *AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
   MCContext &Ctx = MF.getContext();
   uint16_t KernelCodeProperties = 0;
   const GCNUserSGPRUsageInfo &UserSGPRInfo = MFI.getUserSGPRInfo();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
 
   if (UserSGPRInfo.hasPrivateSegmentBuffer()) {
     KernelCodeProperties |=
@@ -581,10 +582,13 @@ const MCExpr *AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
     KernelCodeProperties |=
         amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE;
   }
-  if (MF.getSubtarget<GCNSubtarget>().isWave32()) {
+  if (ST.isWave32()) {
     KernelCodeProperties |=
         amdhsa::KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32;
   }
+  if (isGFX1250(ST) && ST.hasCUStores()) {
+    KernelCodeProperties |= amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES;
+  }
 
   // CurrentProgramInfo.DynamicCallStack is a MCExpr and could be
   // un-evaluatable at this point so it cannot be conditionally checked here.
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 421fc429048ff..44e65b3588888 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -6066,6 +6066,12 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
                        ExprVal, ValRange);
       if (Val)
         ImpliedUserSGPRCount += 1;
+    } else if (ID == ".amdhsa_uses_cu_stores") {
+      if (!isGFX1250())
+        return Error(IDRange.Start, "directive requires gfx12.5", IDRange);
+
+      PARSE_BITS_ENTRY(KD.kernel_code_properties,
+                       KERNEL_CODE_PROPERTY_USES_CU_STORES, ExprVal, ValRange);
     } else if (ID == ".amdhsa_wavefront_size32") {
       EXPR_RESOLVE_OR_ERROR(EvaluatableExpr);
       if (IVersion.Major < 10)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 5c1989b345bdc..ffe6b0649cb94 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -2556,6 +2556,9 @@ Expected<bool> AMDGPUDisassembler::decodeKernelDescriptorDirective(
                     KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT);
     PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_size",
                     KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE);
+    if (isGFX1250())
+      PRINT_DIRECTIVE(".amdhsa_uses_cu_stores",
+                      KERNEL_CODE_PROPERTY_USES_CU_STORES);
 
     if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0)
       return createReservedKDBitsError(KERNEL_CODE_PROPERTY_RESERVED0,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 0435e7f9e51d2..84f2676602950 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -245,6 +245,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasSMEMtoVectorWriteHazard = false;
   bool HasInstFwdPrefetchBug = false;
   bool HasSafeSmemPrefetch = false;
+  bool HasCUStores = false;
   bool HasVcmpxExecWARHazard = false;
   bool HasLdsBranchVmemWARHazard = false;
   bool HasNSAtoVMEMBug = false;
@@ -989,6 +990,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasSafeSmemPrefetch() const { return HasSafeSmemPrefetch; }
 
+  bool hasCUStores() const { return HasCUStores; }
+
   // Has s_cmpk_* instructions.
   bool hasSCmpK() const { return getGeneration() < GFX12; }
 
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index 10f6d3382368f..43ca54894b963 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -440,6 +440,11 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
              amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_SHIFT,
              amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE,
              ".amdhsa_user_sgpr_private_segment_size");
+  if (isGFX1250(STI))
+    PrintField(KD.kernel_code_properties,
+               amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES_SHIFT,
+               amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES,
+               ".amdhsa_uses_cu_stores");
   if (IVersion.Major >= 10)
     PrintField(KD.kernel_code_properties,
                amdhsa::KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32_SHIFT,
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index d6337a85a7361..315dac555ab80 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2554,7 +2554,9 @@ bool SIGfx12CacheControl::finalizeStore(MachineInstr &MI, bool Atomic) const {
 
   // GFX12.5 only: Require SCOPE_SE on stores that may hit the scratch address
   // space.
-  if (TII->mayAccessScratchThroughFlat(MI) && Scope == CPol::SCOPE_CU)
+  // We also require SCOPE_SE minimum if we do not have the "cu-stores" feature.
+  if (Scope == CPol::SCOPE_CU &&
+      (!ST.hasCUStores() || TII->mayAccessScratchThroughFlat(MI)))
     return setScope(MI, CPol::SCOPE_SE);
 
   return false;
diff --git a/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll b/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll
new file mode 100644
index 0000000000000..d13d76fcfabf4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll
@@ -0,0 +1,100 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O3 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GCN,CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O3 -mcpu=gfx1250 -mattr=-cu-stores < %s | FileCheck --check-prefixes=GCN,NOCU %s
+
+; Check that if -cu-stores is used, we use SCOPE_SE minimum on all stores.
+
+; GCN: flat_store:
+; CU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel flat_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @flat_store(ptr %dst, i32 %val) {
+entry:
+  store i32 %val, ptr %dst
+  ret void
+}
+
+; GCN: global_store:
+; CU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}}{{$}}
+; NOCU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel global_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @global_store(ptr addrspace(1) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(1) %dst
+  ret void
+}
+
+; GCN: local_store:
+; CU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; NOCU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; GCN: .amdhsa_kernel local_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @local_store(ptr addrspace(3) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(3) %dst
+  ret void
+}
+
+; GCN: scratch_store:
+; CU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel scratch_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @scratch_store(ptr addrspace(5) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(5) %dst
+  ret void
+}
+
+; GCN: flat_atomic_store:
+; CU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel flat_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @flat_atomic_store(ptr %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: global_atomic_store:
+; CU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}}{{$}}
+; NOCU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel global_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @global_atomic_store(ptr addrspace(1) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(1) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: local_atomic_store:
+; CU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; NOCU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; GCN: .amdhsa_kernel local_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @local_atomic_store(ptr addrspace(3) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(3) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: scratch_atomic_store:
+; CU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel scratch_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @scratch_atomic_store(ptr addrspace(5) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(5) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
diff --git a/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test b/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
index fdca11b95caa6..369005f4ea432 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
+++ b/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
@@ -13,10 +13,10 @@
 # RES_4_2: ; error decoding test.kd: kernel descriptor reserved bits in range (511:480) set
 # RES_4_2-NEXT: ; decoding failed region as bytes
 
-# RUN: yaml2obj %s -DGPU=GFX90A -DKD=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006000000000000 \
-# RUN:   | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=RES_457
-# RES_457: ; error decoding test.kd: kernel descriptor reserved bits in range (457:455) set
-# RES_457-NEXT: ; decoding failed region as bytes
+# RUN: yaml2obj %s -DGPU=GFX90A -DKD=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003000000000000 \
+# RUN:   | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=RES_456
+# RES_456: ; error decoding test.kd: kernel descriptor reserved bits in range (456:455) set
+# RES_456-NEXT: ; decoding failed region as bytes
 
 # RUN: yaml2obj %s -DGPU=GFX90A -DKD=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c000000000000 \
 # RUN:   | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=WF32
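
To make the kernel-descriptor layout change easier to follow, here is a minimal sketch of the bit arithmetic implied by the `AMDHSAKernelDescriptor.h` hunk. The `Field` struct and all helper names are hypothetical; they only restate the (shift, width) pairs from the diff. The base offset of 448 is an assumption inferred from the `AMDGPUUsage.rst` hunk, which lists ENABLE_SGPR_PRIVATE_SEGMENT_SIZE (shift 6) at bit 454 and ENABLE_WAVEFRONT_SIZE32 (shift 10) at bit 458. Under that mapping, shift 9 is absolute bit 457, whereas the `AMDGPUUsage.rst` hunk lists USES_CU_STORES at bit 455; the sketch follows the header and the updated "(456:455)" reserved-bits error message.

```cpp
#include <cstdint>

// Hypothetical restatement of one KERNEL_CODE_PROPERTY entry: a field of
// `width` bits starting at bit `shift` of the 16-bit properties word.
struct Field {
  unsigned shift;
  unsigned width;
  constexpr std::uint16_t mask() const {
    return static_cast<std::uint16_t>(((1u << width) - 1u) << shift);
  }
};

// (shift, width) pairs copied from the header hunk in this PR.
constexpr Field Reserved0{7, 2};
constexpr Field UsesCUStores{9, 1};
constexpr Field EnableWavefrontSize32{10, 1};

// Assumed bit offset of kernel_code_properties inside the descriptor.
constexpr unsigned KernelCodePropertiesBase = 448;

constexpr unsigned absoluteBit(Field f) {
  return KernelCodePropertiesBase + f.shift;
}

// Mirrors the AsmPrinter condition: the bit is set only when targeting
// GFX12.5 with the cu-stores feature enabled.
constexpr std::uint16_t withCUStores(std::uint16_t props, bool isGFX1250,
                                     bool hasCUStores) {
  return (isGFX1250 && hasCUStores)
             ? static_cast<std::uint16_t>(props | UsesCUStores.mask())
             : props;
}

// Mirrors the disassembler's reserved-bits check for RESERVED0.
constexpr bool reserved0Set(std::uint16_t props) {
  return (props & Reserved0.mask()) != 0;
}
```

With these definitions, a properties word that has only the USES_CU_STORES bit set passes the reserved-bits check, while one that also sets bit 8 of the word does not.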
@llvmbot
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-mc

@llvmbot
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Determines whether we can use SCOPE_CU stores (on by default), or
whether all stores must be done at SCOPE_SE minimum.


Full diff: https://github.com/llvm/llvm-project/pull/150588.diff

11 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+8-1)
  • (modified) llvm/include/llvm/Support/AMDHSAKernelDescriptor.h (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+7)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+5-1)
  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+6)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+3)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+5)
  • (modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+3-1)
  • (added) llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll (+100)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test (+4-4)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e46437ae092c4..caaae1c3673a3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
  performant than code generated for XNACK replay disabled.
+ cu-stores TODO On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+ If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
  =============== ============================ ==================================================
  .. _amdgpu-target-id:
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
  and must be 0,
  >454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT
  _SIZE
- 457:455 3 bits Reserved, must be 0.
+ 455 1 bit USES_CU_STORES GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+ If 0, then all stores are ``SCOPE_SE`` or higher.
+ 457:456 2 bits Reserved, must be 0.
  458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9
  Reserved, must be 0.
  GFX10-GFX11
@@ -18185,6 +18190,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
  GFX942)
  ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
  :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+ ``.amdhsa_uses_cu_stores`` 0 GFX12.5 Controls USES_CU_STORES in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
  ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
  Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
  Specific
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index a119b0724d677..8f367390c531c 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -223,7 +223,8 @@ enum : int32_t {
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
   KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
-  KERNEL_CODE_PROPERTY(RESERVED0, 7, 3),
+  KERNEL_CODE_PROPERTY(RESERVED0, 7, 2),
+  KERNEL_CODE_PROPERTY(USES_CU_STORES, 9, 1), // GFX12.5 +cu-stores
   KERNEL_CODE_PROPERTY(ENABLE_WAVEFRONT_SIZE32, 10, 1), // GFX10+
   KERNEL_CODE_PROPERTY(USES_DYNAMIC_STACK, 11, 1),
   KERNEL_CODE_PROPERTY(RESERVED1, 12, 4),
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 2a36f3dea34ce..ca53059680e12 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -268,6 +268,12 @@ def FeatureSafeSmemPrefetch : SubtargetFeature<"safe-smem-prefetch",
   "SMEM prefetches do not fail on illegal address"
 >;

+def FeatureCUStores : SubtargetFeature<"cu-stores",
+  "HasCUStores",
+  "true",
+  "Whether SCOPE_CU stores can be used on GFX12.5"
+>;
+
 def FeatureVcmpxExecWARHazard : SubtargetFeature<"vcmpx-exec-war-hazard",
   "HasVcmpxExecWARHazard",
   "true",
@@ -1970,6 +1976,7 @@ def FeatureISAVersion12 : FeatureSet<
 def FeatureISAVersion12_50 : FeatureSet<
   [FeatureGFX12,
    FeatureGFX1250Insts,
+   FeatureCUStores,
    FeatureCuMode,
    Feature64BitLiterals,
    FeatureLDSBankCount32,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 4b3dc371c65f0..668139383f56c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -552,6 +552,7 @@ const MCExpr *AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
   MCContext &Ctx = MF.getContext();
   uint16_t KernelCodeProperties = 0;
   const GCNUserSGPRUsageInfo &UserSGPRInfo = MFI.getUserSGPRInfo();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();

   if (UserSGPRInfo.hasPrivateSegmentBuffer()) {
     KernelCodeProperties |=
@@ -581,10 +582,13 @@ const MCExpr *AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
     KernelCodeProperties |=
         amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE;
   }
-  if (MF.getSubtarget<GCNSubtarget>().isWave32()) {
+  if (ST.isWave32()) {
     KernelCodeProperties |=
         amdhsa::KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32;
   }
+  if (isGFX1250(ST) && ST.hasCUStores()) {
+    KernelCodeProperties |= amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES;
+  }

   // CurrentProgramInfo.DynamicCallStack is a MCExpr and could be
   // un-evaluatable at this point so it cannot be conditionally checked here.
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 421fc429048ff..44e65b3588888 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -6066,6 +6066,12 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
                        ExprVal, ValRange);
       if (Val)
         ImpliedUserSGPRCount += 1;
+    } else if (ID == ".amdhsa_uses_cu_stores") {
+      if (!isGFX1250())
+        return Error(IDRange.Start, "directive requires gfx12.5", IDRange);
+
+      PARSE_BITS_ENTRY(KD.kernel_code_properties,
+                       KERNEL_CODE_PROPERTY_USES_CU_STORES, ExprVal, ValRange);
     } else if (ID == ".amdhsa_wavefront_size32") {
       EXPR_RESOLVE_OR_ERROR(EvaluatableExpr);
       if (IVersion.Major < 10)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 5c1989b345bdc..ffe6b0649cb94 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -2556,6 +2556,9 @@ Expected<bool> AMDGPUDisassembler::decodeKernelDescriptorDirective(
                     KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT);
     PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_size",
                     KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE);
+    if (isGFX1250())
+      PRINT_DIRECTIVE(".amdhsa_uses_cu_stores",
+                      KERNEL_CODE_PROPERTY_USES_CU_STORES);

     if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0)
       return createReservedKDBitsError(KERNEL_CODE_PROPERTY_RESERVED0,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 0435e7f9e51d2..84f2676602950 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -245,6 +245,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasSMEMtoVectorWriteHazard = false;
   bool HasInstFwdPrefetchBug = false;
   bool HasSafeSmemPrefetch = false;
+  bool HasCUStores = false;
   bool HasVcmpxExecWARHazard = false;
   bool HasLdsBranchVmemWARHazard = false;
   bool HasNSAtoVMEMBug = false;
@@ -989,6 +990,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,

   bool hasSafeSmemPrefetch() const { return HasSafeSmemPrefetch; }

+  bool hasCUStores() const { return HasCUStores; }
+
   // Has s_cmpk_* instructions.
   bool hasSCmpK() const { return getGeneration() < GFX12; }
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index 10f6d3382368f..43ca54894b963 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -440,6 +440,11 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
              amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_SHIFT,
              amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE,
              ".amdhsa_user_sgpr_private_segment_size");
+  if (isGFX1250(STI))
+    PrintField(KD.kernel_code_properties,
+               amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES_SHIFT,
+               amdhsa::KERNEL_CODE_PROPERTY_USES_CU_STORES,
+               ".amdhsa_uses_cu_stores");
   if (IVersion.Major >= 10)
     PrintField(KD.kernel_code_properties,
                amdhsa::KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32_SHIFT,
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index d6337a85a7361..315dac555ab80 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2554,7 +2554,9 @@ bool SIGfx12CacheControl::finalizeStore(MachineInstr &MI, bool Atomic) const {

   // GFX12.5 only: Require SCOPE_SE on stores that may hit the scratch address
   // space.
-  if (TII->mayAccessScratchThroughFlat(MI) && Scope == CPol::SCOPE_CU)
+  // We also require SCOPE_SE minimum if we not have the "cu-stores" feature.
+  if (Scope == CPol::SCOPE_CU &&
+      (!ST.hasCUStores() || TII->mayAccessScratchThroughFlat(MI)))
     return setScope(MI, CPol::SCOPE_SE);

   return false;
diff --git a/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll b/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll
new file mode 100644
index 0000000000000..d13d76fcfabf4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll
@@ -0,0 +1,100 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O3 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GCN,CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O3 -mcpu=gfx1250 -mattr=-cu-stores < %s | FileCheck --check-prefixes=GCN,NOCU %s
+
+; Check that if -cu-stores is used, we use SCOPE_SE minimum on all stores.
+
+; GCN: flat_store:
+; CU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel flat_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @flat_store(ptr %dst, i32 %val) {
+entry:
+  store i32 %val, ptr %dst
+  ret void
+}
+
+; GCN: global_store:
+; CU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}}{{$}}
+; NOCU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel global_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @global_store(ptr addrspace(1) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(1) %dst
+  ret void
+}
+
+; GCN: local_store:
+; CU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; NOCU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; GCN: .amdhsa_kernel local_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @local_store(ptr addrspace(3) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(3) %dst
+  ret void
+}
+
+; GCN: scratch_store:
+; CU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel scratch_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @scratch_store(ptr addrspace(5) %dst, i32 %val) {
+entry:
+  store i32 %val, ptr addrspace(5) %dst
+  ret void
+}
+
+; GCN: flat_atomic_store:
+; CU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: flat_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel flat_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @flat_atomic_store(ptr %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: global_atomic_store:
+; CU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}}{{$}}
+; NOCU: global_store_b32 v{{.*}}, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel global_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @global_atomic_store(ptr addrspace(1) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(1) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: local_atomic_store:
+; CU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; NOCU: ds_store_b32 v{{.*}}, v{{.*}}{{$}}
+; GCN: .amdhsa_kernel local_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @local_atomic_store(ptr addrspace(3) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(3) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
+
+; GCN: scratch_atomic_store:
+; CU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; NOCU: scratch_store_b32 off, v{{.*}}, s{{.*}} scope:SCOPE_SE
+; GCN: .amdhsa_kernel scratch_atomic_store
+; CU: .amdhsa_uses_cu_stores 1
+; NOCU: .amdhsa_uses_cu_stores 0
+define amdgpu_kernel void @scratch_atomic_store(ptr addrspace(5) %dst, i32 %val) {
+entry:
+  store atomic i32 %val, ptr addrspace(5) %dst syncscope("wavefront") unordered, align 4
+  ret void
+}
diff --git a/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test b/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
index fdca11b95caa6..369005f4ea432 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
+++ b/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
@@ -13,10 +13,10 @@
 # RES_4_2: ; error decoding test.kd: kernel descriptor reserved bits in range (511:480) set
 # RES_4_2-NEXT: ; decoding failed region as bytes

-# RUN: yaml2obj %s -DGPU=GFX90A -DKD=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006000000000000 \
-# RUN: | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=RES_457
-# RES_457: ; error decoding test.kd: kernel descriptor reserved bits in range (457:455) set
-# RES_457-NEXT: ; decoding failed region as bytes
+# RUN: yaml2obj %s -DGPU=GFX90A -DKD=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003000000000000 \
+# RUN: | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=RES_456
+# RES_456: ; error decoding test.kd: kernel descriptor reserved bits in range (456:455) set
+# RES_456-NEXT: ; decoding failed region as bytes

 # RUN: yaml2obj %s -DGPU=GFX90A -DKD=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c000000000000 \
 # RUN: | llvm-objdump --disassemble-symbols=test.kd - | FileCheck %s --check-prefix=WF32
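The descriptor hunks shrink `RESERVED0` from three bits to two and hand the freed bit to `USES_CU_STORES`, which is why the reserved-bits test needs a different byte pattern. Below is a minimal sketch of that arithmetic. It assumes `kernel_code_properties` is a 16-bit little-endian field held in bytes 56 and 57 of a 64-byte descriptor (consistent with the bit numbers in the documentation table, but not stated in the diff). `fieldMask`, `propertiesFromBytes`, and the constant names are hypothetical, not LLVM identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Build a mask from a (shift, width) pair, mirroring the arguments of the
// KERNEL_CODE_PROPERTY entries in the diff. Illustrative helper only.
constexpr std::uint16_t fieldMask(unsigned Shift, unsigned Width) {
  return static_cast<std::uint16_t>(((1u << Width) - 1u) << Shift);
}

constexpr std::uint16_t OldReserved0 = fieldMask(7, 3); // before the patch
constexpr std::uint16_t NewReserved0 = fieldMask(7, 2); // after the patch
constexpr std::uint16_t UsesCUStores = fieldMask(9, 1); // newly defined flag

// Assumption: the field is stored low byte first.
constexpr std::uint16_t propertiesFromBytes(std::uint8_t Lo, std::uint8_t Hi) {
  return static_cast<std::uint16_t>(Lo | (Hi << 8));
}

// Byte pairs "0006" (old test) and "0003" (new test) from the -DKD strings.
constexpr std::uint16_t OldPattern = propertiesFromBytes(0x00, 0x06);
constexpr std::uint16_t NewPattern = propertiesFromBytes(0x00, 0x03);

static_assert(OldReserved0 == 0x0380, "old reserved range covers bits 7..9");
static_assert(NewReserved0 == 0x0180, "new reserved range covers bits 7..8");
static_assert(UsesCUStores == 0x0200, "flag sits at bit 9 of the field");
// Under the new layout the old pattern no longer touches a reserved bit...
static_assert((OldPattern & NewReserved0) == 0, "old pattern misses RESERVED0");
// ...whereas the new pattern sets bit 8, which is still reserved.
static_assert((NewPattern & NewReserved0) != 0, "new pattern hits RESERVED0");
```

If the field does start at descriptor bit 448, shift 9 corresponds to absolute bit 457 and the remaining reserved bits to 456:455, which agrees with the range in the updated error message; the documentation hunk lists `USES_CU_STORES` at 455, so the two may be worth cross-checking.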
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/gfx1250-cu-stores branch from 874e368 to 7738b5e Compare July 25, 2025 08:44
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/cleanup-wait-before-scope-sys-store branch from e4821db to 6c5ec02 Compare July 25, 2025 08:44
Contributor

@arsenm arsenm left a comment

It's a subtarget feature, not an option

@Pierre-vh Pierre-vh changed the title from "[AMDGPU][gfx1250] Add cu-store option" to "[AMDGPU][gfx1250] Add cu-store subtarget feature" Jul 25, 2025
      if (Val)
        ImpliedUserSGPRCount += 1;
    } else if (ID == ".amdhsa_uses_cu_stores") {
      if (!isGFX1250())
Contributor

If this is supposed to be a software-controlled setting, it probably should be a separate attribute.

Contributor Author

I don't understand; only `.amdhsa_uses_cu_stores` needs to be in the metadata. The intention is that the runtime can check whether the code was built with `+cu-stores` or `-cu-stores`.
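A runtime-side check of the kind described here could look roughly like the sketch below. It is not taken from any runtime: `builtWithCUStores` is a hypothetical helper, and the layout (a 64-byte descriptor with the 16-bit little-endian `kernel_code_properties` field at byte offset 56) is an assumption. Only the shift of 9 comes from the `KERNEL_CODE_PROPERTY(USES_CU_STORES, 9, 1)` entry in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout, not confirmed by the diff: 64-byte kernel descriptor with
// kernel_code_properties stored at bytes 56..57, low byte first.
constexpr std::size_t KernelDescriptorSize = 64;
constexpr std::size_t KernelCodePropertiesOffset = 56;
// From KERNEL_CODE_PROPERTY(USES_CU_STORES, 9, 1) in the PR.
constexpr unsigned UsesCUStoresShift = 9;

using KernelDescriptorBytes = std::array<std::uint8_t, KernelDescriptorSize>;

// Hypothetical helper: true if the descriptor records that the kernel was
// compiled with +cu-stores, false if it was compiled with -cu-stores.
inline bool builtWithCUStores(const KernelDescriptorBytes &KD) {
  const std::uint16_t Props = static_cast<std::uint16_t>(
      KD[KernelCodePropertiesOffset] |
      (KD[KernelCodePropertiesOffset + 1] << 8));
  return ((Props >> UsesCUStoresShift) & 1u) != 0;
}
```

For a zero-initialized descriptor the helper returns false; setting byte 57 to `0x02` (bit 9 of the field) makes it return true.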

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/cleanup-wait-before-scope-sys-store branch 7 times, most recently from ab2bcf0 to 1ab0333 Compare July 28, 2025 10:40
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/gfx1250-cu-stores branch from 7738b5e to 608a6b8 Compare July 28, 2025 11:24
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/cleanup-wait-before-scope-sys-store branch from 1ab0333 to 1e3bb0c Compare July 28, 2025 11:24
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/gfx1250-cu-stores branch from 608a6b8 to 88d8e27 Compare July 28, 2025 12:01
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/cleanup-wait-before-scope-sys-store branch 5 times, most recently from d591d5d to 8d9e7d5 Compare July 28, 2025 13:36
Base automatically changed from users/pierre-vh/cleanup-wait-before-scope-sys-store to main July 28, 2025 13:38
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/gfx1250-cu-stores branch from 88d8e27 to 1ea2ac8 Compare July 28, 2025 13:39
Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.
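The effect of the feature on store scopes can be modelled in a few lines. The sketch below restates the condition from the `SIGfx12CacheControl::finalizeStore` hunk as a standalone function; the `Scope` enum and `finalizeStoreScope` are illustrative stand-ins rather than the LLVM types, and the enum only encodes relative ordering, not the hardware encoding.

```cpp
#include <cassert>

// Illustrative scope ordering, narrowest first. Only CU and SE appear in the
// PR; DEV and SYS are included to show that wider scopes pass through.
enum class Scope { CU, SE, DEV, SYS };

// Model of the patched condition: a CU-scope store is widened to SE when the
// subtarget lacks cu-stores, or when the store may reach scratch through a
// flat access. Any other requested scope is left unchanged.
constexpr Scope finalizeStoreScope(Scope Requested, bool HasCUStores,
                                   bool MayAccessScratchThroughFlat) {
  if (Requested == Scope::CU &&
      (!HasCUStores || MayAccessScratchThroughFlat))
    return Scope::SE;
  return Requested;
}
```

With the feature enabled, a CU-scope global store stays at CU scope while a flat store that may reach scratch is still promoted; with `-mattr=-cu-stores` both end up at SE, which is the pattern the `CU` and `NOCU` check lines in the new test describe.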
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/gfx1250-cu-stores branch from 1ea2ac8 to cc1803c Compare July 29, 2025 09:19
@Pierre-vh Pierre-vh merged commit be17791 into main Jul 29, 2025
8 of 10 checks passed
@Pierre-vh Pierre-vh deleted the users/pierre-vh/gfx1250-cu-stores branch July 29, 2025 09:38
@llvm-ci
Collaborator

llvm-ci commented Jul 29, 2025

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/24272

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[57/59] Linking CXX executable External/HIP/cmath-hip-6.3.0
[58/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
[59/59] Linking CXX executable External/HIP/TheNextWeek-hip-6.3.0
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja check-hip-simple
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: test-suite :: External/HIP/blender.test (14 of 14)
******************** TEST 'test-suite :: External/HIP/blender.test' FAILED ********************
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.time /bin/bash test_blender.sh
/bin/bash verify_blender.sh /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out
Begin Blender test.
TEST_SUITE_HIP_ROOT=/opt/botworker/llvm/External/hip
Render /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend
Blender 4.1.1 (hash e1743a0317bc built 2024-04-15 23:47:45)
Read blend: "/opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend"
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
I0729 09:49:11.138304 1158857 device.cpp:39] HIPEW initialization succeeded
I0729 09:49:11.140328 1158857 device.cpp:45] Found HIPCC hipcc
I0729 09:49:11.206737 1158857 device.cpp:207] Device has compute preemption or is not used for display.
I0729 09:49:11.206804 1158857 device.cpp:211] Added device "" with id "HIP__0000:a3:00".
I0729 09:49:11.206898 1158857 device.cpp:568] Mapped host memory limit set to 536,444,985,344 bytes. (499.60G)
I0729 09:49:11.207180 1158857 device_impl.cpp:63] Using AVX2 CPU kernels.
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Eyepiece_rim
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.013
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.016
Fra:1 Mem:524.09M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.019
Fra:1 Mem:524.32M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.020
Fra:1 Mem:524.37M (Peak 524.70M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Cables.004
Fra:1 Mem:526.39M (Peak 526.39M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.022
Fra:1 Mem:526.41M (Peak 526.41M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.023
Fra:1 Mem:526.44M (Peak 526.44M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.024
Fra:1 Mem:526.45M (Peak 526.45M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.026
Fra:1 Mem:526.61M (Peak 526.62M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.039
Fra:1 Mem:527.03M (Peak 527.03M) | Time:00:00.74 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Hoses.003
Fra:1 Mem:533.13M (Peak 533.13M) | Time:00:00.75 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors
Fra:1 Mem:533.72M (Peak 533.72M) | Time:00:00.75 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Wires
Fra:1 Mem:533.84M (Peak 533.84M) | Time:00:00.75 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.009
Fra:1 Mem:534.62M (Peak 534.62M) | Time:00:00.75 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.010
kraj pushed a commit to kraj/llvm-project that referenced this pull request Sep 9, 2025
Pierre-vh added a commit that referenced this pull request Sep 10, 2025
…#157639) This reverts commit be17791. This is not necessary for gfx1250 anymore.