Skip to content

Conversation

@slinder1
Copy link
Contributor

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder scott.linder@amd.com
Co-authored-by: Venkata Ramanaiah Nalamothu VenkataRamanaiah.Nalamothu@amd.com

These spills need special CFI anyway, so implementing them directly where CFI is emitted avoids the need to invent a mechanism to track them from ISel. Co-authored-by: Scott Linder <scott.linder@amd.com> Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>
Copy link
Contributor Author

slinder1 commented Oct 22, 2025

@llvmbot
Copy link
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Scott Linder (slinder1)

Changes

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>


Patch is 153.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164725.diff

8 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+35-10)
  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.h (+7)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+11-2)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+9)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+2)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll (+2556)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp index bbde3c49f64c6..0c5d1445d36e8 100644 --- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp @@ -671,12 +671,21 @@ void SIFrameLowering::emitEntryFunctionFlatScratchInit( } // Note SGPRSpill stack IDs should only be used for SGPR spilling to VGPRs, not -// memory. They should have been removed by now. -static bool allStackObjectsAreDead(const MachineFrameInfo &MFI) { +// memory. They should have been removed by now, except CFI Saved Reg spills. +static bool allStackObjectsAreDead(const MachineFunction &MF) { + const MachineFrameInfo &MFI = MF.getFrameInfo(); + const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>(); for (int I = MFI.getObjectIndexBegin(), E = MFI.getObjectIndexEnd(); I != E; ++I) { - if (!MFI.isDeadObjectIndex(I)) + if (!MFI.isDeadObjectIndex(I)) { + // determineCalleeSaves() might have added the SGPRSpill stack IDs for + // CFI saves into scratch VGPR, ignore them + if (MFI.getStackID(I) == TargetStackID::SGPRSpill && + FuncInfo->checkIndexInPrologEpilogSGPRSpills(I)) { + continue; + } return false; + } } return true; @@ -696,8 +705,8 @@ Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg( Register ScratchRsrcReg = MFI->getScratchRSrcReg(); - if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) && - allStackObjectsAreDead(MF.getFrameInfo()))) + if (!ScratchRsrcReg || + (!MRI.isPhysRegUsed(ScratchRsrcReg) && allStackObjectsAreDead(MF))) return Register(); if (ST.hasSGPRInitBug() || @@ -925,7 +934,7 @@ void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF, bool NeedsFlatScratchInit = MFI->getUserSGPRInfo().hasFlatScratchInit() && (MRI.isPhysRegUsed(AMDGPU::FLAT_SCR) || FrameInfo.hasCalls() || - (!allStackObjectsAreDead(FrameInfo) && ST.enableFlatScratch())); + (!allStackObjectsAreDead(MF) && ST.enableFlatScratch())); if ((NeedsFlatScratchInit || ScratchRsrcReg) && PreloadedScratchWaveOffsetReg && !ST.flatScratchIsArchitected()) { @@ -1306,6 +1315,11 @@ void SIFrameLowering::emitCSRSpillStores(MachineFunction &MF, LiveUnits.addReg(Reg); } } + + // Remove the spill entry created for EXEC. It is needed only for CFISaves in + // the prologue. + if (TRI.isCFISavedRegsSpillEnabled()) + FuncInfo->removePrologEpilogSGPRSpillEntry(TRI.getExec()); } void SIFrameLowering::emitCSRSpillRestores( @@ -1789,14 +1803,14 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized( // can. Any remaining SGPR spills will go to memory, so move them back to the // default stack. bool HaveSGPRToVMemSpill = - FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ true); + FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ true); assert(allSGPRSpillsAreDead(MF) && "SGPR spill should have been removed in SILowerSGPRSpills"); // FIXME: The other checks should be redundant with allStackObjectsAreDead, // but currently hasNonSpillStackObjects is set only from source // allocas. Stack temps produced from legalization are not counted currently. - if (!allStackObjectsAreDead(MFI)) { + if (!allStackObjectsAreDead(MF)) { assert(RS && "RegScavenger required if spilling"); // Add an emergency spill slot @@ -1896,6 +1910,18 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves( MFI->setSGPRForEXECCopy(AMDGPU::NoRegister); } + if (TRI->isCFISavedRegsSpillEnabled()) { + Register Exec = TRI->getExec(); + assert(!MFI->hasPrologEpilogSGPRSpillEntry(Exec) && + "Re-reserving spill slot for EXEC"); + // FIXME: Machine Copy Propagation currently optimizes away the EXEC copy to + // the scratch as we emit it only in the prolog. This optimization should + // not happen for frame related instructions. Until this is fixed ignore + // copy to scratch SGPR. + getVGPRSpillLaneOrTempRegister(MF, LiveUnits, Exec, RC, + /*IncludeScratchCopy=*/false); + } + // hasFP only knows about stack objects that already exist. We're now // determining the stack slots that will be created, so we have to predict // them. Stack objects force FP usage with calls. @@ -1905,8 +1931,7 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves( // // FIXME: Is this really hasReservedCallFrame? const bool WillHaveFP = - FrameInfo.hasCalls() && - (SavedVGPRs.any() || !allStackObjectsAreDead(FrameInfo)); + FrameInfo.hasCalls() && (SavedVGPRs.any() || !allStackObjectsAreDead(MF)); if (WillHaveFP || hasFP(MF)) { Register FramePtrReg = MFI->getFrameOffsetReg(); diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.h b/llvm/lib/Target/AMDGPU/SIFrameLowering.h index 2b716db0b7a22..526404eb83b4f 100644 --- a/llvm/lib/Target/AMDGPU/SIFrameLowering.h +++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.h @@ -114,6 +114,13 @@ class SIFrameLowering final : public AMDGPUFrameLowering { public: bool requiresStackPointerReference(const MachineFunction &MF) const; + /// If '-amdgpu-spill-cfi-saved-regs' is enabled, emit RA/EXEC spills to + /// a free VGPR (lanes) or memory and corresponding CFI rules. + void emitCFISavedRegSpills(MachineFunction &MF, MachineBasicBlock &MBB, + MachineBasicBlock::iterator MBBI, + LiveRegUnits &LiveRegs, + bool emitSpillsToMem) const; + /// Create a CFI index for CFIInst and build a MachineInstr around it. MachineInstr * buildCFI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp index 62386da94d854..57ff52334a470 100644 --- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp +++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp @@ -531,7 +531,7 @@ bool SILowerSGPRSpills::run(MachineFunction &MF) { // free frame index ids by the later pass(es) like "stack slot coloring" // which in turn could mess-up with the book keeping of "frame index to VGPR // lane". - FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false); + FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ false); MadeChange = true; } diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp index b398db4f7caff..2c275a85440d9 100644 --- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp @@ -566,7 +566,8 @@ bool SIMachineFunctionInfo::allocateVGPRSpillToAGPR(MachineFunction &MF, } bool SIMachineFunctionInfo::removeDeadFrameIndices( - MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) { + MachineFunction &MF, bool ResetSGPRSpillStackIDs) { + MachineFrameInfo &MFI = MF.getFrameInfo(); // Remove dead frame indices from function frame, however keep FP & BP since // spills for them haven't been inserted yet. And also make sure to remove the // frame indices from `SGPRSpillsToVirtualVGPRLanes` data structure, diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h index 2c1a13c345aac..8ca34d10de0ef 100644 --- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h +++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h @@ -752,6 +752,16 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction, }) != PrologEpilogSGPRSpills.end(); } + // Remove if an entry created for \p Reg. + void removePrologEpilogSGPRSpillEntry(Register Reg) { + auto I = find_if(PrologEpilogSGPRSpills, + [&Reg](const auto &Spill) { return Spill.first == Reg; }); + if (I == PrologEpilogSGPRSpills.end()) + return; + + PrologEpilogSGPRSpills.erase(I); + } + const PrologEpilogSGPRSaveRestoreInfo & getPrologEpilogSGPRSaveRestoreInfo(Register Reg) const { const auto *I = find_if(PrologEpilogSGPRSpills, [&Reg](const auto &Spill) { @@ -830,8 +840,7 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction, /// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill /// to the default stack. - bool removeDeadFrameIndices(MachineFrameInfo &MFI, - bool ResetSGPRSpillStackIDs); + bool removeDeadFrameIndices(MachineFunction &MF, bool ResetSGPRSpillStackIDs); int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI); std::optional<int> getOptionalScavengeFI() const { return ScavengeFI; } diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp index 77608a4cfc751..9677c6cd7806c 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp @@ -35,6 +35,11 @@ static cl::opt<bool> EnableSpillSGPRToVGPR( cl::ReallyHidden, cl::init(true)); +static cl::opt<bool> EnableSpillCFISavedRegs( + "amdgpu-spill-cfi-saved-regs", + cl::desc("Enable spilling the registers required for CFI emission"), + cl::ReallyHidden, cl::init(false), cl::ZeroOrMore); + std::array<std::vector<int16_t>, 32> SIRegisterInfo::RegSplitParts; std::array<std::array<uint16_t, 32>, 9> SIRegisterInfo::SubRegFromChannelTable; @@ -559,6 +564,10 @@ unsigned SIRegisterInfo::getSubRegFromChannel(unsigned Channel, return SubRegFromChannelTable[NumRegIndex - 1][Channel]; } +bool SIRegisterInfo::isCFISavedRegsSpillEnabled() const { + return EnableSpillCFISavedRegs; +} + MCRegister SIRegisterInfo::getAlignedHighSGPRForRC(const MachineFunction &MF, const unsigned Align, diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h index 2dae5f0eb1c69..749583722e3b0 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h @@ -80,6 +80,8 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo { return SpillSGPRToVGPR; } + bool isCFISavedRegsSpillEnabled() const; + /// Return the largest available SGPR aligned to \p Align for the register /// class \p RC. MCRegister getAlignedHighSGPRForRC(const MachineFunction &MF, diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll new file mode 100644 index 0000000000000..c804c75ae7d2c --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll @@ -0,0 +1,2556 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s + +define protected amdgpu_kernel void @kern() #0 { +; CHECK-LABEL: kern: +; CHECK: .Lfunc_begin0: +; CHECK-NEXT: .cfi_sections .debug_frame +; CHECK-NEXT: .cfi_startproc +; CHECK-NEXT: ; %bb.0: ; %entry +; CHECK-NEXT: .cfi_escape 0x0f, 0x04, 0x30, 0x36, 0xe9, 0x02 ; +; CHECK-NEXT: .cfi_undefined 16 +; CHECK-NEXT: s_endpgm +entry: + ret void +} + +define hidden void @func_saved_in_clobbered_vgpr() #0 { +; WAVE64-LABEL: func_saved_in_clobbered_vgpr: +; WAVE64: .Lfunc_begin1: +; WAVE64-NEXT: .cfi_startproc +; WAVE64-NEXT: ; %bb.0: ; %entry +; WAVE64-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE64-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE64-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE64-NEXT: s_xor_saveexec_b64 s[4:5], -1 +; WAVE64-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE64-NEXT: .cfi_offset 2560, 0 +; WAVE64-NEXT: s_mov_b64 exec, s[4:5] +; WAVE64-NEXT: v_writelane_b32 v0, exec_lo, 0 +; WAVE64-NEXT: v_writelane_b32 v0, exec_hi, 1 +; WAVE64-NEXT: .cfi_llvm_vector_registers 17, 2560, 0, 32, 2560, 1, 32 +; WAVE64-NEXT: s_xor_saveexec_b64 s[4:5], -1 +; WAVE64-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload +; WAVE64-NEXT: s_mov_b64 exec, s[4:5] +; WAVE64-NEXT: s_waitcnt vmcnt(0) +; WAVE64-NEXT: s_setpc_b64 s[30:31] +; +; WAVE32-LABEL: func_saved_in_clobbered_vgpr: +; WAVE32: .Lfunc_begin1: +; WAVE32-NEXT: .cfi_startproc +; WAVE32-NEXT: ; %bb.0: ; %entry +; WAVE32-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE32-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE32-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE32-NEXT: s_xor_saveexec_b32 s4, -1 +; WAVE32-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE32-NEXT: .cfi_offset 1536, 0 +; WAVE32-NEXT: s_waitcnt_depctr 0xffe3 +; WAVE32-NEXT: s_mov_b32 exec_lo, s4 +; WAVE32-NEXT: v_writelane_b32 v0, exec_lo, 0 +; WAVE32-NEXT: .cfi_llvm_vector_registers 1, 1536, 0, 32 +; WAVE32-NEXT: s_xor_saveexec_b32 s4, -1 +; WAVE32-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload +; WAVE32-NEXT: s_waitcnt_depctr 0xffe3 +; WAVE32-NEXT: s_mov_b32 exec_lo, s4 +; WAVE32-NEXT: s_waitcnt vmcnt(0) +; WAVE32-NEXT: s_setpc_b64 s[30:31] +entry: + ret void +} + +; Check that the option causes a CSR VGPR to spill when needed. +define hidden void @func_saved_in_preserved_vgpr() #0 { +; WAVE64-LABEL: func_saved_in_preserved_vgpr: +; WAVE64: .Lfunc_begin2: +; WAVE64-NEXT: .cfi_startproc +; WAVE64-NEXT: ; %bb.0: ; %entry +; WAVE64-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE64-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE64-NEXT: .cfi_undefined 2560 +; WAVE64-NEXT: .cfi_undefined 2561 +; WAVE64-NEXT: .cfi_undefined 2562 +; WAVE64-NEXT: .cfi_undefined 2563 +; WAVE64-NEXT: .cfi_undefined 2564 +; WAVE64-NEXT: .cfi_undefined 2565 +; WAVE64-NEXT: .cfi_undefined 2566 +; WAVE64-NEXT: .cfi_undefined 2567 +; WAVE64-NEXT: .cfi_undefined 2568 +; WAVE64-NEXT: .cfi_undefined 2569 +; WAVE64-NEXT: .cfi_undefined 2570 +; WAVE64-NEXT: .cfi_undefined 2571 +; WAVE64-NEXT: .cfi_undefined 2572 +; WAVE64-NEXT: .cfi_undefined 2573 +; WAVE64-NEXT: .cfi_undefined 2574 +; WAVE64-NEXT: .cfi_undefined 2575 +; WAVE64-NEXT: .cfi_undefined 2576 +; WAVE64-NEXT: .cfi_undefined 2577 +; WAVE64-NEXT: .cfi_undefined 2578 +; WAVE64-NEXT: .cfi_undefined 2579 +; WAVE64-NEXT: .cfi_undefined 2580 +; WAVE64-NEXT: .cfi_undefined 2581 +; WAVE64-NEXT: .cfi_undefined 2582 +; WAVE64-NEXT: .cfi_undefined 2583 +; WAVE64-NEXT: .cfi_undefined 2584 +; WAVE64-NEXT: .cfi_undefined 2585 +; WAVE64-NEXT: .cfi_undefined 2586 +; WAVE64-NEXT: .cfi_undefined 2587 +; WAVE64-NEXT: .cfi_undefined 2588 +; WAVE64-NEXT: .cfi_undefined 2589 +; WAVE64-NEXT: .cfi_undefined 2590 +; WAVE64-NEXT: .cfi_undefined 2591 +; WAVE64-NEXT: .cfi_undefined 2592 +; WAVE64-NEXT: .cfi_undefined 2593 +; WAVE64-NEXT: .cfi_undefined 2594 +; WAVE64-NEXT: .cfi_undefined 2595 +; WAVE64-NEXT: .cfi_undefined 2596 +; WAVE64-NEXT: .cfi_undefined 2597 +; WAVE64-NEXT: .cfi_undefined 2598 +; WAVE64-NEXT: .cfi_undefined 2599 +; WAVE64-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE64-NEXT: s_or_saveexec_b64 s[4:5], -1 +; WAVE64-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE64-NEXT: .cfi_offset 2600, 0 +; WAVE64-NEXT: s_mov_b64 exec, s[4:5] +; WAVE64-NEXT: v_writelane_b32 v40, exec_lo, 0 +; WAVE64-NEXT: v_writelane_b32 v40, exec_hi, 1 +; WAVE64-NEXT: .cfi_llvm_vector_registers 17, 2600, 0, 32, 2600, 1, 32 +; WAVE64-NEXT: ;;#ASMSTART +; WAVE64-NEXT: ; clobber nonpreserved VGPRs +; WAVE64-NEXT: ;;#ASMEND +; WAVE64-NEXT: s_or_saveexec_b64 s[4:5], -1 +; WAVE64-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload +; WAVE64-NEXT: s_mov_b64 exec, s[4:5] +; WAVE64-NEXT: s_waitcnt vmcnt(0) +; WAVE64-NEXT: s_setpc_b64 s[30:31] +; +; WAVE32-LABEL: func_saved_in_preserved_vgpr: +; WAVE32: .Lfunc_begin2: +; WAVE32-NEXT: .cfi_startproc +; WAVE32-NEXT: ; %bb.0: ; %entry +; WAVE32-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE32-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE32-NEXT: .cfi_undefined 1536 +; WAVE32-NEXT: .cfi_undefined 1537 +; WAVE32-NEXT: .cfi_undefined 1538 +; WAVE32-NEXT: .cfi_undefined 1539 +; WAVE32-NEXT: .cfi_undefined 1540 +; WAVE32-NEXT: .cfi_undefined 1541 +; WAVE32-NEXT: .cfi_undefined 1542 +; WAVE32-NEXT: .cfi_undefined 1543 +; WAVE32-NEXT: .cfi_undefined 1544 +; WAVE32-NEXT: .cfi_undefined 1545 +; WAVE32-NEXT: .cfi_undefined 1546 +; WAVE32-NEXT: .cfi_undefined 1547 +; WAVE32-NEXT: .cfi_undefined 1548 +; WAVE32-NEXT: .cfi_undefined 1549 +; WAVE32-NEXT: .cfi_undefined 1550 +; WAVE32-NEXT: .cfi_undefined 1551 +; WAVE32-NEXT: .cfi_undefined 1552 +; WAVE32-NEXT: .cfi_undefined 1553 +; WAVE32-NEXT: .cfi_undefined 1554 +; WAVE32-NEXT: .cfi_undefined 1555 +; WAVE32-NEXT: .cfi_undefined 1556 +; WAVE32-NEXT: .cfi_undefined 1557 +; WAVE32-NEXT: .cfi_undefined 1558 +; WAVE32-NEXT: .cfi_undefined 1559 +; WAVE32-NEXT: .cfi_undefined 1560 +; WAVE32-NEXT: .cfi_undefined 1561 +; WAVE32-NEXT: .cfi_undefined 1562 +; WAVE32-NEXT: .cfi_undefined 1563 +; WAVE32-NEXT: .cfi_undefined 1564 +; WAVE32-NEXT: .cfi_undefined 1565 +; WAVE32-NEXT: .cfi_undefined 1566 +; WAVE32-NEXT: .cfi_undefined 1567 +; WAVE32-NEXT: .cfi_undefined 1568 +; WAVE32-NEXT: .cfi_undefined 1569 +; WAVE32-NEXT: .cfi_undefined 1570 +; WAVE32-NEXT: .cfi_undefined 1571 +; WAVE32-NEXT: .cfi_undefined 1572 +; WAVE32-NEXT: .cfi_undefined 1573 +; WAVE32-NEXT: .cfi_undefined 1574 +; WAVE32-NEXT: .cfi_undefined 1575 +; WAVE32-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE32-NEXT: s_or_saveexec_b32 s4, -1 +; WAVE32-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE32-NEXT: .cfi_offset 1576, 0 +; WAVE32-NEXT: s_waitcnt_depctr 0xffe3 +; WAVE32-NEXT: s_mov_b32 exec_lo, s4 +; WAVE32-NEXT: v_writelane_b32 v40, exec_lo, 0 +; WAVE32-NEXT: .cfi_llvm_vector_registers 1, 1576, 0, 32 +; WAVE32-NEXT: ;;#ASMSTART +; WAVE32-NEXT: ; clobber nonpreserved VGPRs +; WAVE32-NEXT: ;;#ASMEND +; WAVE32-NEXT: s_or_saveexec_b32 s4, -1 +; WAVE32-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload +; WAVE32-NEXT: s_waitcnt_depctr 0xffe3 +; WAVE32-NEXT: s_mov_b32 exec_lo, s4 +; WAVE32-NEXT: s_waitcnt vmcnt(0) +; WAVE32-NEXT: s_setpc_b64 s[30:31] +entry: + call void asm sideeffect "; clobber nonpreserved VGPRs", + "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9} + ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19} + ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29} + ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"() + ret void +} + +; There's no return here, so the return address live in was deleted. +define void @empty_func() { +; WAVE64-LABEL: empty_func: +; WAVE64: .Lfunc_begin3: +; WAVE64-NEXT: .cfi_startproc +; WAVE64-NEXT: ; %bb.0: +; WAVE64-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE64-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE64-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE64-NEXT: s_xor_saveexec_b64 s[4:5], -1 +; WAVE64-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE64-NEXT: .cfi_offset 2560, 0 +; WAVE64-NEXT: s_mov_b64 exec, s[4:5] +; WAVE64-NEXT: v_writelane_b32 v0, exec_lo, 0 +; WAVE64-NEXT: v_writelane_b32 v0, exec_hi, 1 +; +; WAVE32-LABEL: empty_func: +; WAVE32: .Lfunc_begin3: +; WAVE32-NEXT: .cfi_startproc +; WAVE32-NEXT: ; %bb.0: +; WAVE32-NEXT: .cfi_llvm_def_aspace_cfa 64, 0, 6 +; WAVE32-NEXT: .cfi_llvm_register_pair 16, 62, 32, 63, 32 +; WAVE32-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; WAVE32-NEXT: s_xor_saveexec_b32 s4, -1 +; WAVE32-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill +; WAVE32-NEXT: ... [truncated] 
Comment on lines +2 to +3
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s

attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }
attributes #2 = { nounwind "frame-pointer"="all" "amdgpu-waves-per-eu"="12,12" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12? I thought the limit was 10, at least for gfx9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment