Conversation

@krishnamtibrewala
Collaborator

This will help in linear code. A reg-to-reg copy can issue on only one slot, so using a multi-slot move for constant copies lets us bundle them together or place them in a different slot for better packing.
PS: This is a very small optimization.
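
To make the packing argument concrete, here is a minimal sketch of a greedy bundler. It is a toy model and not the scheduler from this PR: the slot names (`mov`, `alu`), the instruction names, and the one-operation-per-slot rule are all assumptions made for illustration.

```python
# Toy model: each instruction lists the issue slots it may use.
# Slot names and the one-op-per-slot rule are illustrative assumptions.
def pack(instrs):
    """Greedily place independent instructions into VLIW bundles."""
    bundles = []
    for name, slots in instrs:
        for bundle in bundles:
            # Slots this instruction may use that are still free in the bundle.
            free = [s for s in slots if s not in bundle]
            if free:
                bundle[free[0]] = name
                break
        else:
            # No existing bundle has a usable free slot: open a new one.
            bundles.append({slots[0]: name})
    return bundles

# Two reg-to-reg copies that can only issue on a single slot.
copies = [("copy_a", ["mov"]), ("copy_b", ["mov"])]
# The same two moves expressed as immediate moves that may use either slot.
imm_moves = [("imm_a", ["mov", "alu"]), ("imm_b", ["mov", "alu"])]

copy_bundles = pack(copies)
imm_bundles = pack(imm_moves)
print(len(copy_bundles), len(imm_bundles))
```

In this model the two single-slot copies need separate bundles, while the two multi-slot immediate moves share one.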

@krishnamtibrewala
Collaborator Author

Note: The best place to do this would be after register coalescing, since that is where constant COPYs are created, and it should happen before RA.

@martien-de-jong
Collaborator

How is this different from constant rematerialization by RA?

@krishnamtibrewala
Collaborator Author

> How is this different from constant rematerialization by RA?

Hi @martien-de-jong, I do not see that happening in RA. Additionally, the idea here is to convert the scalar COPY instruction into a PseudoImm move so that the moves can be packed into the same VLIW bundle.

@martien-de-jong
Collaborator

What is stopping you from implementing this before RA? Also, I think that register coalescing removes copies rather than creating them. I think that PHI elimination is the biggest creator of copies.

@krishnamtibrewala
Collaborator Author

> What is stopping you from implementing this before RA? Also, I think that register coalescing removes copies rather than creating them. I think that PHI elimination is the biggest creator of copies.

Hi @martien-de-jong, you are right: PHI elimination is the biggest creator of COPYs. The register coalescing pass helps clean up these copies to a certain extent, leading to IR like the following. (Note: when it comes to a COPY between mismatched sub-registers, the coalescing pass does not do a great job for us.)

bb.0
%1 = mov_imm_pseudo 0
%2 = COPY %1
%3 = ADD %1, %x

bb.1
%4 = COPY %1

The only motivation to implement it before RA is that the live range of %1 might shrink, which would help RA (both are big ifs).
The current implementation was chosen mostly for ease of implementation and to show a working PoC: it uses copyPhysReg(...) to pick a mov_imm_pseudo rather than a mov_scl when possible, which helps the scheduler bundle better.
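
As a rough illustration of the rewrite and of the live-range effect on %1, here is a minimal sketch over the example above. It is not the copyPhysReg(...) change in this PR: the tuple encoding, the function name `rewrite_const_copies`, and the flattening of bb.0 and bb.1 into one list are hypothetical, and the sketch assumes SSA form so that every virtual register has exactly one definition.

```python
# Toy MIR: (dest, opcode, operands). Opcode names follow the example above;
# the tuple encoding and this rewrite function are hypothetical.
def rewrite_const_copies(instrs):
    """Turn COPYs of known constants into mov_imm_pseudo (assumes SSA)."""
    consts = {}  # vreg -> immediate it is known to hold
    out = []
    for dest, op, args in instrs:
        if op == "mov_imm_pseudo":
            consts[dest] = args[0]
        elif op == "COPY" and args[0] in consts:
            # Re-materialize the constant instead of copying the register.
            op, args = "mov_imm_pseudo", [consts[args[0]]]
            consts[dest] = args[0]
        out.append((dest, op, args))
    return out

mir = [
    ("%1", "mov_imm_pseudo", [0]),
    ("%2", "COPY", ["%1"]),
    ("%3", "ADD", ["%1", "%x"]),
    ("%4", "COPY", ["%1"]),  # lives in bb.1 in the example above
]
rewritten = rewrite_const_copies(mir)

# Index of the last instruction that still reads %1, before and after.
last_use_before = max(i for i, (_, _, a) in enumerate(mir) if "%1" in a)
last_use_after = max(i for i, (_, _, a) in enumerate(rewritten) if "%1" in a)
print(last_use_before, last_use_after)
```

After the rewrite, the ADD is the only remaining reader of %1, so in this toy model %1 no longer has to stay live into bb.1.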

I saw this help in the Conv2D_bfp16_* test cases.

@martien-de-jong
Collaborator

@krishnamtibrewala yes, everything is related. There are more live-range considerations around REG_SEQUENCE and subreg use, especially across PHI nodes. I have a feeling that combining PHI elimination, constant materialization, and register coalescing might be quite powerful (although rematerialization might be reserved as a repair mechanism in core RA; it might influence coalescing decisions, though).

mgehre-amd pushed a commit that referenced this pull request Aug 21, 2025
[AutoBump] Merge with 8a9921f (Oct 23) (17)