- Notifications
You must be signed in to change notification settings - Fork 803
NFC: Infrastructure changes for DXIL op vector and multi-dim overloads #7259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ✅ With the latest revision this PR passed the C/C++ code formatter. |
4920482 to 96d07b8 Compare Add a new native vector overload type to DXIL intrinsics.
c5d628f to 73a8ca8 Compare 73a8ca8 to 1ac4991 Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly requests for clarification, but a few comments that I think might identify areas where bad input might lead to bad output.
I'm curious if you were tempted to use the new extended mechanism for all intrinsics. The array of bools is too long to be easily readable and I think a bitfield would be no worse and more concise giving us one way of indicating supported parameter types instead of two. If we were to just allow extended types to have a single entry, might it "just work"? By that I mean essentially removing the restriction against "fd," and preventing the unneeded dot separator might get us 90% there? Naturally we wouldn't want to require the comma in all cases, but it seems that would get us where I'm thinking of under the new code if it were allowed.
include/dxc/DXIL/DxilOperations.h Outdated
| | ||
| // Extended Type Overloads: | ||
| // This is an encoding for a multi-dimensional overload. | ||
| // 1. Only bAllowOverload[kExtendedTypeSlot] is set to true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what these numbers indicate, but this one is most confusing to me as it seems that bAllowedOverload[kVectorTypeSlot] being set to true is also an option?
I think we are describing two possibilities here:
- we are using extended types:
- only
bAllowedOverload[kExtenededTypeSlot]is true. ExtendedOverloadscomes into play
- we are using at least one vector type:
bAllowedOverload[kExtenededTypeSlot]is true but others can be also- AllowedVectorElements comes into play
Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite.
There are basically two overload modes: 1. one overload dimension, and 2. extended overload dimensions.
These modes are orthogonal to whether or not vectors are allowed.
bAllowedOverload[kExtenededTypeSlot] tells you which mode you're in. If it's false, you're in 1. one overload dimension. If it's true, you're in 2. extended overload dimensions.
When in 1. one overload dimension:
- bAllowedOverload[kExtenededTypeSlot] is false (that's how you know)
- the
ExtendedOverloadsare unused and should be zero - if
bAllowedOverload[kVectorTypeSlot]is false:- no changes to overload handling
AllowedVectorElementsis unused/zero
- if
bAllowedOverload[kVectorTypeSlot]is true, then:- Other bits in
bAllowedOverloadmay still be set to describe legal scalar overloads. AllowedVectorElements[0]describes the legal vector element overloads.
- Other bits in
When in 2. extended overload dimensions:
- bAllowedOverload[kExtenededTypeSlot] is true (that's how you know)
- all other bools in
bAllowedOverloadmust be false - each overload dimension is described by an array element of
ExtendedOverloadsinstead of usingbAllowedOverload - for each dimension
n:ExtendedOverloads[n]is used for scalar overloads and a vector bit to indicate vector overload supportAllowedVectorElements[n]encodes the element overloads whenExtendedOverloads[n][kVectorTypeSlot]is set, otherwise it should be zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been heavily reworked.
I've eliminated the bAllowedOverload array in favor of a NumOverloadDims and use of the bit masks.
Recommend reviewing separate commits, and see at the commit comment for the main refactor.
| DXASSERT(IsDxilOpExtendedOverload(opCode), | ||
| "otherwise, Dxil Op does not support extended overload"); | ||
| return GetOpFunc(opCode, GetExtendedOverloadType(ExtendedOverloads)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To potentially clarify for others. This isn't used yet, but will be used by autogenerated code when extended overloads are introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be used by incoming built-in linalg lowering code in HLOperationLower, but not technically by any "autogenerated" code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I know what you meant by autogenerated code, but was confused by "used by". Rather, this will lead to extracting elements in an upcoming autogenerated code path for new DXIL ops.
| Some minor notes about the description:
Given that this line presents this as a two-pronged change, I think a section describing the vector parts and one for multi-dimensional overloads might be helpful to orient readers about what is being added before getting into implementation details.. In particular because I think an explanation of what exactly multi-dimension overloads are might be helpful.
Since the change using 't' never made it into main, I think it can be left unmentioned.
I think these might be clearer if there's a bullet above them saying something like "Introduces the '<' character that, when placed after the scalar overloads, indicates that vectors overloads are allowed" This might replace the comment about changing 't' to '<'.
Since this isn't visible to users of hctdb.py, I think the second note is more important. The 'x' not being directly accessible to hctdb.py entries might bear mentioning.
|
| By way of guidance for other reviewers, it would be most convenient to mostly ignore the last commit as it only contains generated code. Worth glancing at to see what is added, but it really slows down the diff and causes DxilOperations.cpp to be closed by default and the more meaningful additions to get lost in the noise. I reviewed this mostly: https://github.com/microsoft/DirectXShaderCompiler/pull/7259/files/f538ebdc392bca7cb3e95752c0fdd8470249b5f5 |
I noticed these functions when adding multi-overload, and meant to delete them. In fact, I had deleted them from the cpp, but then my change was wiped out by a failed hctgen run. I think these were leftover from an attempt to work around a name collision ultimately caused by dx.types.CBufRet.f16|i16 between min-precision (with 4 elements) and native low precision (with 8 elements), caused by failing to initialize the min precision mode correctly for linking. That issue was fixed, and the names made unique by adding ".8" to the end of the 8-element native low precision cbuffer return type.
- Use Twine, raw_svector_ostream, and SmallVector storage to replace uses of std::string - Use SmallVector instead of std::vector for ArgTypes in GetOpFunc
- Functions IsDxilOpType and IsDxilOpTypeName are unused, this removes them.
| ✅ With the latest revision this PR passed the Python code formatter. |
da1c4ab to dccb5d7 Compare Add comments explaining the new system. Eliminate bool array in favor of array of masks for up to N dimensions. Add NumOverloadDims instead of two-mode system. Rework TypeSlots: - use enum, categorize basic, limit masks to used bits - void doesn't need a type slot (NumOverloadDims == 0 instead) - m_OverloadTypeName only contains basic type names Handle multi-overload in FixOverloadNames; new MayHaveNonCanonicalOverload is used to determine whether the overload name could need fixing. Extended overload is still a distinction because of the way the overloads must be wrapped in an unnamed StructType. However, it does not need a bit in the overload mask. Renamed GetVectorType to GetStructVectorType, since it's just used to get a struct for a particular vector type, not a vector type itself. In hctdb.py, no longer separate extended and vector overloads, just verify correctness of the incoming string, and add default vector overloads if necessary. In hctdb_instrhelp.py, update according to changes in hctdb.py, and eliminate needless, problematic, outdated comment printing.
dccb5d7 to ad784af Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I have any feedback that requires a change, but I did make a few small suggestions.
| return str; | ||
| raw_svector_ostream OS(Storage); | ||
| Ty->print(OS); | ||
| return OS.str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with it, but I'm curious what advantages the change from std::string to SmallVector provides. The print to string mechanism is nice, but was possible before.
I also wonder about the twining for vector types compared to the print to string for extended types in a similar scenario. Again, I have no clear preference here, just wondering at the discrepancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be more efficient and require no heap allocations 99% of the time.
Regarding the use of Twine, that should be maximally efficient, as it will know the final size in advance of any allocation into the final buffer (which shouldn't be necessary) rather than incrementally adding content and checking if a heap allocation and copy is needed.
But Twine is most useful for a single expression that results in a single concatenation into the final string buffer and return of a StringRef. Anything more complex and you might as well use the raw_svector_ostream.
| return false; | ||
| if (opCode == OpCode::NumOpCodes) | ||
| if (static_cast<unsigned>(opCode) >= | ||
| static_cast<unsigned>(OpCode::NumOpCodes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do these need to be casted to integers to be compared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps not, but I'm concerned that an unsanitized opCode value could be supplied which may be greater than OpCode::NumOpCodes. If there's an assumption that a value of type OpCode will only ever be set to one of the values defined in the enum, I'm concerned that an invalid opcode value will not be caught.
| llvm::Type *GetCBufferRetType(llvm::Type *pOverloadType); | ||
| llvm::Type *GetVectorType(unsigned numElements, llvm::Type *pOverloadType); | ||
| llvm::Type *GetStructVectorType(unsigned numElements, | ||
| llvm::Type *pOverloadType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this renaming to clarify it's not a native vector type, but a struct for unpacked vectors. I considered suggesting adding the unpacked ness to it, but it would get kind of wordy and confusing as GetUnpackedStructVectorType
utils/hct/hctdb_instrhelp.py Outdated
| if last_category != None: | ||
| print("") | ||
| print( | ||
| " // {category:118} void, h, f, d, i1, i8, i16, i32, i64, udt, obj, vec, function attribute, ext oload, vec oload".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I hope it was clear that my previous comment wasn't demanding this change, I won't miss the giant boolean matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just nits, looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks!
| [celebrate] Damyan Pepper reacted to your message: …________________________________ From: Tex Riddell ***@***.***> Sent: Tuesday, April 1, 2025 6:56:40 PM To: microsoft/DirectXShaderCompiler ***@***.***> Cc: Damyan Pepper ***@***.***>; Review requested ***@***.***> Subject: Re: [microsoft/DirectXShaderCompiler] NFC: Infrastructure changes for DXIL op vector and multi-dim overloads (PR #7259) Merged #7259<#7259> into main. — Reply to this email directly, view it on GitHub<#7259 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AB56BAXBGGB5CC4L64RWCXT2XLOORAVCNFSM6AAAAABZY3MUMKVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJXGA4DAOJYGMYDMOI>. You are receiving this because your review was requested.Message ID: ***@***.***> |
This change adds vector and multi-dimensional overload support for DXIL operations.
Multi-dimensional (or "extended") overloads are added, where two or more types in a DXIL Op function signature may vary independently, such as both the return type and a parameter type. Until now, only one overload dimension has been necessary.
For single-dim overloads, any number of parameters in a DXIL op may refer to this single overload type.
For multi-dim overloads, each type that can vary must have a unique overload dimension, even when two or more types must be the same. This follows a pattern from llvm intrinsics. If two or more of the types need to be the same, this constraint must be handled manually, outside the automatic overload constraints defined by the DXIL op definitions.
Vector overloads are also added, requiring an additional set of scalar overload types to define the allowed vector element types, on top of the original set describing the allowed scalar overloads for an operation, since both scalar and vector overloads may be allowed on the same operation.
There are several components involved in handling DXIL operation overloads, with some changes:
hctdb.pyuse a string of characters to define the allowed overloads, and special type names used in parameter definitions that refer to the overload type.','may separate dimensions for multi-dim overloads'<'indicates that a vector overload is allowed, in which case, scalar components on the left indicate normal scalar overloads allowed, and scalar components on the right indicate the allowed vector element overloads."hf<"is equivalent to"hf<hf".dxil_max_overload_dims = 2is introduced to define the maximum number of overload dimensions currently supported.DXIL::kDxilMaxOloadDimsdefinition inDxilConstants.h."$x0"and"$x1"are used to reference each overloaded dxil type in parameter definitions when more than one overload dimension is defined for a DXIL op. Other special overload types are not allowed for multi-dim overloads, which means you cannot (currently) describe a multi-dim overload where a returned overload type is wrapped in a resource return struct along with residency status. This could be changed in the future if necessary.hctdb_instrhelp.pytranslates overload and param type info from DXIL operation definitions into code inserted intoDxilOperations.cpp.DxilOperations.h|cppencodes allowed overloads insideOpCodePropertystate for each operation in them_OpCodePropstable. It uses this information, along with generated code, to enforce overload rules on DXIL ops.OpCodePropertyhas been rewritten to use a more compactOverloadMasktype, support multi-dim overloads, and add a second layerAllowedVectorElementsfor vector overloads for each dimension.llvm::Type*describes the overload type, such as with:GetOpFunc,GetOpFuncList,GetOverloadType,IsOverloadLegal, andm_OpCodeClassCache. The scheme used for multi-dim overloads is to encode each of the overload types in a single unnamedStructType, liketype {i32, <4 x float>}. This makes it compatible with all these existing mechanisms without requiring an API overhaul impacting the broader code base.GetExtendedOverloadTypeis used to construct this type from multiple types.While updating
DxilOperations.h|cpp, I noticed and removed some unused methods:IsDxilOpTypeName,IsDxilOpType,IsDupDxilOpType,GetOriginalDxilOpType.