Since shuffles only change the arrangement of elements within a vector, it should be legal to reassociate operations that transform all lanes in the same manner to occur before/after they are shuffled even if the shuffle pattern isn't known at compile time.
This optimization isn't performed when the indices are variable. For example, shuffle(POW_OF_2_LUT, idx) + 1 is implemented naively rather than by offsetting the table:
powOf2P1LUT_clang:
        pand    xmm0, xmmword ptr [rip + .LCPI0_0]
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_1]
        pshufb  xmm1, xmm0
        pcmpeqd xmm0, xmm0
        psubb   xmm1, xmm0
        movdqa  xmm0, xmm1
        ret

https://godbolt.org/z/Y78Ws6Tdv
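To illustrate the fold being requested, here is a minimal scalar sketch in C that models the masked byte shuffle lane by lane rather than using intrinsics. The table contents, the 16-lane width, and every function name are assumptions made for illustration; they are not taken from the linked source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 16 };

/* Assumed table contents: 1 << (i % 8). The real POW_OF_2_LUT is not shown here. */
static const uint8_t POW_OF_2_LUT[LANES] = {
    1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128
};

/* The same table with the "+ 1" folded into every entry ahead of time. */
static const uint8_t POW_OF_2_P1_LUT[LANES] = {
    2, 3, 5, 9, 17, 33, 65, 129, 2, 3, 5, 9, 17, 33, 65, 129
};

/* Scalar model of a variable byte shuffle whose indices are first masked
   to 0..15, mirroring the pand + pshufb pair in the assembly. */
void shuffle_masked(const uint8_t table[LANES], const uint8_t idx[LANES],
                    uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES; ++i)
        out[i] = table[idx[i] & 0x0F];
}

/* Naive form: shuffle first, then add 1 to every lane. */
void lut_plus1_naive(const uint8_t idx[LANES], uint8_t out[LANES]) {
    shuffle_masked(POW_OF_2_LUT, idx, out);
    for (size_t i = 0; i < LANES; ++i)
        out[i] = (uint8_t)(out[i] + 1);
}

/* Folded form: a single shuffle of the pre-offset table. */
void lut_plus1_folded(const uint8_t idx[LANES], uint8_t out[LANES]) {
    shuffle_masked(POW_OF_2_P1_LUT, idx, out);
}

/* Checks that both forms agree for every possible index byte value. */
void check_fold(void) {
    for (unsigned base = 0; base < 256; ++base) {
        uint8_t idx[LANES], a[LANES], b[LANES];
        for (size_t i = 0; i < LANES; ++i)
            idx[i] = (uint8_t)(base + i);
        lut_plus1_naive(idx, a);
        lut_plus1_folded(idx, b);
        for (size_t i = 0; i < LANES; ++i)
            assert(a[i] == b[i]);
    }
}
```

Because every output lane is a copy of some table entry, adding 1 after the shuffle and adding 1 to the table before it give the same result, and the folded form would need no pcmpeqd/psubb pair.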
Note that this isn't legal for shuffles that can set lanes to specific constants, or that select between two vectors if only one is transformed.
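As a sketch of why that restriction matters, the scalar model below follows pshufb's documented behavior of writing 0 to any lane whose index byte has bit 7 set. The function names and the test values in the usage note are hypothetical, chosen only to show the mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 16 };

/* Scalar model of pshufb without the preceding mask: an index byte with
   bit 7 set produces 0, otherwise its low 4 bits select a table byte. */
void pshufb_model(const uint8_t table[N], const uint8_t idx[N],
                  uint8_t out[N]) {
    for (size_t i = 0; i < N; ++i)
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
}

/* Counts the lanes in which shuffle(table, idx) + 1 differs from
   shuffle(table + 1, idx). */
size_t count_mismatches(const uint8_t table[N], const uint8_t idx[N]) {
    uint8_t bumped[N], after[N], before[N];
    size_t mismatches = 0;

    for (size_t i = 0; i < N; ++i)
        bumped[i] = (uint8_t)(table[i] + 1);

    pshufb_model(table, idx, after);   /* add 1 after the shuffle */
    pshufb_model(bumped, idx, before); /* add 1 before the shuffle */

    for (size_t i = 0; i < N; ++i)
        if ((uint8_t)(after[i] + 1) != before[i])
            ++mismatches;
    return mismatches;
}
```

With all indices in 0..15 the count is zero. Each lane whose index has bit 7 set contributes one mismatch, because the zeroed lane becomes 1 when the add follows the shuffle but stays 0 when the add is folded into the table.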