// Necessary bits for each int type (except BDAs) are being generous and considering a resolution of up to 4k
-// Also, an extra bit is given since it fits: they're uints being passed as ints so we don't litter the code with int32 casts
-// The extra bit is to make sure MSB is a 0 and it doesn't sign extend and give negative numbers when using the maximum amount of considered bits
+// All packed bitfields of int32_t are uints being passed as ints so we don't litter the code with int32 casts
struct PushConstantData
{
// After running FFT along a column, we want to store the result in column major order for coalesced writes, and similarly after running an FFT in row major order
@@ -21,7 +19,10 @@ struct PushConstantData
// The following three fields being push constants allow dynamic resizing of the image without recompiling shaders (limited by the FFT length)
int32_t imageRowLength : 16;
int32_t imageHalfRowLength : 16;
-// Actually only needs at worst 10 bits, but we don't pack it into a bitfield so we can use offsetof and update only this field from CPP side
+// Only middle pass uses these
+uint32_t currentChannel;
+uint64_t channelStartOffsetBytes;
+// We don't pack it into a bitfield so we can use offsetof and update only this field from CPP side
// Alternatively, we could do the packing/unpacking manually to save 32 bits
int32_t padding;
// Used by IFFT to tell if an index belongs to an image or is in the padding
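Note on the `offsetof` comment in this hunk: `offsetof` cannot be applied to a bit-field member, which is presumably why `padding` stays a full 32-bit field. A minimal C++ sketch of the host-side idea follows. The struct `DemoPushConstants`, its member `packedRowLengths`, the `ByteRange` type and the helper `updatePaddingOnly` are hypothetical stand-ins, not the real shared header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the shared struct. Only the member names
// currentChannel, channelStartOffsetBytes and padding come from the diff.
struct DemoPushConstants
{
    uint32_t packedRowLengths;        // placeholder for the two 16-bit bitfields
    uint32_t currentChannel;
    uint64_t channelStartOffsetBytes;
    int32_t padding;                  // full-width field, so offsetof is allowed
};

// Byte range that a partial update would need to cover.
struct ByteRange
{
    size_t offset;
    size_t size;
};

// Overwrites only `padding` inside a raw byte image of the struct and
// returns the (offset, size) pair describing the bytes that changed.
inline ByteRange updatePaddingOnly(unsigned char* blob, int32_t newPadding)
{
    assert(blob != nullptr);
    const ByteRange range = { offsetof(DemoPushConstants, padding), sizeof(int32_t) };
    std::memcpy(blob + range.offset, &newPadding, range.size);
    return range;
}
```

The returned offset and size are the kind of values a partial push-constant update takes, for instance Vulkan's `vkCmdPushConstants`, which accepts a byte offset and a size. Whether the host code in this repository does exactly that is an assumption.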
return y * pushConstants.imageRowLength + x; // can no longer sum with | since there's no guarantees on row length
}
-
-// Same as what was used to store in col-major after first axis FFT. This time we launch one workgroup per row so the height of the channel's (half) image is NumWorkgroups,
-// and the width (number of columns) is passed as a push constant
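Note on the `y * pushConstants.imageRowLength + x` line earlier in this hunk: combining the two terms with `|` only equals the sum when the bits of `y * rowLength` and `x` never overlap, which holds when the row length is a power of two and `x < rowLength`. The sketch below illustrates that, assuming this is the guarantee the comment refers to. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// General row-major index: valid for any row length.
inline uint32_t rowMajorIndex(uint32_t x, uint32_t y, uint32_t rowLength)
{
    assert(x < rowLength);
    return y * rowLength + x;
}

// OR-based variant: matches the sum only when rowLength is a power of two
// (and x < rowLength), because then the two operands share no set bits.
inline uint32_t rowMajorIndexOr(uint32_t x, uint32_t y, uint32_t rowLength)
{
    assert(x < rowLength);
    return (y * rowLength) | x;
}
```

With a row length of 8 both functions agree for every valid `x`. With a row length of 6, `x = 3` and `y = 1`, the sum is 9 while the OR is 7.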
// corresponding to the column they correspond to.
// The `gl_WorkGroupID().x = 0` case is special because instead of getting the mirror we need to get both zero and nyquist frequencies for the columns, which doesn't happen just by mirror
// Even thread retrieves Zero, odd thread retrieves Nyquist. Zero is always `preloaded[0]` of the previous FFT's 0th thread, while Nyquist is always `preloaded[1]` of that same thread.
// Therefore we know Nyquist ends up exactly at y = PreviousWorkgroupSize
const uint32_t y = oddThread ? PreviousWorkgroupSize : 0;
+[unroll]
for (uint32_t localElementIndex = 0; localElementIndex < ElementsPerInvocation; localElementIndex++)
{
int32_t wrappedIndex = paddedIndex < 0 ? ~paddedIndex : paddedIndex; // ~x = - x - 1 in two's complement (except maybe at the borders of representable range)
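Note on the `~paddedIndex` line: in two's complement `~x == -x - 1`, so negative indices are reflected as -1 to 0, -2 to 1, and so on. The "borders" caveat in the comment is presumably about `INT32_MIN`, where `-x - 1` would overflow while `~x` is still well defined. A small C++ sketch of the same expression, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Reflects a negative index back into the non-negative range:
// -1 -> 0, -2 -> 1, ... Non-negative indices pass through unchanged.
inline int32_t mirrorNegativeIndex(int32_t paddedIndex)
{
    return paddedIndex < 0 ? ~paddedIndex : paddedIndex;
}
```

This only covers the negative side of the expression shown in the diff. Any handling of indices past the upper end of the image is outside this sketch.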
// Each element on this row is Nabla-ordered. So the element at `x' = index, y' = gl_WorkGroupID().x` that we're operating on is actually the element at
// `x = F(index), y = bitreverse(gl_WorkGroupID().x)` (with the bitreversal done as an N-1 bit number, for `N = log2(TotalSize)` *of the first axis FFT*)
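Note on "bitreversal done as an N-1 bit number": reversing within a fixed bit width differs from a full 32-bit reversal. The C++ sketch below reverses only the lowest `bitCount` bits. The function name is hypothetical, and the mapping `F(index)` from the comment is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the lowest `bitCount` bits of `value`. Bits above that width are ignored.
inline uint32_t bitReverseLowBits(uint32_t value, uint32_t bitCount)
{
    assert(bitCount >= 1 && bitCount <= 32);
    uint32_t reversed = 0;
    for (uint32_t bit = 0; bit < bitCount; bit++)
    {
        // Shift the result left and append the next-lowest bit of the input.
        reversed = (reversed << 1) | ((value >> bit) & 1u);
    }
    return reversed;
}
```

As an illustration with an assumed `TotalSize` of 16, `N = 4`, so row IDs are reversed as 3-bit numbers: row 1 maps to 4 and row 3 maps to 6.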
// Save a row back in row major order. Remember that the first row (one with `gl_WorkGroupID().x == 0`) will actually hold the packed IFFT of Zero and Nyquist rows.