Description
LLVM seems to detect a byte-swapped load and convert it to load + bswap (x86) or load + rev (armv8). But for some reason, adding the loaded result to a pointer breaks the optimization.
Example:
    uint32_t Load32BE(const uint8_t* data) {
        return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
    }

    const uint8_t* Broken(const uint8_t* data, const uint8_t* base) {
        return base + Load32BE(data);
    }

    const uint8_t* Works(const uint8_t* data, const uint8_t* base) {
        return reinterpret_cast<const uint8_t*>(reinterpret_cast<size_t>(base) + Load32BE(data));
    }

In this example, Broken compiles to 4 byte loads and some shifts, while Works compiles to one int load and a bswap/rev.
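
For anyone hitting this in the meantime, one possible workaround is to stop relying on the pattern match and spell out the load and byte swap directly. This is only a sketch, not part of the original report: `Load32BEExplicit` and `Offset` are hypothetical names, and it assumes a GCC/Clang-compatible compiler that provides `__builtin_bswap32` and the `__BYTE_ORDER__` / `__ORDER_LITTLE_ENDIAN__` macros. I have not checked the generated code on every target or LLVM version, so please verify the output for your own build.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the report): reads a big-endian 32-bit value
// with one explicit load rather than four byte loads combined with shifts.
// Assumes GCC/Clang builtins and predefined byte-order macros.
inline uint32_t Load32BEExplicit(const uint8_t* data) {
    uint32_t value;
    // memcpy expresses an unaligned, aliasing-safe 4-byte load.
    std::memcpy(&value, data, sizeof(value));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    // On little-endian hosts the bytes arrive reversed, so swap them.
    value = __builtin_bswap32(value);
#endif
    return value;
}

// Hypothetical counterpart of Broken() from the report, using the helper above.
inline const uint8_t* Offset(const uint8_t* data, const uint8_t* base) {
    return base + Load32BEExplicit(data);
}
```

The helper returns the same value as Load32BE for any 4-byte input, so it can be swapped in without changing behavior. The trade-off is that it depends on compiler-specific builtins instead of portable shifts.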