Closed
Labels
- A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- E-needs-test: Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
- I-slow: Issue: Problems and improvements with respect to performance of generated code.
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
Description
The following snippet generates branches (with -C opt-level=3):

    pub fn func1(a: u16, b: u16, v: u16) -> u16 {
        match (a == v, b == v) {
            (true, false) => 0,
            (false, true) => u16::MAX,
            _ => 1 << 15, // half
        }
    }

whereas spelling it out to the compiler does not:

    pub fn func2(a: u16, b: u16, v: u16) -> u16 {
        match (a == v, b == v) {
            (true, false) => 0,
            (false, true) => u16::MAX,
            (true, true) => 1 << 15, // half
            (false, false) => 1 << 15, // half
        }
    }

I believe this breaks the zero cost abstraction promise.
In theory these do mean different things but for constant 16-bit integers on a modern 64-bit system they should be optimized away.
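As a sanity check that the two spellings differ only in generated code and not in results, here is a minimal sketch that compares them over a small input range. The helper `agree_up_to` and its `limit` parameter are hypothetical names introduced for illustration; they are not part of the original report.

```rust
/// Wildcard version from the report.
pub fn func1(a: u16, b: u16, v: u16) -> u16 {
    match (a == v, b == v) {
        (true, false) => 0,
        (false, true) => u16::MAX,
        _ => 1 << 15, // half
    }
}

/// Fully spelled-out version from the report.
pub fn func2(a: u16, b: u16, v: u16) -> u16 {
    match (a == v, b == v) {
        (true, false) => 0,
        (false, true) => u16::MAX,
        (true, true) => 1 << 15,   // half
        (false, false) => 1 << 15, // half
    }
}

/// Hypothetical helper: returns true if func1 and func2 agree for
/// every combination of a, b, v in 0..limit.
pub fn agree_up_to(limit: u16) -> bool {
    for a in 0..limit {
        for b in 0..limit {
            for v in 0..limit {
                if func1(a, b, v) != func2(a, b, v) {
                    return false;
                }
            }
        }
    }
    true
}
```

A small `limit` already covers all four combinations of `(a == v, b == v)`, which is all the match depends on.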
Other quirks:
- only partially spelling it out to the compiler with `(true, true) | (false, false)` still generates branches
- using guard clauses generates branchless code but with more instructions
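The two quirks above could look roughly like the following. This is a guess at the variants being described, since the report does not include their source; the function names `func_or_pattern` and `func_guards` are hypothetical, and the exact form of the guard-clause version in particular is an assumption.

```rust
/// Assumed form of the "partially spelled out" variant: the two
/// remaining cases are merged with an or-pattern instead of `_`.
pub fn func_or_pattern(a: u16, b: u16, v: u16) -> u16 {
    match (a == v, b == v) {
        (true, false) => 0,
        (false, true) => u16::MAX,
        (true, true) | (false, false) => 1 << 15, // half
    }
}

/// Assumed form of the "guard clauses" variant: the comparisons
/// move into match guards rather than a matched tuple.
pub fn func_guards(a: u16, b: u16, v: u16) -> u16 {
    match () {
        _ if a == v && b != v => 0,
        _ if a != v && b == v => u16::MAX,
        _ => 1 << 15, // half
    }
}
```

Both return the same values as the original functions; any difference would show up only in the emitted assembly, which is best inspected in the Godbolt link rather than inferred from the source.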
Godbolt link: https://rust.godbolt.org/z/cWoboKM3d