Codegen weirdness for sum of count_ones over an array

@wesleywiser

(Issue loosely owned by @wesleywiser and @pnkfelix, monitoring llvm/llvm-project#57476)

Original Description below

    pub fn f(arr: [u64; 2]) -> u32 {
        arr.into_iter().map(u64::count_ones).sum()
    }

Before 1.62.0, this code correctly compiled to two popcounts and an addition on a modern x86-64 target.

    example::f:
            popcnt  rcx, qword ptr [rdi]
            popcnt  rax, qword ptr [rdi + 8]
            add     eax, ecx
            ret

Since 1.62.0 (up to latest nightly), the codegen is... baffling at best.

    .LCPI0_0:
            .zero   16,15
    .LCPI0_1:
            .byte   0
            .byte   1
            .byte   1
            .byte   2
            .byte   1
            .byte   2
            .byte   2
            .byte   3
            .byte   1
            .byte   2
            .byte   2
            .byte   3
            .byte   2
            .byte   3
            .byte   3
            .byte   4
    example::f:
            sub     rsp, 40
            vmovups xmm0, xmmword ptr [rdi]
            vmovdqa xmm1, xmmword ptr [rip + .LCPI0_0]
            vmovdqa xmm3, xmmword ptr [rip + .LCPI0_1]
            vmovaps xmmword ptr [rsp], xmm0
            vmovdqa xmm0, xmmword ptr [rsp]
            vpand   xmm2, xmm0, xmm1
            vpsrlw  xmm0, xmm0, 4
            vpand   xmm0, xmm0, xmm1
            vpshufb xmm2, xmm3, xmm2
            vpxor   xmm1, xmm1, xmm1
            vpshufb xmm0, xmm3, xmm0
            vpaddb  xmm0, xmm0, xmm2
            vpsadbw xmm0, xmm0, xmm1
            vpshufd xmm1, xmm0, 170
            vpaddd  xmm0, xmm0, xmm1
            vmovd   eax, xmm0
            add     rsp, 40
            ret
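For anyone reading the listing above, here is my reading of it as a scalar model: it looks like a nibble-lookup popcount, with a 0x0F mask (`.LCPI0_0`) and a 16-entry table of per-nibble bit counts (`.LCPI0_1`). The function and constant names below are hypothetical and only illustrate the technique; this is not code from rustc or LLVM.

```rust
/// Bit count of every 4-bit value; the same 16 bytes as `.LCPI0_1` in the listing.
const NIBBLE_POPCOUNT: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Scalar model of the vectorized sequence (hypothetical helper for illustration).
pub fn popcount_via_nibbles(arr: [u64; 2]) -> u32 {
    let mut total = 0u32;
    for word in arr {
        for byte in word.to_le_bytes() {
            // vpand with the 0x0F mask keeps the low nibble of each byte.
            let lo = (byte & 0x0F) as usize;
            // vpsrlw by 4 followed by vpand keeps the high nibble.
            let hi = (byte >> 4) as usize;
            // The two vpshufb lookups and the vpaddb give a per-byte bit count.
            total += u32::from(NIBBLE_POPCOUNT[lo] + NIBBLE_POPCOUNT[hi]);
        }
    }
    // vpsadbw, vpshufd and vpaddd perform this horizontal sum in the listing.
    total
}
```

The result should match `count_ones` on each element summed together; the point is only that the vectorized path does far more work for two 64-bit inputs than two `popcnt` instructions.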

The assembly for the original function is now a terribly misguided autovectorization. And, just to make sure (even though it's pretty obvious), I did run a benchmark - the autovectorized function is ~8x slower on my Zen 2 system.
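A rough sketch of how a measurement like this could be reproduced. This is not the harness I used, only an assumption of what a minimal one might look like; `time_it` and the iteration count are hypothetical, and a proper benchmark framework would give more trustworthy numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical timing helper: calls `func` on varying inputs `iterations` times
/// and returns the elapsed time plus a checksum so the work cannot be discarded.
pub fn time_it(func: fn([u64; 2]) -> u32, iterations: u64) -> (Duration, u32) {
    let mut checksum = 0u32;
    let start = Instant::now();
    for i in 0..iterations {
        // black_box hides the input from the optimizer so each call is really made.
        let input = black_box([i, !i]);
        checksum = checksum.wrapping_add(black_box(func(input)));
    }
    (start.elapsed(), checksum)
}
```

Passing the function under test (for example `time_it(f, 100_000_000)`) and building the same file with an older and a newer toolchain is one way to compare the two code paths.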

Calling that function from a different function brings back normal assembly. -Cno-vectorize-slp does nothing. I don't know exactly what -Cno-vectorize-loops does, but it's not good.
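To make "calling that function from a different function" concrete, this is the shape I mean. The name `passthrough` is only illustrative, and the snippet assumes the 2021 edition so that `into_iter` on an array yields values rather than references.

```rust
pub fn f(arr: [u64; 2]) -> u32 {
    arr.into_iter().map(u64::count_ones).sum()
}

/// Hypothetical wrapper: does nothing except forward its argument to `f`.
pub fn passthrough(arr: [u64; 2]) -> u32 {
    f(arr)
}
```

Both functions compute the same value; only the generated assembly differs.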

If you change the length of the array to 4, both functions get autovectorized. -Cno-vectorize-slp fixes the second function now. Adding -Cno-vectorize-loops causes the passthrough function to generate the worst assembly.

Changing into_iter to iter fixes length 2, but doesn't fix length 4.

I could go on, but in short it's a whole mess.

I found a workaround that consistently works for all lengths: iter and -Cno-vectorize-slp.
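A sketch of the workaround for lengths 2 and 4. The function names are hypothetical, and the build line in the comment is an assumption about how you invoke rustc; add whatever target settings you normally use.

```rust
// Assumed build line: rustc -O -C no-vectorize-slp --crate-type=lib --emit=asm workaround.rs

/// Length-2 variant using `iter` (by reference) instead of `into_iter`.
pub fn f2_iter(arr: [u64; 2]) -> u32 {
    arr.iter().map(|x| x.count_ones()).sum()
}

/// Length-4 variant; as far as I can tell this one also needs the SLP flag.
pub fn f4_iter(arr: [u64; 4]) -> u32 {
    arr.iter().map(|x| x.count_ones()).sum()
}
```

The results are identical to the `into_iter` versions, so switching is safe as far as behavior goes; whether the codegen actually improves is worth checking in the emitted assembly for your toolchain.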

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-array +A-codegen +A-iterators +A-LLVM +A-simd +I-slow +O-x86_64 +perf-regression

Codegen weirdness for `sum` of `count_ones` over an array #101060