Missed optimization: multiple instances of a small struct don't reuse the stack allocation #141649

@ohadravid

When creating multiple instances of a small struct, each instance gets its own stack allocation, even when the instances are known never to be live at the same time.

Example: the following code generates two alloca instructions that LLVM does not merge into one:

(Godbolt)

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}

#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}

pub fn offsets(buf: [u8; 16]) {
    let w = WithOffset {
        data: &buf,
        offset: 0,
    };
    peek_w(&w);
    use_w(w);

    let w2 = WithOffset {
        data: &buf,
        offset: 1,
    };
    peek_w(&w2);
    use_w(w2);
}

LLVM IR:

; playground::offsets
; Function Attrs: noinline nounwind
define internal fastcc void @playground::offsets(ptr noalias nocapture noundef nonnull readonly align 1 dereferenceable(16) %buf) unnamed_addr #0 {
start:
  %w2 = alloca [16 x i8], align 8
  %w = alloca [16 x i8], align 8
  store ptr %buf, ptr %w, align 8
  %0 = getelementptr inbounds nuw i8, ptr %w, i64 8
  store i64 0, ptr %0, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 0) #88
  store ptr %buf, ptr %w2, align 8
  %1 = getelementptr inbounds nuw i8, ptr %w2, i64 8
  store i64 1, ptr %1, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w2) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 1) #88
  ret void
}

It seems that the calls to @llvm.lifetime.start.p0 and @llvm.lifetime.end.p0 are missing. If we instead wrap each instance in its own closure:

pub fn closures(buf: [u8; 16]) {
    (|| {
        let w = WithOffset {
            data: &buf,
            offset: 0,
        };
        peek_w(&w);
        use_w(w);
    })();

    (|| {
        let w2 = WithOffset {
            data: &buf,
            offset: 1,
        };
        peek_w(&w2);
        use_w(w2);
    })();
}

With this version the lifetime markers are emitted and the second alloca is optimized away (see the Godbolt link).
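For anyone who wants to check this without reading the IR, here is a rough sketch that returns the stack addresses of the two instances in both variants. The helper `addr_of_w` and the functions `offsets_addrs` and `closures_addrs` are names made up for this illustration and are not part of the original reproduction. Whether the two addresses coincide depends on the compiler version, the optimization level, and inlining decisions, so treat the result as a hint only, not as proof of what the IR contains.

```rust
use std::hint::black_box;

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

// Hypothetical helper: like `peek_w`, but also reports where `w` lives.
#[inline(never)]
pub fn addr_of_w(w: &WithOffset<&[u8; 16]>) -> usize {
    let p: *const WithOffset<&[u8; 16]> = black_box(w);
    p as usize
}

#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    black_box(w);
}

// Sequential variant: both instances are declared in the same function body.
pub fn offsets_addrs(buf: [u8; 16]) -> (usize, usize) {
    let w = WithOffset { data: &buf, offset: 0 };
    let a = addr_of_w(&w);
    use_w(w);

    let w2 = WithOffset { data: &buf, offset: 1 };
    let b = addr_of_w(&w2);
    use_w(w2);

    (a, b)
}

// Closure variant: each instance lives inside its own closure body.
pub fn closures_addrs(buf: [u8; 16]) -> (usize, usize) {
    let a = (|| {
        let w = WithOffset { data: &buf, offset: 0 };
        let a = addr_of_w(&w);
        use_w(w);
        a
    })();

    let b = (|| {
        let w2 = WithOffset { data: &buf, offset: 1 };
        let b = addr_of_w(&w2);
        use_w(w2);
        b
    })();

    (a, b)
}
```

Calling both functions from a `main` built with optimizations and printing the pairs (for example with `{:#x}`) shows whether the two instances ended up in the same slot. Emitting the IR with `rustc -O --emit=llvm-ir` remains the more reliable way to confirm whether the lifetime markers are present.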

I encountered this while working on memorysafety/rav1d#1402, where this missed optimization results in over 100 bytes of extra stack allocation in a specific function, which slows down the entire binary by ~0.5%.

This might also be related to #138544.

Labels: A-MIR, A-codegen, A-mir-opt, C-optimization, T-compiler, T-opsem
