Just introducing an intermediate variable speeds up code by ~4% #146888

@yyaHo

I tried this code:

https://www.godbolt.org/z/exEqG7r1n

    use std::ops::Index;

    struct AssociationList<K, V> {
        pairs: Vec<AssociationPair<K, V>>,
    }

    #[derive(Clone)]
    struct AssociationPair<K, V> {
        key: K,
        value: V,
    }

    impl<K, V> AssociationList<K, V> {
        fn push(&mut self, key: K, value: V) {
            self.pairs.push(AssociationPair {
                key: key,
                value: value,
            });
        }
    }

    impl<'a, K: PartialEq + std::fmt::Debug, V: Clone> Index<&'a K> for AssociationList<K, V> {
        type Output = V;

        fn index(&self, index: &K) -> &V {
            for pair in &self.pairs {
                if pair.key == *index {
                    return &pair.value;
                }
            }
            panic!("No value found for key: {:?}", index);
        }
    }

    pub fn main() {
        let foo = std::rc::Rc::new("foo".to_string());
        let bar = "bar".to_string();
        let mut list = AssociationList { pairs: Vec::new() };
        list.push((*foo).clone(), 22);
        list.push(bar.clone(), 44);
        assert_eq!(list[&(*foo)], 22);
        assert_eq!(list[&bar], 44);
        assert_eq!(list[&(*foo)], 22);
        assert_eq!(list[&bar], 44);
    }

and:

    use std::ops::Index;

    struct AssociationList<K, V> {
        pairs: Vec<AssociationPair<K, V>>,
    }

    #[derive(Clone)]
    struct AssociationPair<K, V> {
        key: K,
        value: V,
    }

    impl<K, V> AssociationList<K, V> {
        fn push(&mut self, key: K, value: V) {
            self.pairs.push(AssociationPair {
                key: key,
                value: value,
            });
        }
    }

    impl<'a, K: PartialEq + std::fmt::Debug, V: Clone> Index<&'a K> for AssociationList<K, V> {
        type Output = V;

        fn index(&self, index: &K) -> &V {
            for pair in &self.pairs {
                if pair.key == *index {
                    return &pair.value;
                }
            }
            panic!("No value found for key: {:?}", index);
        }
    }

    pub fn main() {
        let foo = std::rc::Rc::new("foo".to_string());
        let bar = "bar".to_string();
        let mut list = AssociationList { pairs: Vec::new() };
        list.push((*foo).clone(), 22);
        let temp_bridge_4831 = bar;
        list.push(temp_bridge_4831.clone(), 44);
        assert_eq!(list[&(*foo)], 22);
        assert_eq!(list[&temp_bridge_4831], 44);
        assert_eq!(list[&(*foo)], 22);
        assert_eq!(list[&temp_bridge_4831], 44);
    }

I expected to see this happen:

The two versions differ only in that the second introduces an additional binding (let temp_bridge_4831 = bar;). I expected the generated assembly and the runtime performance to remain essentially the same.

Instead, this happened:

When benchmarked with hyperfine (with the body of main run in a loop of 10^7 iterations), the version with the extra binding was about 4.4% faster (mean 1.011 s versus 1.057 s over 10 runs each).
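The report does not show the looped harness itself. The following is only a sketch of one shape it could take, assuming the work is repeated inside the process and the result is passed through `std::hint::black_box` so the optimizer cannot discard it. The names `workload` and `run_benchmark_loop` are hypothetical, and the tuple-based list is a simplified stand-in for `AssociationList`, not the code that was measured.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for the body of `main` in the report: build a tiny
/// key/value list and look both keys up by linear search.
fn workload() -> i32 {
    let foo = "foo".to_string();
    let bar = "bar".to_string();
    let pairs = vec![(foo.clone(), 22), (bar.clone(), 44)];
    let lookup = |key: &String| -> i32 {
        pairs
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| *v)
            .expect("key not found")
    };
    lookup(&foo) + lookup(&bar)
}

/// Runs the workload `iterations` times and returns a checksum, so the
/// repeated work has an observable result.
pub fn run_benchmark_loop(iterations: u64) -> i64 {
    let mut checksum = 0i64;
    for _ in 0..iterations {
        // black_box keeps the compiler from folding the loop away.
        checksum += i64::from(black_box(workload()));
    }
    checksum
}
```

Calling `run_benchmark_loop(10_000_000)` from `main` and timing the resulting binary with hyperfine would correspond to the setup described above, under these assumptions.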

The difference is small, but intermediate bindings and ownership moves are very common in Rust, so one would not expect such a trivial change to produce a measurable difference in performance-sensitive code.

Could introducing the intermediate variable trigger some kind of unexpected optimization, which might be related to the Index trait implementation, the way indexing works, or some other subtle effect?
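One semantic difference between the two versions is the drop order of the locals. Rust drops locals in reverse declaration order and does not drop a moved-from variable, so in the second version the string is dropped before `list` rather than after it. Whether this explains the timing difference is not established here. The sketch below only demonstrates the ordering change, using a hypothetical `Noisy` type that logs its name when dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Mirrors the first version: `bar` is declared before `list` and never moved.
fn drop_order_without_binding() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _bar = Noisy { name: "bar", log: Rc::clone(&log) };
        let _list = Noisy { name: "list", log: Rc::clone(&log) };
    } // locals are dropped here, in reverse declaration order
    let order = log.borrow().clone();
    order
}

/// Mirrors the second version: `bar` is moved into a binding declared after `list`.
fn drop_order_with_binding() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let bar = Noisy { name: "bar", log: Rc::clone(&log) };
        let _list = Noisy { name: "list", log: Rc::clone(&log) };
        let _temp = bar; // the value now lives in the most recently declared local
    }
    let order = log.borrow().clone();
    order
}
```

Without the binding the log reads `list`, then `bar`. With the binding it reads `bar`, then `list`. The destructor calls at the end of `main` are therefore emitted in a different order in the two programs.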

Meta

rustc --version --verbose:

    rustc 1.92.0-nightly (9f32ccf35 2025-09-21)
    binary: rustc
    commit-hash: 9f32ccf35fb877270bc44a86a126440f04d676d0
    commit-date: 2025-09-21
    host: x86_64-unknown-linux-gnu
    release: 1.92.0-nightly
    LLVM version: 21.1.1

hyperfine.json (no extra binding)

    {
      "results": [
        {
          "command": "./target/release/overloaded-index-assoc-list",
          "mean": 1.0572985350399997,
          "stddev": 0.004401112717279841,
          "median": 1.05744619174,
          "user": 1.0555616399999999,
          "system": 0.001539378,
          "min": 1.05054995624,
          "max": 1.06405305924,
          "times": [
            1.0512718642399999,
            1.0587477022399998,
            1.05715102424,
            1.0553098372399998,
            1.05774135924,
            1.06405305924,
            1.0553158062399999,
            1.06256787324,
            1.06027686824,
            1.05054995624
          ],
          "exit_codes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        }
      ]
    }

hyperfine.json (with extra binding)

    {
      "results": [
        {
          "command": "./target/release/overloaded-index-assoc-list_new",
          "mean": 1.01132581302,
          "stddev": 0.009323974797432985,
          "median": 1.01071172272,
          "user": 1.0098468400000002,
          "system": 0.0014089679999999998,
          "min": 0.9957634287200001,
          "max": 1.02754362272,
          "times": [
            1.01354617872,
            1.02754362272,
            1.00658507772,
            1.0096771927200001,
            1.02160711572,
            0.9957634287200001,
            1.01174625272,
            1.00343215972,
            1.00540717572,
            1.01794992572
          ],
          "exit_codes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        }
      ]
    }
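As a rough check that the gap is larger than run-to-run noise, the two `times` arrays can be compared directly. The sketch below copies the samples from the reports above and computes the relative time saved and Welch's t statistic. The helper names are illustrative, and the choice of Welch's test is an assumption of this sketch rather than something hyperfine reports.

```rust
/// Wall-clock times in seconds, copied from the "no extra binding" report.
const NO_BINDING: [f64; 10] = [
    1.0512718642399999, 1.0587477022399998, 1.05715102424, 1.0553098372399998,
    1.05774135924, 1.06405305924, 1.0553158062399999, 1.06256787324,
    1.06027686824, 1.05054995624,
];

/// Wall-clock times in seconds, copied from the "with extra binding" report.
const WITH_BINDING: [f64; 10] = [
    1.01354617872, 1.02754362272, 1.00658507772, 1.0096771927200001,
    1.02160711572, 0.9957634287200001, 1.01174625272, 1.00343215972,
    1.00540717572, 1.01794992572,
];

/// Arithmetic mean of a slice of samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
fn sample_variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Welch's t statistic for two independent samples with unequal variances.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let standard_error =
        (sample_variance(a) / a.len() as f64 + sample_variance(b) / b.len() as f64).sqrt();
    (mean(a) - mean(b)) / standard_error
}

/// Fraction of the original run time saved by the extra binding.
fn relative_time_saved() -> f64 {
    (mean(&NO_BINDING) - mean(&WITH_BINDING)) / mean(&NO_BINDING)
}
```

A t statistic well above single digits with 10 samples per side would suggest the difference is not measurement noise, although it says nothing about the cause (code layout and alignment effects, for example, can also produce stable differences of this size).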


Labels

C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
