Skip to content

Conversation

utkarshgupta137
Copy link
Contributor

In our (decently large) code base, we use SystemTime::UNIX_EPOCH.elapsed() in a lot of places & often in a loop or in the hot path. On Unix at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet Duration::new() still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case.

Benchmarks:
M1 Pro (14-inch base model):

duration/current/checked time: [1.5945 ns 1.6167 ns 1.6407 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe duration/current/unchecked time: [1.5941 ns 1.6051 ns 1.6179 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [1.1997 ns 1.2048 ns 1.2104 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe duration/branched/unchecked time: [1.5881 ns 1.5957 ns 1.6039 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe 

EC2 c7gd.16xlarge (Graviton 3):

duration/current/checked time: [2.7996 ns 2.8000 ns 2.8003 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low severe 3 (3.00%) low mild duration/current/unchecked time: [2.9922 ns 2.9925 ns 2.9928 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild duration/branched/checked time: [2.0830 ns 2.0843 ns 2.0857 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild duration/branched/unchecked time: [2.9879 ns 2.9886 ns 2.9893 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low severe 2 (2.00%) low mild 

EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)):

duration/current/checked time: [980.60 ps 980.79 ps 980.99 ps] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe duration/current/unchecked time: [979.53 ps 979.74 ps 979.96 ps] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [938.72 ps 938.96 ps 939.22 ps] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe duration/branched/unchecked time: [1.0103 ns 1.0110 ns 1.0118 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe 

Bench code (ran using stable 1.75.0 & criterion latest 0.5.1):
I couldn't find any benches for Duration in this repo, so I just copied the relevant types & recreated it.

use criterion::{black_box, criterion_group, criterion_main, Criterion}; pub fn duration_bench(c: &mut Criterion) { const NANOS_PER_SEC: u32 = 1_000_000_000; #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] #[repr(transparent)] struct Nanoseconds(u32); impl Default for Nanoseconds { #[inline] fn default() -> Self { // SAFETY: 0 is within the valid range unsafe { Nanoseconds(0) } } } #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)] pub struct Duration { secs: u64, nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC } impl Duration { #[inline] pub const fn new_current(secs: u64, nanos: u32) -> Duration { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } #[inline] pub const fn new_branched(secs: u64, nanos: u32) -> Duration { if nanos < NANOS_PER_SEC { // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } else { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } } } let mut group = c.benchmark_group("duration/current"); group.bench_function("checked", |b| { b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000)))); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); drop(group); let mut group = c.benchmark_group("duration/branched"); group.bench_function("checked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000))) }); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); } criterion_group!(duration_benches, duration_bench); criterion_main!(duration_benches);
@rustbot
Copy link
Collaborator

rustbot commented Jan 24, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 24, 2024
@utkarshgupta137 utkarshgupta137 changed the title std/time: avoid divisions in Duration::new core/time: avoid divisions in Duration::new Jan 24, 2024
@quaternic
Copy link
Contributor

It looks like your benchmarks are testing with fixed input values. Since this change is adding a fast path branch to the function, branch prediction is very relevant to how it performs. You should definitely benchmark it with this something where the newly added branch is unpredictable.

@m-ou-se
Copy link
Member

m-ou-se commented Feb 8, 2024

The vast majority of use cases will have nanos < NANOS_PER_SEC, so this makes sense to me.

@bors r+

@bors
Copy link
Collaborator

bors commented Feb 8, 2024

📌 Commit 8a850cd has been approved by m-ou-se

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2024
@Nadrieril
Copy link
Member

@bors rollup

@utkarshgupta137
Copy link
Contributor Author

Should I also add likely() to the branch, or is that overkill?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 9, 2024
…ou-se core/time: avoid divisions in Duration::new In our (decently large) code base, we use `SystemTime::UNIX_EPOCH.elapsed()` in a lot of places & often in a loop or in the hot path. On [Unix](https://github.com/rust-lang/rust/blob/1.75.0/library/std/src/sys/unix/time.rs#L153-L162) at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet `Duration::new()` still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case. Benchmarks: M1 Pro (14-inch base model): ``` duration/current/checked time: [1.5945 ns 1.6167 ns 1.6407 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe duration/current/unchecked time: [1.5941 ns 1.6051 ns 1.6179 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [1.1997 ns 1.2048 ns 1.2104 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe duration/branched/unchecked time: [1.5881 ns 1.5957 ns 1.6039 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe ``` EC2 c7gd.16xlarge (Graviton 3): ``` duration/current/checked time: [2.7996 ns 2.8000 ns 2.8003 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low severe 3 (3.00%) low mild duration/current/unchecked time: [2.9922 ns 2.9925 ns 2.9928 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild duration/branched/checked time: [2.0830 ns 2.0843 ns 2.0857 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild duration/branched/unchecked time: [2.9879 ns 2.9886 ns 2.9893 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low severe 2 (2.00%) low mild ``` EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)): ``` duration/current/checked time: [980.60 ps 980.79 ps 980.99 ps] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe duration/current/unchecked time: [979.53 ps 979.74 ps 979.96 ps] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [938.72 ps 938.96 ps 939.22 ps] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe duration/branched/unchecked time: [1.0103 ns 1.0110 ns 1.0118 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe ``` Bench code (ran using stable 1.75.0 & criterion latest 0.5.1): I couldn't find any benches for `Duration` in this repo, so I just copied the relevant types & recreated it. ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; pub fn duration_bench(c: &mut Criterion) { const NANOS_PER_SEC: u32 = 1_000_000_000; #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] #[repr(transparent)] struct Nanoseconds(u32); impl Default for Nanoseconds { #[inline] fn default() -> Self { // SAFETY: 0 is within the valid range unsafe { Nanoseconds(0) } } } #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)] pub struct Duration { secs: u64, nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC } impl Duration { #[inline] pub const fn new_current(secs: u64, nanos: u32) -> Duration { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } #[inline] pub const fn new_branched(secs: u64, nanos: u32) -> Duration { if nanos < NANOS_PER_SEC { // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } else { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } } } let mut group = c.benchmark_group("duration/current"); group.bench_function("checked", |b| { b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000)))); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); drop(group); let mut group = c.benchmark_group("duration/branched"); group.bench_function("checked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000))) }); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); } criterion_group!(duration_benches, duration_bench); criterion_main!(duration_benches); ```
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2024
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120589 (std::thread::available_parallelism merging linux/android/freebsd version) - rust-lang#120596 ([rustdoc] Correctly generate path for non-local items in source code pages) - rust-lang#120672 (std::thread update freebsd stack guard handling.) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120806 (Clippy subtree update) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) r? `@ghost` `@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 9, 2024
…ou-se core/time: avoid divisions in Duration::new In our (decently large) code base, we use `SystemTime::UNIX_EPOCH.elapsed()` in a lot of places & often in a loop or in the hot path. On [Unix](https://github.com/rust-lang/rust/blob/1.75.0/library/std/src/sys/unix/time.rs#L153-L162) at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet `Duration::new()` still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case. Benchmarks: M1 Pro (14-inch base model): ``` duration/current/checked time: [1.5945 ns 1.6167 ns 1.6407 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe duration/current/unchecked time: [1.5941 ns 1.6051 ns 1.6179 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [1.1997 ns 1.2048 ns 1.2104 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe duration/branched/unchecked time: [1.5881 ns 1.5957 ns 1.6039 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe ``` EC2 c7gd.16xlarge (Graviton 3): ``` duration/current/checked time: [2.7996 ns 2.8000 ns 2.8003 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low severe 3 (3.00%) low mild duration/current/unchecked time: [2.9922 ns 2.9925 ns 2.9928 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild duration/branched/checked time: [2.0830 ns 2.0843 ns 2.0857 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild duration/branched/unchecked time: [2.9879 ns 2.9886 ns 2.9893 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low severe 2 (2.00%) low mild ``` EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)): ``` duration/current/checked time: [980.60 ps 980.79 ps 980.99 ps] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe duration/current/unchecked time: [979.53 ps 979.74 ps 979.96 ps] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [938.72 ps 938.96 ps 939.22 ps] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe duration/branched/unchecked time: [1.0103 ns 1.0110 ns 1.0118 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe ``` Bench code (ran using stable 1.75.0 & criterion latest 0.5.1): I couldn't find any benches for `Duration` in this repo, so I just copied the relevant types & recreated it. ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; pub fn duration_bench(c: &mut Criterion) { const NANOS_PER_SEC: u32 = 1_000_000_000; #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] #[repr(transparent)] struct Nanoseconds(u32); impl Default for Nanoseconds { #[inline] fn default() -> Self { // SAFETY: 0 is within the valid range unsafe { Nanoseconds(0) } } } #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)] pub struct Duration { secs: u64, nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC } impl Duration { #[inline] pub const fn new_current(secs: u64, nanos: u32) -> Duration { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } #[inline] pub const fn new_branched(secs: u64, nanos: u32) -> Duration { if nanos < NANOS_PER_SEC { // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } else { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } } } let mut group = c.benchmark_group("duration/current"); group.bench_function("checked", |b| { b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000)))); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); drop(group); let mut group = c.benchmark_group("duration/branched"); group.bench_function("checked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000))) }); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); } criterion_group!(duration_benches, duration_bench); criterion_main!(duration_benches); ```
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2024
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120596 ([rustdoc] Correctly generate path for non-local items in source code pages) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) - rust-lang#120828 (Fix `ErrorGuaranteed` unsoundness with stash/steal.) r? `@ghost` `@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 9, 2024
…ou-se core/time: avoid divisions in Duration::new In our (decently large) code base, we use `SystemTime::UNIX_EPOCH.elapsed()` in a lot of places & often in a loop or in the hot path. On [Unix](https://github.com/rust-lang/rust/blob/1.75.0/library/std/src/sys/unix/time.rs#L153-L162) at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet `Duration::new()` still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case. Benchmarks: M1 Pro (14-inch base model): ``` duration/current/checked time: [1.5945 ns 1.6167 ns 1.6407 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe duration/current/unchecked time: [1.5941 ns 1.6051 ns 1.6179 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [1.1997 ns 1.2048 ns 1.2104 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe duration/branched/unchecked time: [1.5881 ns 1.5957 ns 1.6039 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe ``` EC2 c7gd.16xlarge (Graviton 3): ``` duration/current/checked time: [2.7996 ns 2.8000 ns 2.8003 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low severe 3 (3.00%) low mild duration/current/unchecked time: [2.9922 ns 2.9925 ns 2.9928 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild duration/branched/checked time: [2.0830 ns 2.0843 ns 2.0857 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild duration/branched/unchecked time: [2.9879 ns 2.9886 ns 2.9893 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low severe 2 (2.00%) low mild ``` EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)): ``` duration/current/checked time: [980.60 ps 980.79 ps 980.99 ps] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe duration/current/unchecked time: [979.53 ps 979.74 ps 979.96 ps] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [938.72 ps 938.96 ps 939.22 ps] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe duration/branched/unchecked time: [1.0103 ns 1.0110 ns 1.0118 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe ``` Bench code (ran using stable 1.75.0 & criterion latest 0.5.1): I couldn't find any benches for `Duration` in this repo, so I just copied the relevant types & recreated it. ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; pub fn duration_bench(c: &mut Criterion) { const NANOS_PER_SEC: u32 = 1_000_000_000; #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] #[repr(transparent)] struct Nanoseconds(u32); impl Default for Nanoseconds { #[inline] fn default() -> Self { // SAFETY: 0 is within the valid range unsafe { Nanoseconds(0) } } } #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)] pub struct Duration { secs: u64, nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC } impl Duration { #[inline] pub const fn new_current(secs: u64, nanos: u32) -> Duration { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } #[inline] pub const fn new_branched(secs: u64, nanos: u32) -> Duration { if nanos < NANOS_PER_SEC { // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } else { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } } } let mut group = c.benchmark_group("duration/current"); group.bench_function("checked", |b| { b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000)))); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); drop(group); let mut group = c.benchmark_group("duration/branched"); group.bench_function("checked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000))) }); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); } criterion_group!(duration_benches, duration_bench); criterion_main!(duration_benches); ```
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2024
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#113026 (Introduce `run-make` V2 infrastructure, a `run_make_support` library and port over 2 tests as example) - rust-lang#113671 (Make privacy visitor use types more (instead of HIR)) - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) - rust-lang#120828 (Fix `ErrorGuaranteed` unsoundness with stash/steal.) - rust-lang#120831 (Startup objects disappearing from sysroot) r? `@ghost` `@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2024
…iaskrgr Rollup of 8 pull requests Successful merges: - rust-lang#113671 (Make privacy visitor use types more (instead of HIR)) - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) - rust-lang#120828 (Fix `ErrorGuaranteed` unsoundness with stash/steal.) - rust-lang#120831 (Startup objects disappearing from sysroot) r? `@ghost` `@rustbot` modify labels: rollup
@m-ou-se
Copy link
Member

m-ou-se commented Feb 9, 2024

Should I also add likely() to the branch, or is that overkill?

I'm guessing that is probably overkill.

@bors bors merged commit 8b8adfd into rust-lang:master Feb 9, 2024
@rustbot rustbot added this to the 1.78.0 milestone Feb 9, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2024
Rollup merge of rust-lang#120308 - utkarshgupta137:duration-opt, r=m-ou-se core/time: avoid divisions in Duration::new In our (decently large) code base, we use `SystemTime::UNIX_EPOCH.elapsed()` in a lot of places & often in a loop or in the hot path. On [Unix](https://github.com/rust-lang/rust/blob/1.75.0/library/std/src/sys/unix/time.rs#L153-L162) at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet `Duration::new()` still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case. Benchmarks: M1 Pro (14-inch base model): ``` duration/current/checked time: [1.5945 ns 1.6167 ns 1.6407 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe duration/current/unchecked time: [1.5941 ns 1.6051 ns 1.6179 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [1.1997 ns 1.2048 ns 1.2104 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe duration/branched/unchecked time: [1.5881 ns 1.5957 ns 1.6039 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe ``` EC2 c7gd.16xlarge (Graviton 3): ``` duration/current/checked time: [2.7996 ns 2.8000 ns 2.8003 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low severe 3 (3.00%) low mild duration/current/unchecked time: [2.9922 ns 2.9925 ns 2.9928 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild duration/branched/checked time: [2.0830 ns 2.0843 ns 2.0857 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild duration/branched/unchecked time: [2.9879 ns 2.9886 ns 2.9893 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low severe 2 (2.00%) low mild ``` EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)): ``` duration/current/checked time: [980.60 ps 980.79 ps 980.99 ps] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe duration/current/unchecked time: [979.53 ps 979.74 ps 979.96 ps] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe duration/branched/checked time: [938.72 ps 938.96 ps 939.22 ps] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe duration/branched/unchecked time: [1.0103 ns 1.0110 ns 1.0118 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe ``` Bench code (ran using stable 1.75.0 & criterion latest 0.5.1): I couldn't find any benches for `Duration` in this repo, so I just copied the relevant types & recreated it. ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; pub fn duration_bench(c: &mut Criterion) { const NANOS_PER_SEC: u32 = 1_000_000_000; #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] #[repr(transparent)] struct Nanoseconds(u32); impl Default for Nanoseconds { #[inline] fn default() -> Self { // SAFETY: 0 is within the valid range unsafe { Nanoseconds(0) } } } #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)] pub struct Duration { secs: u64, nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC } impl Duration { #[inline] pub const fn new_current(secs: u64, nanos: u32) -> Duration { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } #[inline] pub const fn new_branched(secs: u64, nanos: u32) -> Duration { if nanos < NANOS_PER_SEC { // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } else { let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) { Some(secs) => secs, None => panic!("overflow in Duration::new"), }; let nanos = nanos % NANOS_PER_SEC; // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range Duration { secs, nanos: unsafe { Nanoseconds(nanos) } } } } } let mut group = c.benchmark_group("duration/current"); group.bench_function("checked", |b| { b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000)))); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); drop(group); let mut group = c.benchmark_group("duration/branched"); group.bench_function("checked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000))) }); }); group.bench_function("unchecked", |b| { b.iter(|| { black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000))) }); }); } criterion_group!(duration_benches, duration_bench); criterion_main!(duration_benches); ```
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 26, 2024
…iaskrgr Rollup of 8 pull requests Successful merges: - rust-lang#113671 (Make privacy visitor use types more (instead of HIR)) - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) - rust-lang#120828 (Fix `ErrorGuaranteed` unsoundness with stash/steal.) - rust-lang#120831 (Startup objects disappearing from sysroot) r? `@ghost` `@rustbot` modify labels: rollup
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Mar 8, 2024
…iaskrgr Rollup of 8 pull requests Successful merges: - rust-lang#113671 (Make privacy visitor use types more (instead of HIR)) - rust-lang#120308 (core/time: avoid divisions in Duration::new) - rust-lang#120693 (Invert diagnostic lints.) - rust-lang#120704 (A drive-by rewrite of `give_region_a_name()`) - rust-lang#120809 (Use `transmute_unchecked` in `NonZero::new`.) - rust-lang#120817 (Fix more `ty::Error` ICEs in MIR passes) - rust-lang#120828 (Fix `ErrorGuaranteed` unsoundness with stash/steal.) - rust-lang#120831 (Startup objects disappearing from sysroot) r? `@ghost` `@rustbot` modify labels: rollup
@utkarshgupta137 utkarshgupta137 deleted the duration-opt branch September 3, 2025 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

6 participants