Skip to content

Conversation

@SimonSapin
Copy link
Contributor

No description provided.

@SimonSapin SimonSapin added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-iterators Area: Iterators labels Jun 8, 2018
@rust-highfive
Copy link
Contributor

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 8, 2018
@SimonSapin
Copy link
Contributor Author

@scottmcm This PR is from your repo. I’m not good at reading assembly but your comment at #27741 (comment) suggests that this is an improvement. Should we land it?

The diff looks good to me.

@kennytm
Copy link
Member

kennytm commented Jun 8, 2018

I've done a simple benchmark and the test code from the comment is improved from 800µs/iter to 600µs/iter.

running 2 tests test pr_bench ... bench: 592,516 ns/iter (+/- 42,563) test std_bench ... bench: 809,164 ns/iter (+/- 54,219) 

Source code:

#![feature(try_trait, test)] extern crate test; use std::ops::Try; use test::{Bencher, black_box}; #[no_mangle] pub fn std_compute(a: u64, b: u64) -> u64 { StepByWithoutTryFold(StepBy { iter: a..b, step: 6, first_take: true, }).map(|x| x ^ (x - 1)).sum() } #[no_mangle] pub fn pr_compute(a: u64, b: u64) -> u64 { StepBy { iter: a..b, step: 6, first_take: true, }.map(|x| x ^ (x - 1)).sum() } #[bench] fn std_bench(bencher: &mut Bencher) { let a = black_box(1); let b = black_box(5000000); bencher.iter(|| { black_box(std_compute(a, b)); }); } #[bench] fn pr_bench(bencher: &mut Bencher) { let a = black_box(1); let b = black_box(5000000); bencher.iter(|| { black_box(pr_compute(a, b)); }); } struct StepBy<I> { iter: I, step: usize, first_take: bool, } struct StepByWithoutTryFold<I>(StepBy<I>); impl<I: Iterator> Iterator for StepBy<I> { type Item = I::Item; #[inline] fn next(&mut self) -> Option<Self::Item> { if self.first_take { self.first_take = false; self.iter.next() } else { self.iter.nth(self.step) } } #[inline] fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R where Self: Sized, F: FnMut(B, Self::Item) -> R, R: Try<Ok=B> { let mut accum = init; if self.first_take { self.first_take = false; if let Some(x) = self.iter.next() { accum = f(accum, x)?; } else { return Try::from_ok(accum); } } while let Some(x) = self.iter.nth(self.step) { accum = f(accum, x)?; } Try::from_ok(accum) } } impl<I: Iterator> Iterator for StepByWithoutTryFold<I> { type Item = I::Item; #[inline] fn next(&mut self) -> Option<Self::Item> { if self.0.first_take { self.0.first_take = false; self.0.iter.next() } else { self.0.iter.nth(self.0.step) } } }
@scottmcm
Copy link
Member

scottmcm commented Jun 9, 2018

Hmm, I'd forgotten I'd started this 😅 I see two options:

  1. Just merge this, since it helps -- thanks for benching, @kennytm! (Well, with some tests, especially for the short-circuiting logic, since that's the thing I got wrong the most doing the other try_folds.)

  2. Reimplement StepBy so that the bool flag isn't needed, and thus the manual try_fold is likely also unneeded (since there's nothing to manually hoist). The flag is there so it can use nth, but if there was a method that called next n times, returning the first result, then the flag wouldn't be needed, and it would actually be easier for Range to implement. Sketch of what I mean: https://play.rust-lang.org/?gist=f2c028e6e42e9ce5f3a311b71dcfc3a9&version=nightly&mode=release

Thoughts?

@kennytm
Copy link
Member

kennytm commented Jun 12, 2018

I think we could do the refactoring later.

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 12, 2018

📌 Commit 2d55d28 has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 12, 2018
@bors
Copy link
Collaborator

bors commented Jun 13, 2018

🔒 Merge conflict

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 13, 2018
@Emerentius
Copy link
Contributor

I've rerun @kennytm's code but replaced the std code by the specialization from my PR referencing this issue just above this comment. With that, this PR would lower performance.

test pr_bench ... bench: 647,483 ns/iter (+/- 12,094) test specialized_pr_bench ... bench: 413,365 ns/iter (+/- 13,795) 
#![feature(try_trait, test, step_trait)] use std::iter::Step; extern crate test; use std::ops::Try; use test::{Bencher, black_box}; #[no_mangle] pub fn std_compute(a: u64, b: u64) -> u64 { StepByWithoutTryFold(StepBy { iter: a..b, step: 6, first_take: true, }).map(|x| x ^ (x - 1)).sum() } #[no_mangle] pub fn pr_compute(a: u64, b: u64) -> u64 { StepBy { iter: a..b, step: 6, first_take: true, }.map(|x| x ^ (x - 1)).sum() } #[bench] fn specialized_pr_bench(bencher: &mut Bencher) { let a = black_box(1); let b = black_box(5000000); bencher.iter(|| { black_box(std_compute(a, b)); }); } #[bench] fn pr_bench(bencher: &mut Bencher) { let a = black_box(1); let b = black_box(5000000); bencher.iter(|| { black_box(pr_compute(a, b)); }); } struct StepBy<I> { iter: I, step: usize, first_take: bool, } struct StepByWithoutTryFold<I>(StepBy<I>); impl<I: Iterator> Iterator for StepBy<I> { type Item = I::Item; #[inline] fn next(&mut self) -> Option<Self::Item> { if self.first_take { self.first_take = false; self.iter.next() } else { self.iter.nth(self.step) } } #[inline] fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R where Self: Sized, F: FnMut(B, Self::Item) -> R, R: Try<Ok=B> { let mut accum = init; if self.first_take { self.first_take = false; if let Some(x) = self.iter.next() { accum = f(accum, x)?; } else { return Try::from_ok(accum); } } while let Some(x) = self.iter.nth(self.step) { accum = f(accum, x)?; } Try::from_ok(accum) } } impl<T> Iterator for StepByWithoutTryFold<std::ops::Range<T>> where T: Step, { type Item = T; #[inline] fn next(&mut self) -> Option<Self::Item> { self.0.first_take = false; if self.0.iter.start >= self.0.iter.end { return None; } // add 1 to self.step to get original step size back // it was decremented for the general case on construction if let Some(n) = self.0.iter.start.add_usize(self.0.step+1) { let next = std::mem::replace(&mut self.0.iter.start, n); Some(next) } else { let last = self.0.iter.start.clone(); self.0.iter.start = self.0.iter.end.clone(); Some(last) } } } 
@SimonSapin
Copy link
Contributor Author

Thanks for looking into this @Emerentius. Closing in favor of #51601.

@SimonSapin SimonSapin closed this Jun 18, 2018
bors added a commit that referenced this pull request Jun 21, 2018
Specialize StepBy<Range(Inclusive)> Part of #51557, related to #43064, #31155 As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to 1. the special casing of the first `StepBy::next()` call 2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()` This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason. See the following two links for a comparison. [godbolt: specialization for ..](https://godbolt.org/g/haHLJr) [godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6) `RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness". The approach can not be used in general, because it would produce side effects of the underlying iterator too early. May obsolete #51435, haven't checked.
@timvermeulen
Copy link
Contributor

timvermeulen commented May 7, 2019

Since #51601 was reverted, should this be considered again?

Centril added a commit to Centril/rust that referenced this pull request Sep 9, 2019
… r=scottmcm Override `StepBy::{try_fold, try_rfold}` Previous PR: rust-lang#51435 The previous PR was closed in favor of rust-lang#51601, which was later reverted. I don't think these implementations will make it harder to specialize `StepBy<Range<_>>` later, so we should be able to land this without any consequences. This should fix rust-lang#57517 – in my benchmarks `iter` and `iter.step_by(1)` now perform equally well, provided internal iteration is used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-iterators Area: Iterators S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

7 participants