Override try_fold for StepBy #51435

SimonSapin · 2018-06-08T14:21:58Z

No description provided.

rust-highfive · 2018-06-08T14:22:02Z

(rust_highfive has picked a reviewer for you, use r? to override)

SimonSapin · 2018-06-08T14:24:06Z

@scottmcm This PR is from your repo. I’m not good at reading assembly but your comment at #27741 (comment) suggests that this is an improvement. Should we land it?

The diff looks good to me.

kennytm · 2018-06-08T14:53:27Z

I've run a simple benchmark: the test code from the comment improves from about 800 µs/iter to about 600 µs/iter.

    running 2 tests
    test pr_bench  ... bench:     592,516 ns/iter (+/- 42,563)
    test std_bench ... bench:     809,164 ns/iter (+/- 54,219)

Source code:

    #![feature(try_trait, test)]

    extern crate test;

    use std::ops::Try;
    use test::{Bencher, black_box};

    #[no_mangle]
    pub fn std_compute(a: u64, b: u64) -> u64 {
        StepByWithoutTryFold(StepBy {
            iter: a..b,
            step: 6,
            first_take: true,
        }).map(|x| x ^ (x - 1)).sum()
    }

    #[no_mangle]
    pub fn pr_compute(a: u64, b: u64) -> u64 {
        StepBy {
            iter: a..b,
            step: 6,
            first_take: true,
        }.map(|x| x ^ (x - 1)).sum()
    }

    #[bench]
    fn std_bench(bencher: &mut Bencher) {
        let a = black_box(1);
        let b = black_box(5000000);
        bencher.iter(|| {
            black_box(std_compute(a, b));
        });
    }

    #[bench]
    fn pr_bench(bencher: &mut Bencher) {
        let a = black_box(1);
        let b = black_box(5000000);
        bencher.iter(|| {
            black_box(pr_compute(a, b));
        });
    }

    struct StepBy<I> {
        iter: I,
        step: usize,
        first_take: bool,
    }

    struct StepByWithoutTryFold<I>(StepBy<I>);

    impl<I: Iterator> Iterator for StepBy<I> {
        type Item = I::Item;

        #[inline]
        fn next(&mut self) -> Option<Self::Item> {
            if self.first_take {
                self.first_take = false;
                self.iter.next()
            } else {
                self.iter.nth(self.step)
            }
        }

        #[inline]
        fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R
        where
            Self: Sized,
            F: FnMut(B, Self::Item) -> R,
            R: Try<Ok=B>
        {
            let mut accum = init;

            if self.first_take {
                self.first_take = false;
                if let Some(x) = self.iter.next() {
                    accum = f(accum, x)?;
                } else {
                    return Try::from_ok(accum);
                }
            }

            while let Some(x) = self.iter.nth(self.step) {
                accum = f(accum, x)?;
            }
            Try::from_ok(accum)
        }
    }

    impl<I: Iterator> Iterator for StepByWithoutTryFold<I> {
        type Item = I::Item;

        #[inline]
        fn next(&mut self) -> Option<Self::Item> {
            if self.0.first_take {
                self.0.first_take = false;
                self.0.iter.next()
            } else {
                self.0.iter.nth(self.0.step)
            }
        }
    }
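For anyone who wants to try the idea without nightly: as far as I know, a generic `try_fold` override needs the unstable `Try` trait, so the following is only a stable-Rust analogue that overrides `fold` instead. It is a sketch of the same hoisting (check `first_take` once, then loop on `nth`), not the code in this PR; `HoistedStepBy` and `hoisted_compute` are hypothetical names, and `step` follows the benchmark's convention of "elements skipped between yields".

```rust
/// Hypothetical stand-in for the PR's `StepBy` (not the std type).
/// `step` is the number of elements skipped between yields, so `step: 6`
/// yields every 7th element, as in the benchmark above.
struct HoistedStepBy<I> {
    iter: I,
    step: usize,
    first_take: bool,
}

impl<I: Iterator> Iterator for HoistedStepBy<I> {
    type Item = I::Item;

    // External iteration: the flag is tested on every call.
    fn next(&mut self) -> Option<I::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }

    // Internal iteration: the flag is tested once, before the loop.
    fn fold<B, F>(mut self, init: B, mut f: F) -> B
    where
        F: FnMut(B, I::Item) -> B,
    {
        let mut accum = init;
        if self.first_take {
            self.first_take = false;
            match self.iter.next() {
                Some(x) => accum = f(accum, x),
                None => return accum,
            }
        }
        while let Some(x) = self.iter.nth(self.step) {
            accum = f(accum, x);
        }
        accum
    }
}

/// Same computation as `pr_compute` in the benchmark. Requires `a >= 1`,
/// because `x - 1` would underflow for `x == 0`.
fn hoisted_compute(a: u64, b: u64) -> u64 {
    HoistedStepBy { iter: a..b, step: 6, first_take: true }
        .map(|x| x ^ (x - 1))
        .sum()
}
```

Calling `hoisted_compute(1, 5_000_000)` should produce the same value as `(1u64..5_000_000).step_by(7).map(|x| x ^ (x - 1)).sum::<u64>()`; whether it is also faster would need to be measured separately.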

scottmcm · 2018-06-09T22:50:40Z

Hmm, I'd forgotten I'd started this 😅 I see two options:

1. Just merge this, since it helps -- thanks for benching, @kennytm! (Well, with some tests, especially for the short-circuiting logic, since that's the thing I got wrong the most doing the other try_folds.)
2. Reimplement StepBy so that the bool flag isn't needed, and thus the manual try_fold is likely also unneeded (since there's nothing to manually hoist). The flag is there so it can use nth, but if there was a method that called next n times, returning the first result, then the flag wouldn't be needed, and it would actually be easier for Range to implement. Sketch of what I mean: https://play.rust-lang.org/?gist=f2c028e6e42e9ce5f3a311b71dcfc3a9&version=nightly&mode=release
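The short-circuiting behaviour mentioned in the first option could be pinned down with a check along these lines. This is a sketch against the public `step_by` adaptor in std, not a test taken from this PR, and `sum_until` is a hypothetical helper name. The point it illustrates: when the closure bails out, the iterator should have advanced past the element that caused the bail-out and no further.

```rust
/// Sums every third value of `0..20`, giving up (returning `None`) as soon as
/// the running total would exceed `limit`.
///
/// The second tuple field is what the iterator yields *after* the fold has
/// stopped, which shows how far the fold advanced it.
fn sum_until(limit: u32) -> (Option<u32>, Option<u32>) {
    let mut it = (0u32..20).step_by(3); // yields 0, 3, 6, 9, 12, 15, 18
    let total = it.try_fold(0u32, |acc, x| {
        let next = acc + x;
        if next > limit { None } else { Some(next) }
    });
    (total, it.next())
}
```

With `limit = 10` the fold accepts 0, 3 and 6 (total 9) and stops on 9, so the next element left in the iterator is 12. With a large limit the fold runs to completion and nothing is left.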

Thoughts?

kennytm · 2018-06-12T21:09:02Z

I think we could do the refactoring later.

@bors r+

bors · 2018-06-12T21:09:03Z

📌 Commit 2d55d28 has been approved by kennytm

bors · 2018-06-13T02:26:55Z

🔒 Merge conflict

Emerentius · 2018-06-17T23:05:12Z

I've rerun @kennytm's code but replaced the std code with the specialization from my PR (the one referencing this issue just above this comment). Compared with that specialization, this PR would lower performance.

    test pr_bench             ... bench:     647,483 ns/iter (+/- 12,094)
    test specialized_pr_bench ... bench:     413,365 ns/iter (+/- 13,795)

    #![feature(try_trait, test, step_trait)]

    use std::iter::Step;

    extern crate test;

    use std::ops::Try;
    use test::{Bencher, black_box};

    #[no_mangle]
    pub fn std_compute(a: u64, b: u64) -> u64 {
        StepByWithoutTryFold(StepBy {
            iter: a..b,
            step: 6,
            first_take: true,
        }).map(|x| x ^ (x - 1)).sum()
    }

    #[no_mangle]
    pub fn pr_compute(a: u64, b: u64) -> u64 {
        StepBy {
            iter: a..b,
            step: 6,
            first_take: true,
        }.map(|x| x ^ (x - 1)).sum()
    }

    #[bench]
    fn specialized_pr_bench(bencher: &mut Bencher) {
        let a = black_box(1);
        let b = black_box(5000000);
        bencher.iter(|| {
            black_box(std_compute(a, b));
        });
    }

    #[bench]
    fn pr_bench(bencher: &mut Bencher) {
        let a = black_box(1);
        let b = black_box(5000000);
        bencher.iter(|| {
            black_box(pr_compute(a, b));
        });
    }

    struct StepBy<I> {
        iter: I,
        step: usize,
        first_take: bool,
    }

    struct StepByWithoutTryFold<I>(StepBy<I>);

    impl<I: Iterator> Iterator for StepBy<I> {
        type Item = I::Item;

        #[inline]
        fn next(&mut self) -> Option<Self::Item> {
            if self.first_take {
                self.first_take = false;
                self.iter.next()
            } else {
                self.iter.nth(self.step)
            }
        }

        #[inline]
        fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R
        where
            Self: Sized,
            F: FnMut(B, Self::Item) -> R,
            R: Try<Ok=B>
        {
            let mut accum = init;

            if self.first_take {
                self.first_take = false;
                if let Some(x) = self.iter.next() {
                    accum = f(accum, x)?;
                } else {
                    return Try::from_ok(accum);
                }
            }

            while let Some(x) = self.iter.nth(self.step) {
                accum = f(accum, x)?;
            }
            Try::from_ok(accum)
        }
    }

    impl<T> Iterator for StepByWithoutTryFold<std::ops::Range<T>>
    where
        T: Step,
    {
        type Item = T;

        #[inline]
        fn next(&mut self) -> Option<Self::Item> {
            self.0.first_take = false;
            if self.0.iter.start >= self.0.iter.end {
                return None;
            }
            // add 1 to self.step to get original step size back
            // it was decremented for the general case on construction
            if let Some(n) = self.0.iter.start.add_usize(self.0.step+1) {
                let next = std::mem::replace(&mut self.0.iter.start, n);
                Some(next)
            } else {
                let last = self.0.iter.start.clone();
                self.0.iter.start = self.0.iter.end.clone();
                Some(last)
            }
        }
    }

SimonSapin · 2018-06-18T01:38:18Z

Thanks for looking into this @Emerentius. Closing in favor of #51601.

Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
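The `next()` override described in the quoted PR text (return the current element, then step ahead by `n` in one addition) can be sketched on stable Rust roughly as below. This is an illustration only, not the code from #51601: `RangeStepBy` and `range_step_by` are hypothetical names, the sketch is fixed to `u64` instead of being generic over `Step`, and `step` here is the real step size rather than the decremented value std stores.

```rust
/// Hypothetical stand-in for a specialized `StepBy<Range<u64>>`.
struct RangeStepBy {
    start: u64,
    end: u64,
    /// Real step size; must be non-zero.
    step: u64,
}

impl Iterator for RangeStepBy {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.start >= self.end {
            return None;
        }
        let current = self.start;
        // One addition per element, no first-call flag. If the addition
        // overflows, jump to `end` so the range reads as exhausted.
        self.start = current.checked_add(self.step).unwrap_or(self.end);
        Some(current)
    }
}

/// Builds the stepping iterator over `start..end`.
fn range_step_by(start: u64, end: u64, step: u64) -> RangeStepBy {
    assert!(step > 0, "step must be non-zero");
    RangeStepBy { start, end, step }
}
```

Because `start` is advanced eagerly, this trick only suits ranges, where stepping ahead has no observable side effect; that matches the caveat in the quoted description about not applying it to arbitrary iterators.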

timvermeulen · 2019-05-07T23:51:42Z

Since #51601 was reverted, should this be considered again?

… r=scottmcm

Override `StepBy::{try_fold, try_rfold}`

Previous PR: rust-lang#51435

The previous PR was closed in favor of rust-lang#51601, which was later reverted. I don't think these implementations will make it harder to specialize `StepBy<Range<_>>` later, so we should be able to land this without any consequences.

This should fix rust-lang#57517: in my benchmarks `iter` and `iter.step_by(1)` now perform equally well, provided internal iteration is used.
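To make "internal iteration" in the message above concrete, here is a small sketch contrasting the two styles on `step_by(1)`. The function names are hypothetical and the example makes no performance claim; it only shows which code path drives the loop.

```rust
/// External iteration: the `for` loop pulls items one at a time via `next()`.
fn sum_external(n: u64) -> u64 {
    let mut total = 0;
    for x in (0..n).step_by(1) {
        total += x;
    }
    total
}

/// Internal iteration: `try_fold` hands the whole loop to the adaptor, so an
/// overridden `try_fold` is the code that actually runs it.
/// Returns `None` if the sum overflows.
fn sum_internal(n: u64) -> Option<u64> {
    (0..n).step_by(1).try_fold(0u64, |acc, x| acc.checked_add(x))
}
```

Both compute the same sum; any speed difference between them would have to be measured with a benchmark like the ones earlier in this thread.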

Override try_fold for StepBy

2d55d28

SimonSapin added the T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.) and A-iterators (Area: Iterators) labels Jun 8, 2018

rust-highfive assigned kennytm Jun 8, 2018

rust-highfive added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label Jun 8, 2018

bors added the S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) label and removed the S-waiting-on-review label Jun 12, 2018

bors added the S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) label and removed the S-waiting-on-bors label Jun 13, 2018

Emerentius mentioned this pull request Jun 16, 2018

Specialize StepBy<Range(Inclusive)> #51601

Merged

SimonSapin closed this Jun 18, 2018

timvermeulen mentioned this pull request Sep 3, 2019

Override StepBy::{try_fold, try_rfold} #64121

Merged


7 participants
