You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Auto merge of #123606 - thomcc:thomcc/opt-lines, r=<try>
Optimize core::str::Lines::count `s.lines().count()+1` is somewhat common as a way to find the line number given a byte position, so it'd be nice if it were faster. This just generalizes the SWAR-optimized char counting code so that it can be used for SWAR-optimized line counting, so it's actually not very complex of a PR. ## TODO - [x] benchmarks - [x] adjust comments - [ ] more tests ## Benchmarks `case00_libcore` is the new version, and `case01_fold_increment` is the previous implementation (the default impl of `Iterator::count()` is a fold that increments ``` str::line_count::all_newlines_32kib::case00_libcore 4.35µs/iter +/- 11.00ns str::line_count::all_newlines_32kib::case01_fold_increment 779.99µs/iter +/- 8.43µs str::line_count::all_newlines_4kib::case00_libcore 562.00ns/iter +/- 5.00ns str::line_count::all_newlines_4kib::case01_fold_increment 97.81µs/iter +/- 1.48µs str::line_count::all_newlines_64b::case00_libcore 21.00ns/iter +/- 0.00ns str::line_count::all_newlines_64b::case01_fold_increment 1.49µs/iter +/- 32.00ns str::line_count::en_huge::case00_libcore 45.58µs/iter +/- 122.00ns str::line_count::en_huge::case01_fold_increment 167.62µs/iter +/- 609.00ns str::line_count::en_large::case00_libcore 734.00ns/iter +/- 6.00ns str::line_count::en_large::case01_fold_increment 2.62µs/iter +/- 9.00ns str::line_count::en_medium::case00_libcore 100.00ns/iter +/- 0.00ns str::line_count::en_medium::case01_fold_increment 347.00ns/iter +/- 0.00ns str::line_count::en_small::case00_libcore 18.00ns/iter +/- 1.00ns str::line_count::en_small::case01_fold_increment 60.00ns/iter +/- 2.00ns str::line_count::en_tiny::case00_libcore 6.00ns/iter +/- 0.00ns str::line_count::en_tiny::case01_fold_increment 60.00ns/iter +/- 0.00ns str::line_count::zh_huge::case00_libcore 40.63µs/iter +/- 85.00ns str::line_count::zh_huge::case01_fold_increment 205.10µs/iter +/- 1.62µs str::line_count::zh_large::case00_libcore 655.00ns/iter +/- 1.00ns str::line_count::zh_large::case01_fold_increment 3.21µs/iter +/- 21.00ns str::line_count::zh_medium::case00_libcore 92.00ns/iter +/- 0.00ns str::line_count::zh_medium::case01_fold_increment 420.00ns/iter +/- 2.00ns str::line_count::zh_small::case00_libcore 20.00ns/iter +/- 1.00ns str::line_count::zh_small::case01_fold_increment 63.00ns/iter +/- 1.00ns str::line_count::zh_tiny::case00_libcore 6.00ns/iter +/- 0.00ns str::line_count::zh_tiny::case01_fold_increment 21.00ns/iter +/- 0.00ns ``` This is a speedup of around 2x-4x most of the time, but for some highly unrealistic scenarios (32KiB of newlines) it's up to almost 200x faster (because the time taken by the version in this PR is not dependent on the number of newlines in the input, but the old version is slower the more newlines are present). It's also much faster for small inputs, especially if they have newlines (10x faster for en_tiny). Real world cases will vary, don't read too much into these, I would expect 2x-4x speedup in general, since that's what it gets on the most realistic examples. Obviously a SIMD impl will beat this, but users who are really bottlenecked on this operation should probably just reach for crates.io (even if we provided a SIMD version, libcore can't use runtime CPU feature detection so they'd still be better off with something from crates.io).
0 commit comments