-   Notifications  
You must be signed in to change notification settings  - Fork 13.9k
 
 Optimize std::str::Chars::next and std::str::Chars::next_back #142038 
 New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|   ping @scottmcm ?  |  
|   So you know, you can make diff views in godbolt: https://godbolt.org/z/Thn1bf9qG  |  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general structure here does make sense to me, but overall I feel like it removed a bunch of helpers and constants unnecessarily. Not having utf8_first_byte, sure, but this ends up repeating the X << 6 | (Y & 0x3F) in a bunch of places, so keeping the utf8_acc_cont_byte to do that would make sense to me. The standard library is always compiled with optimizations, and the MIR inliner will inline it, so there's no reason to avoid the function call. Having the u32::from in there would also make the two functions more similar, since now the forward one is using as u32 in a different line instead with no obvious reason whey they should differ.
26b614c to 54a699b   Compare    
 I could not get LLVM to produce the   |  
|   r? @scottmcm  |  
|   Requested reviewer is already assigned to this pull request. Please choose another assignee.  |  
|   r? libs  |  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an implementation standpoint this looks good, but the unsafety isn't encapsulated correctly. This should be a pretty easy fix.
Could you update the godbolt links in the top post after this change?
   library/core/src/str/validations.rs  Outdated    
 | // SAFETY: `bytes` produces a UTF-8-like string | ||
| let mut next_byte = || unsafe { | ||
| let b = *bytes.next().unwrap_unchecked(); | ||
| assume(utf8_is_cont_byte(b)); | ||
| b | ||
| }; | ||
|   |  ||
| // SAFETY: `bytes` produces a UTF-8-like string | ||
| let combine = |c: u32, b: u8| unsafe { disjoint_bitor(c << 6, u32::from(b & CONT_MASK)) }; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These preconditions don't match up; by this API it is "safe" but completely unsound to call next_byte() 5 times on a 4 byte codepoint. And the safety comments don't cover the precondition.
Instead, this could probably be a function:
#[inline] unsafe fn advance_mask<'a, I: Iterator<Item = &'a u8>>(bytes: &mut I) -> u32 { // next_byte followed by combine }to ensure that unsafety is encapsulated. Applies to both the forward and backward version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closures doesn't escape the body of the function, so we shouldn't need to worry about them being used unsoundly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential use isn't really an issue; the problem is that the calls to next_byte() look innocently safe, but that isn't accurate.
|   I'll rerun after the above change but let's get a baseline @bors2 try  |  
   This comment has been minimized. 
   
 This comment has been minimized.
   This comment has been minimized. 
   
 This comment has been minimized.
Optimize `std::str::Chars::next` and `std::str::Chars::next_back`
   This comment has been minimized. 
   
 This comment has been minimized.
|   Finished benchmarking commit (38eb248): comparison URL. Overall result: ❌ regressions - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with  @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy. 
 Max RSS (memory usage)Results (primary 4.8%, secondary 2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above. 
 CyclesResults (secondary 3.7%)A less reliable metric. May be of interest, but not used to determine the overall result above. 
 Binary sizeResults (primary 0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above. 
 Bootstrap: 468.177s -> 466.627s (-0.33%)  |  
|   Well, those results are a bit interesting. From the first godbolt link in the top post I'm not really sure why we're showing regressions; the prelude is identical, the fastest path starting at  Cc the asm analyzer expert @hanna-kruppe who could probably provide some more insight.  |  
|   At least one benchmark that's gotten slower (unicode-normalization) uses   |  
54a699b to 3069ce8   Compare      This comment has been minimized. 
   
 This comment has been minimized.
   This comment has been minimized. 
   
 This comment has been minimized.
3069ce8 to 652813e   Compare   |   Since the perf job doesn't really show anything useful here, do you have local benchmarks indicating an improvement?  |  
There are only 0x10FFFF possible codepoints, so we can exhaustively test all of them.
By reordering some operations, we can expose some opportunites for CSE. Also convert the series of nested `if` branches to early return, which IMO makes the code clearer. Comparison of assembly before and after for `next_code_point`: https://godbolt.org/z/9Te84YzhK Comparison of assembly before and after for `next_code_point_reverse`: https://godbolt.org/z/fTx1a7oz1
652813e to 43c4909   Compare   |   This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.  |  
 Benchmark resultsVery strange, I need to investigate why the regression is so large on x86 AArch64 (Apple M3)x86_64 (AMD Ryzen 9 9950X) |  
Before/after for
next: https://godbolt.org/z/Yb9TGc4vaBefore/after for
next_back: https://godbolt.org/z/v6x7GWsj1std::sys_common::wtf8::Wtf8CodePointswill also benefit from this, since it uses the samenext_code_pointandnext_code_point_reversefunctions internally.I also added tests for all codepoints in the range
0..=char::MAX(including surrogats that can only appear in WTF-8), so the new implementations have been exhaustively tested