Calculate text string length correctly for code points outside BMP #132593
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Hi @parkertimmins, I've created a changelog YAML for you.
Good catch, thanks for fixing this! Glad to see the randomized testing is finding bugs
💔 Backport failed
You can use sqren/backport to manually backport by running
…lastic#132593) Strings parsed with the optimized UTF8 parsing have their length calculated during parsing. This length should be the same as the length if the string is parsed with the non-optimized path. Specifically, characters outside the basic multilingual plane require 2 chars per code point in the UTF16 encoding. (cherry picked from commit fa6e905)
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation.
…lastic#132593) Strings parsed with the optimized UTF8 parsing have their length calculated during parsing. This length should be the same as the length if the string is parsed with the non-optimized path. Specifically, characters outside the basic multilingual plane require 2 chars per code point in the UTF16 encoding. (cherry picked from commit fa6e905) # Conflicts: # muted-tests.yml
…132593) (#132598) Strings parsed with the optimized UTF8 parsing have their length calculated during parsing. This length should be the same as the length if the string is parsed with the non-optimized path. Specifically, characters outside the basic multilingual plane require 2 chars per code point in the UTF16 encoding. (cherry picked from commit fa6e905)
…132593) (#132599) Strings parsed with the optimized UTF8 parsing have their length calculated during parsing. This length should be the same as the length if the string is parsed with the non-optimized path. Specifically, characters outside the basic multilingual plane require 2 chars per code point in the UTF16 encoding. (cherry picked from commit fa6e905) # Conflicts: # muted-tests.yml
LGTM2 - good catch Parker!
Strings parsed with the optimized UTF-8 parsing path have their length calculated during parsing. This length must match the length obtained when the same string is parsed with the non-optimized path. Specifically, code points outside the Basic Multilingual Plane are encoded as a surrogate pair in UTF-16, so each one requires 2 chars rather than 1.
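To illustrate the length rule described above, here is a minimal sketch of computing the UTF-16 length directly from UTF-8 bytes. It is not the code from this PR: the class name `Utf16Length` and the method `utf16Length` are hypothetical, and the sketch assumes the input is already well-formed UTF-8. The key point is that a 4-byte UTF-8 sequence encodes a code point outside the BMP and therefore contributes 2 chars, while every shorter sequence contributes 1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the Elasticsearch implementation.
final class Utf16Length {

    // Returns the number of UTF-16 chars needed for the given bytes.
    // Assumption: utf8 holds well-formed UTF-8 (no validation is done here).
    static int utf16Length(byte[] utf8) {
        int chars = 0;
        int i = 0;
        while (i < utf8.length) {
            int lead = utf8[i] & 0xFF;
            if (lead < 0x80) {
                i += 1;      // 1-byte sequence (ASCII): 1 char
                chars += 1;
            } else if (lead < 0xE0) {
                i += 2;      // 2-byte sequence: 1 char
                chars += 1;
            } else if (lead < 0xF0) {
                i += 3;      // 3-byte sequence (rest of the BMP): 1 char
                chars += 1;
            } else {
                i += 4;      // 4-byte sequence (outside the BMP): surrogate pair, 2 chars
                chars += 2;
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, written as an explicit surrogate pair.
        String s = "a\uD83D\uDE00";
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Both values should agree if the length rule is applied correctly.
        System.out.println(utf16Length(bytes) + " " + s.length());
    }
}
```

Counting one char per code point would report 2 for the string in `main`, whereas `String.length()` reports 3; that mismatch is the kind of discrepancy between the two parsing paths that the description refers to.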