feat(hstr): Introduce Wtf8Atom #11104
Changeset detected. Latest commit: dc8df0a. The changes in this PR will be included in the next version bump.
CodSpeed Performance Report: Merging #11104 will not alter performance.
Binding test failed for the same reason as https://github.com/swc-project/swc/actions/runs/17853640640/job/50767647737
I think you'd better move the `Wtf8Atom`-related code to an isolated mod
Is it ready for a merge?
@kdy1 It's ready now. I've added a new commit to replace
kdy1 left a comment
I'll merge this once https://github.com/swc-project/swc/actions/runs/17944643950 finishes
**Description:** This PR reapplies #10987 and replaces the `lone_surrogates: bool` flag with `Wtf8Atom`. The `lone_surrogates` flag introduced in #10987 is an implicit marker, whereas `Wtf8Atom` makes the handling explicit: users must call `to_string_lossy` to get the lossy UTF-8 result. This differs substantially from the original implementation, where users had to convert `\uFFFDxxx` sequences back into the correct Unicode themselves.

**BREAKING CHANGE:** Both `cooked` in `TemplateElement` and `value` in `StringLiteral` now use the `Wtf8Atom` type introduced in #11104. `Wtf8Atom` does not expose a `to_string` method like the old `Atom` does. Internally, it stores the code points of the characters. You can call `code_points()` to get an iterator over the code points, or call `to_string_lossy()` to get a lossy string in which all unpaired surrogates are replaced with `U+FFFD` (Replacement Character).
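To make the distinction between the code-point view and the lossy view concrete, here is a minimal sketch using only the Rust standard library. It does not use swc or `Wtf8Atom` itself: the free functions `code_points` and `to_string_lossy` below are hypothetical helpers that operate on raw UTF-16 code units, and they only approximate the behavior described above.

```rust
// Illustrative sketch using only the standard library. These helper names are
// hypothetical and are NOT the `Wtf8Atom` API from swc.

/// Decode UTF-16 code units into code points.
/// Valid surrogate pairs are combined; a lone surrogate is kept as its own value.
fn code_points(units: &[u16]) -> Vec<u32> {
    char::decode_utf16(units.iter().copied())
        .map(|r| match r {
            Ok(c) => c as u32,
            // Preserve the unpaired surrogate instead of discarding it.
            Err(e) => u32::from(e.unpaired_surrogate()),
        })
        .collect()
}

/// Lossy conversion: every unpaired surrogate is replaced with U+FFFD.
fn to_string_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

For the JavaScript string `"a\uD800b"`, the code-point view still contains `0xD800`, while the lossy view turns it into `U+FFFD`. This is why a plain `to_string` cannot be offered without losing information: a Rust `String` cannot hold a lone surrogate.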
Description:
Continues from #11085.
This PR adds `Wtf8Atom` to represent unpaired surrogates (i.e. lone surrogates) in Rust.
BREAKING CHANGE:
Related issue (if exists):
Reimplements a part of #10987.