Conversation

@h-a-n-a
Contributor

@h-a-n-a h-a-n-a commented Sep 19, 2025

Description:

Continue from #11085

This PR adds `Wtf8Atom`, an atom type that can represent strings containing unpaired surrogates (i.e. lone surrogates) in Rust.
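
For context, here is a minimal standard-library sketch of why a dedicated type is needed: a lone surrogate is not a Unicode scalar value, so neither `char` nor `String` can hold one. The helper `to_rust_string` is hypothetical and is not part of `hstr`; it only illustrates the limitation that `Wtf8Atom` is meant to work around.

```rust
/// Hypothetical helper: strictly converts UTF-16 code units (as a JavaScript
/// engine would store a string) into a Rust `String`.
fn to_rust_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    // U+D800 is a high surrogate. On its own it is not a Unicode scalar
    // value, so it cannot become a Rust `char`.
    assert!(char::from_u32(0xD800).is_none());

    // The JavaScript string "a\uD800b" as UTF-16 code units.
    let with_lone_surrogate: [u16; 3] = [0x0061, 0xD800, 0x0062];

    // Strict conversion fails because of the lone surrogate.
    assert_eq!(to_rust_string(&with_lone_surrogate), None);

    // A string without surrogates converts normally.
    assert_eq!(to_rust_string(&[0x0061, 0x0062]), Some("ab".to_string()));

    println!("lone surrogates cannot be stored in a Rust String");
}
```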

BREAKING CHANGE:

Related issue (if exists):

Reimplemented a part of #10987

@changeset-bot

changeset-bot bot commented Sep 19, 2025

🦋 Changeset detected

Latest commit: dc8df0a

The changes in this PR will be included in the next version bump.

@codspeed-hq

codspeed-hq bot commented Sep 19, 2025

CodSpeed Performance Report

Merging #11104 will not alter performance

Comparing h-a-n-a:hstr-wtf8 (dc8df0a) with main (7af1474) [1]

Summary

✅ 140 untouched

Footnotes

  1. No successful run was found on main (bdee12c) during the generation of this report, so 7af1474 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@h-a-n-a
Contributor Author

h-a-n-a commented Sep 19, 2025

@h-a-n-a h-a-n-a marked this pull request as ready for review September 23, 2025 06:28
@h-a-n-a h-a-n-a requested a review from a team as a code owner September 23, 2025 06:28
@CPunisher CPunisher requested review from CPunisher and removed request for a team September 23, 2025 06:30
@h-a-n-a h-a-n-a changed the title from "feat(hstr): introduce Wtf8Atom" to "feat(hstr): Introduce Wtf8Atom" Sep 23, 2025
@CPunisher
Member

I think you'd better move the `Wtf8Atom`-related code to an isolated mod.

CPunisher
CPunisher previously approved these changes Sep 23, 2025
@h-a-n-a h-a-n-a marked this pull request as draft September 23, 2025 10:40
@kdy1
Member

kdy1 commented Sep 23, 2025

Is it ready for a merge?

@h-a-n-a
Contributor Author

h-a-n-a commented Sep 23, 2025

@kdy1 It's ready now. I've added a new commit that replaces `Wtf8::from_bytes` with `Wtf8::from_bytes_unchecked`, because `from_bytes` actually requires the byte slice to be encoded as well-formed WTF-8.
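
To illustrate what "well-formed WTF-8" refers to, here is a rough standard-library sketch based on the WTF-8 encoding rules: a lone surrogate is encoded as a three-byte sequence that strict UTF-8 rejects, and a high surrogate sequence immediately followed by a low surrogate sequence is ill-formed (the pair must be encoded as a single four-byte sequence instead). Both helpers below are hypothetical and are not `hstr`'s validation logic; the pair check also assumes its input is otherwise valid generalized UTF-8.

```rust
/// Hypothetical helper: encodes a code point up to U+10FFFF, surrogates
/// included, using the generalized UTF-8 byte layout.
fn encode_generalized_utf8(cp: u32) -> Vec<u8> {
    debug_assert!(cp <= 0x10FFFF);
    match cp {
        0..=0x7F => vec![cp as u8],
        0x80..=0x7FF => vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8],
        0x800..=0xFFFF => vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
        _ => vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
    }
}

/// Hypothetical helper: true if an encoded high surrogate (ED A0..AF xx) is
/// immediately followed by an encoded low surrogate (ED B0..BF xx), which
/// well-formed WTF-8 forbids.
fn has_encoded_surrogate_pair(bytes: &[u8]) -> bool {
    bytes.windows(6).any(|w| {
        w[0] == 0xED
            && (0xA0..=0xAF).contains(&w[1])
            && w[3] == 0xED
            && (0xB0..=0xBF).contains(&w[4])
    })
}

fn main() {
    // A lone high surrogate is representable in WTF-8 but is not valid UTF-8.
    let lone = encode_generalized_utf8(0xD800);
    assert_eq!(lone, [0xED, 0xA0, 0x80]);
    assert!(std::str::from_utf8(&lone).is_err());
    assert!(!has_encoded_surrogate_pair(&lone));

    // Encoding both halves of a pair separately yields ill-formed WTF-8.
    let mut split_pair = encode_generalized_utf8(0xD83D);
    split_pair.extend(encode_generalized_utf8(0xDE00));
    assert!(has_encoded_surrogate_pair(&split_pair));

    // The well-formed encoding of that pair is the four-byte form of U+1F600.
    assert_eq!(encode_generalized_utf8(0x1F600), "\u{1F600}".as_bytes());
}
```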

@h-a-n-a h-a-n-a marked this pull request as ready for review September 23, 2025 11:26
CPunisher
CPunisher previously approved these changes Sep 23, 2025
Member

@kdy1 kdy1 left a comment

@kdy1 kdy1 added this to the Planned milestone Sep 23, 2025
@kdy1 kdy1 enabled auto-merge (squash) September 23, 2025 15:22
@kdy1 kdy1 merged commit 8cfd47b into swc-project:main Sep 23, 2025
322 of 324 checks passed
@h-a-n-a h-a-n-a deleted the hstr-wtf8 branch September 24, 2025 05:59
@kdy1 kdy1 modified the milestones: Planned, v1.13.20 Sep 27, 2025
kdy1 pushed a commit that referenced this pull request Oct 27, 2025
**Description:**

This PR reapplies #10987 and replaces the `lone_surrogates: bool` mark with `Wtf8Atom`. The `lone_surrogates` flag introduced in #10987 was an implicit mark, whereas `Wtf8Atom` is explicit: users must call `to_string_lossy` to get the lossy UTF-8 result. This differs significantly from the original implementation, in which users had to convert `\uFFFDxxx` into the correct Unicode themselves.

**BREAKING CHANGE:**

Both `cooked` in `TemplateElement` and `value` in `StringLiteral` are replaced with `Wtf8Atom`, introduced in #11104. `Wtf8Atom` does not expose a `to_string` method like the old `Atom` does. Internally, it stores the code points of the characters. You can call `code_points()` to get an iterator over the code points, or call `to_string_lossy()` to get a lossy string in which all unpaired surrogates are replaced with `U+FFFD` (Replacement Character).
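
The two access patterns described in the commit message can be sketched with the standard library alone. The functions below are hypothetical stand-ins that operate on UTF-16 code units; they are not the `Wtf8Atom` methods and their signatures are assumptions. They only show the difference between keeping lone surrogates as code points and lossily replacing them with `U+FFFD`.

```rust
/// Hypothetical stand-in: yields every code point, keeping lone surrogates
/// as their raw values instead of dropping or replacing them.
fn utf16_code_points(units: &[u16]) -> Vec<u32> {
    char::decode_utf16(units.iter().copied())
        .map(|decoded| match decoded {
            Ok(c) => c as u32,
            Err(e) => e.unpaired_surrogate() as u32,
        })
        .collect()
}

/// Hypothetical stand-in: produces a valid Rust `String`, replacing each
/// lone surrogate with U+FFFD (Replacement Character).
fn utf16_to_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

fn main() {
    // "\uD83D\uDE00" is a proper pair (U+1F600); the trailing "\uDC00" is lone.
    let units: [u16; 3] = [0xD83D, 0xDE00, 0xDC00];

    // The code point view preserves the lone surrogate exactly.
    assert_eq!(utf16_code_points(&units), vec![0x1F600, 0xDC00]);

    // The lossy view is a valid string, but the lone surrogate is gone.
    assert_eq!(utf16_to_string_lossy(&units), "\u{1F600}\u{FFFD}");
}
```

The lossy form is convenient for display, while the code point form is the one to use when the original string has to round-trip unchanged.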
@swc-project swc-project locked as resolved and limited conversation to collaborators Oct 27, 2025
Labels

None yet

3 participants