AST Atom and custom char class consumers #80
kylemacomber left a comment
I only looked at the changes to the tests. This looks like great progress!
Azoy left a comment
There are a lot of properties that we don't have yet that probably make sense to expose at some point. Regarding scripts, I'm going to need to store some script information regardless, because I need to implement the CLDR grapheme breaking rule to be consistent with ICU. Since it seems we need scripts here too, it probably makes sense to start storing all of the script data anyway. Should we start making a list of the properties we need to expose? Should we keep exposing more and more Unicode properties?
    case .extendedPictographic:
      break
I believe we store this property now as part of grapheme breaking, so if we need to expose it we can.
    case .otherAlphabetic:
      break
    case .otherDefaultIgnorableCodePoint:
      break
    case .otherGraphemeExtended:
      break
    case .otherIDContinue:
      break
    case .otherIDStart:
      break
    case .otherLowercase:
      break
    case .otherMath:
      break
    case .otherUppercase:
      break
These properties are unfortunate because they get scooped up with their normal counterpart (Alphabetic) to match ICU's behavior when asking if a scalar isAlphabetic. If we need to make the distinction here, then that'll probably need to be a separate table.
Hmm, IIRC these exist because Unicode historically used general categories for these kinds of queries, but it became apparent that a universal categorization is not great for this use. IIRC isAlphabetic is derived from categories and properties, so it might actually be possible to run that in reverse: check isAlphabetic and exclude the others.
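As a rough sketch of the "run it in reverse" idea: Unicode derives Alphabetic from Lowercase, Uppercase, the general categories Lt, Lm, Lo and Nl, plus Other_Alphabetic. The helper below is hypothetical (`isLikelyOtherAlphabetic` is not stdlib API) and only approximates Other_Alphabetic: it assumes that a scalar which is alphabetic but not in a letter or letter-number category got there through an "Other" property, so it cannot tell Other_Alphabetic apart from scalars that are alphabetic only via Other_Lowercase or Other_Uppercase.

```swift
/// Hypothetical helper (an assumption, not stdlib API): approximates
/// Other_Alphabetic by taking isAlphabetic and excluding the general
/// categories that contribute to Alphabetic on their own.
func isLikelyOtherAlphabetic(_ scalar: Unicode.Scalar) -> Bool {
  let props = scalar.properties
  guard props.isAlphabetic else { return false }
  switch props.generalCategory {
  case .uppercaseLetter, .lowercaseLetter, .titlecaseLetter,
       .modifierLetter, .otherLetter, .letterNumber:
    // Alphabetic because of its general category alone.
    return false
  default:
    // Alphabetic, but not through a letter category.
    return true
  }
}

// U+0345 COMBINING GREEK YPOGEGRAMMENI is a nonspacing mark that is
// nonetheless alphabetic; "a" is alphabetic through its category.
let markIsOther = isLikelyOtherAlphabetic("\u{0345}")
let letterIsOther = isLikelyOtherAlphabetic("a")
```

If an exact answer is required, a separate table is still the safer route.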
    case .regionalIndicator:
      break
This property is a simple range 0x1F1E6...0x1F1FF if we wanted to expose this.
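For illustration, the range check could look something like this. `isRegionalIndicator` is a hypothetical free function, not an existing stdlib property; only the range itself comes from the comment above.

```swift
/// Hypothetical helper (not stdlib API): Regional_Indicator covers the
/// single contiguous range U+1F1E6...U+1F1FF.
func isRegionalIndicator(_ scalar: Unicode.Scalar) -> Bool {
  let range: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
  return range.contains(scalar.value)
}

// A flag emoji is a pair of regional indicator scalars
// (here U+1F1FA followed by U+1F1F8).
let flag = "\u{1F1FA}\u{1F1F8}"
let allRegional = flag.unicodeScalars.allSatisfy(isRegionalIndicator)
```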
Let's expose any properties that let you write grapheme breaking as a regular expression for testing purposes :-)
Co-authored-by: Kyle Macomber <kmacomber@apple.com>
Co-authored-by: Richard Wei <rxrwei@gmail.com>
moar errors
Switched all the fatal errors to throws, so that failures at least report a reason and xfails are better supported
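A minimal sketch of what trading `fatalError` for `throws` buys, assuming made-up names throughout (`Unsupported`, `Property`, `Consumer` and `makeConsumer` are illustrative and are not the types in this PR): an unsupported property surfaces as a catchable error carrying a reason, so a test harness can record an expected failure instead of crashing.

```swift
// All names here are hypothetical; the PR's real types are not shown
// in this thread.
struct Unsupported: Error, CustomStringConvertible {
  let description: String
  init(_ feature: String) { description = "Unsupported: \(feature)" }
}

enum Property {
  case regionalIndicator
  case script(String)
}

// A consumer here is just a predicate over a single scalar.
typealias Consumer = (Unicode.Scalar) -> Bool

func makeConsumer(for property: Property) throws -> Consumer {
  switch property {
  case .regionalIndicator:
    let range: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
    return { range.contains($0.value) }
  case .script(let name):
    // Previously this kind of path would have been a fatalError.
    throw Unsupported("script \(name)")
  }
}

// A test can catch the error and treat it as an expected failure.
do {
  _ = try makeConsumer(for: .script("Greek"))
} catch {
  print(error)
}
```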
@swift-ci please test linux platform
Generate consumers for AST nodes that are atoms or custom character classes.
Most of the char class and props tests are now passing, notably except for scripts (which the stdlib doesn't surface yet, cc @Azoy).