AST Atom and custom char class consumers #80

Merged

milseman merged 7 commits into swiftlang:main from milseman:arborvore

Dec 16, 2021

Member

milseman commented Dec 16, 2021

Generate consumers for AST nodes that are atoms or custom character classes.
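As a rough illustration of what "generating a consumer" from an AST node can mean, here is a minimal sketch. The `Node` enum, the `Consumer` typealias, and `makeConsumer(for:)` are hypothetical names invented for this example and are not the types used in `ConsumerInterface.swift`; the sketch only assumes that a consumer is a closure that tries to match at a position and returns the position after the match.

```swift
// Hypothetical toy AST; the real AST in this package is much richer.
enum Node {
  case atom(Character)                                    // a literal character
  case customCharacterClass([Character], inverted: Bool)  // e.g. [abc] or [^abc]
}

// Hypothetical consumer shape: try to match at `index` and return the
// index just past the match, or nil if nothing matched.
typealias Consumer = (String, String.Index) -> String.Index?

func makeConsumer(for node: Node) -> Consumer {
  switch node {
  case .atom(let expected):
    return { input, index in
      guard index < input.endIndex, input[index] == expected else { return nil }
      return input.index(after: index)
    }
  case .customCharacterClass(let members, let inverted):
    return { input, index in
      guard index < input.endIndex else { return nil }
      let isMember = members.contains(input[index])
      // An inverted class matches exactly when the character is NOT a member.
      return isMember != inverted ? input.index(after: index) : nil
    }
  }
}
```

Building the closure once per node keeps the matching loop free of AST inspection, which is one plausible reason to generate consumers up front.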

Most of the character class and property tests are now passing, with the notable exception of scripts (which the stdlib doesn't surface yet, cc @Azoy).

milseman added 3 commits

December 15, 2021 20:36

wio

2d46bff

pass more character property and class tests

49a96f6

wip

757774e

milseman requested review from hamishknight and kylemacomber

December 16, 2021 03:39

kylemacomber reviewed

Tests/RegexTests/MatchTests.swift Outdated

kylemacomber approved these changes

kylemacomber left a comment

I only looked at the changes to the tests. This looks like great progress!

Azoy reviewed

Contributor

Azoy left a comment

There are a lot of properties that we don't have yet that probably make sense to expose at some point. Regarding scripts, I'm going to need to store some script information regardless, because I need to implement the CLDR grapheme breaking rule to be consistent with ICU. So it probably makes sense to start storing all of the script data anyway, since it seems we need it here. Should we start making a list of the properties we need to expose? Should we expose more and more Unicode properties?

Sources/_StringProcessing/ConsumerInterface.swift Outdated

Comment on lines 347 to 348

      case .extendedPictographic:
        break

Contributor

Azoy Dec 16, 2021 (edited)

I believe we store this property now as part of grapheme breaking, so if we need to expose it we can.

Sources/_StringProcessing/ConsumerInterface.swift

Comment on lines +381 to +396

      case .otherAlphabetic:
        break
      case .otherDefaultIgnorableCodePoint:
        break
      case .otherGraphemeExtended:
        break
      case .otherIDContinue:
        break
      case .otherIDStart:
        break
      case .otherLowercase:
        break
      case .otherMath:
        break
      case .otherUppercase:
        break

Contributor

Azoy Dec 16, 2021

These properties are unfortunate because they get scooped up with their normal counterpart (Alphabetic) to match ICU's behavior when asking if a scalar isAlphabetic. If we need to make the distinction here, then that'll probably need to be a separate table.

Member Author

milseman Dec 16, 2021

Hmm, IIRC they exist because Unicode historically used general categories for these kinds of queries, but it became apparent that a universal categorization is not great for this use. IIRC isAlphabetic is derived from categories and properties, so it might actually be possible to run that in reverse: check isAlphabetic and exclude the others.
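The "run it in reverse" idea could be sketched roughly as follows. This assumes the derivation of Alphabetic from the Unicode derived core properties (Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic); `isLikelyOtherAlphabetic` is a hypothetical helper name, and the result is only an approximation rather than an exact Other_Alphabetic query.

```swift
// Hypothetical helper: approximates Other_Alphabetic by taking isAlphabetic
// and subtracting everything else that contributes to Alphabetic.
func isLikelyOtherAlphabetic(_ scalar: Unicode.Scalar) -> Bool {
  let props = scalar.properties
  guard props.isAlphabetic else { return false }
  // Lowercase and Uppercase contribute to Alphabetic on their own.
  if props.isLowercase || props.isUppercase { return false }
  switch props.generalCategory {
  case .titlecaseLetter, .modifierLetter, .otherLetter, .letterNumber:
    // Lt, Lm, Lo, and Nl also contribute on their own.
    return false
  default:
    return true
  }
}
```

Any scalar that carries Other_Alphabetic and also one of the excluded properties would be missed by this subtraction, so an exact answer would probably still need a separate table.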

Sources/_StringProcessing/ConsumerInterface.swift Outdated

Comment on lines 407 to 408

      case .regionalIndicator:
        break

Contributor

Azoy Dec 16, 2021

This property is a simple range, 0x1F1E6...0x1F1FF, if we want to expose it.
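A minimal sketch of that range check, using the bounds quoted above; `isRegionalIndicator` is a hypothetical free function for illustration, not an existing stdlib or package API.

```swift
// Hypothetical helper: Regional_Indicator as a plain scalar-value range check.
func isRegionalIndicator(_ scalar: Unicode.Scalar) -> Bool {
  (0x1F1E6...0x1F1FF).contains(scalar.value)
}
```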

Member Author

milseman Dec 16, 2021

Let's expose any properties that let you write grapheme breaking as a regular expression for testing purposes :-)

rxwei reviewed

Sources/_StringProcessing/ConsumerInterface.swift

hamishknight approved these changes

milseman and others added 3 commits

December 16, 2021 07:32

Update Tests/RegexTests/MatchTests.swift

d3f2070

Co-authored-by: Kyle Macomber <kmacomber@apple.com>

Update Sources/_StringProcessing/ConsumerInterface.swift

9a2e51c

Co-authored-by: Richard Wei <rxrwei@gmail.com>

wip: plumb unsupported errors through

7a212e0

moar errors

Member Author

milseman commented Dec 16, 2021

Switched all the fatal errors to throws, so that we at least learn the reason for a failure and can better support xfails.
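The general shape of that change might look like the sketch below. `Unsupported` and `predicate(forProperty:)` are hypothetical names chosen for this example, not the identifiers used in the PR; the point is only that an unsupported case throws a descriptive error instead of trapping, so a test can catch it and record an expected failure.

```swift
// Hypothetical error type carrying the reason a construct is unsupported.
enum Unsupported: Error, CustomStringConvertible {
  case property(String)

  var description: String {
    switch self {
    case .property(let name): return "Unsupported property: \(name)"
    }
  }
}

// Hypothetical lookup: returns a scalar predicate, or throws (rather than
// calling fatalError) when the property is not available yet.
func predicate(forProperty name: String) throws -> (Unicode.Scalar) -> Bool {
  switch name {
  case "alphabetic": return { $0.properties.isAlphabetic }
  case "whitespace": return { $0.properties.isWhitespace }
  default: throw Unsupported.property(name)
  }
}
```

A test harness can then wrap the call in `do`/`catch` and treat a thrown `Unsupported` as a known gap instead of crashing the whole run.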

Member Author

milseman commented Dec 16, 2021

@swift-ci please test linux platform

Apply Alejandro's feedback

78f8afa

Member Author

milseman commented Dec 16, 2021

@swift-ci please test linux platform

milseman merged commit 25f6fdc into swiftlang:main

milseman deleted the arborvore branch

December 16, 2021 16:05