- Notifications
You must be signed in to change notification settings - Fork 50
Delimiter lexing support for '/ and '| #76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| | ||
| /// Drop a set of regex delimiters from the input string, returning the contents | ||
| /// and the delimiter used. The input string must have valid delimiters. | ||
| func droppingRegexDelimiters(_ str: String) -> (String, Delimiter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the point of this function, why would we have a parallel implementation over String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used by both:
- The runtime to drop the delimiters off and get the syntax options to use
- The C++ parser as we don't currently track the contents range on the C++
Tokenso we need to drop off the delimiters again when we come to parse
We could have both of those go back through the lexing function, I don't feel too strongly about it. I agree it's not ideal having two implementations of effectively the same operation
| while true { | ||
| switch load() { | ||
| case nil, ascii("\n"), ascii("\r"): | ||
| return .failure(LexError(.endOfString, resumeAt: current)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but how do you think we should do multi-line support? I guess with the version argument we can add that at any time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as long as we can pick it up by e.g a different delimiter, we could change the logic to support multiple lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delimiter ideally would be something like
let r = '''/ regex '''/ or something.
| Moved the libswift entrypoints into the repo, how does this look @milseman? |
milseman left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM and follow up can be in a subsequent PR.
| _ inputPtr: UnsafePointer<CChar>?, | ||
| _ bufferEndPtr: UnsafePointer<CChar>?, | ||
| _ errOut: UnsafeMutablePointer<UnsafePointer<CChar>?>? | ||
| ) -> UnsafePointer<CChar>? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need another annotation to get the C calling convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need something like @_cdecl("_libswift_lexRegexLiteral").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the initialize function in libSwift makes a closure to call this function, I believe it will get an automagic @convention(c) on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's dynamically registering a closure, we may not need @usableFromInline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to drop @usableFromInline
| @swift-ci please test Linux |
milseman left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Comments can be done after merge if you want.
| @usableFromInline | ||
| init(_regexString pattern: String, version: Int) { | ||
| // The version argument is passed by the compiler using the value defined | ||
| // in libswiftParseRegexLiteral. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert its equal to 1, at least for now? And/or make an internal global for current version number, and we'll expand on that if/when we need a new one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's this?
| @swift-ci please test linux platform |
Complete the lexing support such that the C++ compiler's lexer will be able to call into the implementation and allow us to lex the delimiters we want to support, and hand back the compiler where to resume lexing from. In addition, expose a parsing entrypoint that expects a regex literal with delimiters, from which we can infer the syntax options. This will be used by both the C++ parser, and by the runtime to allow us to preserve syntax options. Once the C++ logic is switched over to these new entrypoints, the regex delimiters will no longer be `'...'`, instead they will be `'/.../'` for traditional interior syntax, or `'|...|'` for modern syntax.
Add a new overload of the compiler intrinsic initializer for Regex initialization that allows for a versioned argument, and use it to start parsing a string with delimiters, which allows the syntax options to be picked up at runtime.
| @swift-ci please test Linux |
Complete the delimiter lexing support such that the C++ compiler's lexer will be able to call into the implementation and allow us to lex the delimiters we want to support, and hand back the compiler a pointer from which to resume lexing.
In addition, expose a parsing entrypoint that expects a regex literal with delimiters, from which we can infer the syntax options. This will be used by both the C++ parser, and by a new
Regex(_regexString:version:)initializer that the compiler will be switched over to calling. Adding a version parameter will allow us to more easily make versioned changes to the format of regex string we pass to the runtime.Once the C++ logic is switched over to these new entrypoints, the regex delimiters will no longer be
'...', instead they will be'/.../'for traditional interior syntax, or'|...|'for modern syntax.