Skip to content

Conversation

@hamishknight
Copy link
Contributor

Complete the delimiter lexing support such that the C++ compiler's lexer will be able to call into the implementation and allow us to lex the delimiters we want to support, and hand back the compiler a pointer from which to resume lexing.

In addition, expose a parsing entrypoint that expects a regex literal with delimiters, from which we can infer the syntax options. This will be used by both the C++ parser, and by a new Regex(_regexString:version:) initializer that the compiler will be switched over to calling. Adding a version parameter will allow us to more easily make versioned changes to the format of regex string we pass to the runtime.

Once the C++ logic is switched over to these new entrypoints, the regex delimiters will no longer be '...', instead they will be '/.../' for traditional interior syntax, or '|...|' for modern syntax.


/// Drop a set of regex delimiters from the input string, returning the contents
/// and the delimiter used. The input string must have valid delimiters.
func droppingRegexDelimiters(_ str: String) -> (String, Delimiter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the point of this function, why would we have a parallel implementation over String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used by both:

  • The runtime to drop the delimiters off and get the syntax options to use
  • The C++ parser as we don't currently track the contents range on the C++ Token so we need to drop off the delimiters again when we come to parse

We could have both of those go back through the lexing function, I don't feel too strongly about it. I agree it's not ideal having two implementations of effectively the same operation

while true {
switch load() {
case nil, ascii("\n"), ascii("\r"):
return .failure(LexError(.endOfString, resumeAt: current))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but how do you think we should do multi-line support? I guess with the version argument we can add that at any time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as long as we can pick it up by e.g a different delimiter, we could change the logic to support multiple lines

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delimiter ideally would be something like

let r = '''/ regex '''/ 

or something.

@hamishknight
Copy link
Contributor Author

Moved the libswift entrypoints into the repo, how does this look @milseman?

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM and follow up can be in a subsequent PR.

_ inputPtr: UnsafePointer<CChar>?,
_ bufferEndPtr: UnsafePointer<CChar>?,
_ errOut: UnsafeMutablePointer<UnsafePointer<CChar>?>?
) -> UnsafePointer<CChar>? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need another annotation to get the C calling convention?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need something like @_cdecl("_libswift_lexRegexLiteral").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the initialize function in libSwift makes a closure to call this function, I believe it will get an automagic @convention(c) on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's dynamically registering a closure, we may not need @usableFromInline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to drop @usableFromInline

@hamishknight
Copy link
Contributor Author

@swift-ci please test Linux

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Comments can be done after merge if you want.

@usableFromInline
init(_regexString pattern: String, version: Int) {
// The version argument is passed by the compiler using the value defined
// in libswiftParseRegexLiteral.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert its equal to 1, at least for now? And/or make an internal global for current version number, and we'll expand on that if/when we need a new one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this?

@milseman
Copy link
Member

@swift-ci please test linux platform

Complete the lexing support such that the C++ compiler's lexer will be able to call into the implementation and allow us to lex the delimiters we want to support, and hand back the compiler where to resume lexing from. In addition, expose a parsing entrypoint that expects a regex literal with delimiters, from which we can infer the syntax options. This will be used by both the C++ parser, and by the runtime to allow us to preserve syntax options. Once the C++ logic is switched over to these new entrypoints, the regex delimiters will no longer be `'...'`, instead they will be `'/.../'` for traditional interior syntax, or `'|...|'` for modern syntax.
Add a new overload of the compiler intrinsic initializer for Regex initialization that allows for a versioned argument, and use it to start parsing a string with delimiters, which allows the syntax options to be picked up at runtime.
@hamishknight
Copy link
Contributor Author

@swift-ci please test Linux

@hamishknight hamishknight merged commit ec47cb0 into swiftlang:main Dec 16, 2021
@hamishknight hamishknight deleted the flex branch December 16, 2021 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants