Ignore invalid escapes in regexp comments #5721

jeremyevans · 2022-03-26T02:16:03Z

Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.

Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.

Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class.
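A small illustration of the distinction described above, using only standard Ruby regexp syntax. The variable names are chosen here for illustration and do not come from the patch:

```ruby
# In extended mode, "#" outside a character class starts a comment
# that runs to the end of the line.
commented = /a # this text is ignored
b/x

# Inside a character class, "#" is an ordinary literal character,
# even in extended mode.
literal_hash = /a[#]b/x

commented_matches = commented.match?("ab")
literal_matches   = literal_hash.match?("a#b")
literal_rejects   = literal_hash.match?("ab")
```

This is why the preprocessing step cannot simply drop everything after the first "#" in an extended regexp: it has to know whether it is currently inside a character class.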

This issue doesn't just affect extended regexps; it also affects
"(?#" comments inside all regexps. So for those comments, scan
until the trailing ")" and ignore the content inside.
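The scanning rules described so far can be sketched in Ruby roughly as follows. This is a simplified model for illustration, not the C code in re.c: the helper name `strip_regexp_comments` and its `extended:` keyword are hypothetical, it treats a backslash inside a `(?#...)` comment as escaping the next character (an assumption), and it ignores edge cases such as a literal "]" at the start of a character class.

```ruby
# Hypothetical helper: returns the pattern source with comments removed.
# A simplified model of the skipping logic, not the actual re.c code.
def strip_regexp_comments(source, extended: false)
  out = +""
  class_depth = 0 # > 0 while inside (possibly nested) character classes
  i = 0
  while i < source.length
    ch = source[i]
    if ch == "\\"
      # Copy an escape and the character it escapes untouched.
      out << source[i, 2]
      i += 2
    elsif ch == "["
      class_depth += 1
      out << ch
      i += 1
    elsif ch == "]" && class_depth > 0
      class_depth -= 1
      out << ch
      i += 1
    elsif class_depth.zero? && source[i, 3] == "(?#"
      # Group comment, recognized in every regexp: skip through ")".
      i += 3
      while i < source.length && source[i] != ")"
        i += source[i] == "\\" ? 2 : 1
      end
      i += 1
    elsif extended && class_depth.zero? && ch == "#"
      # Extended-mode comment: skip to end of line or end of pattern.
      i += 1 while i < source.length && source[i] != "\n"
    else
      out << ch
      i += 1
    end
  end
  out
end

strip_regexp_comments("a # \\u\nb", extended: true) # => "a \nb"
strip_regexp_comments("[#]# c", extended: true)     # => "[#]"
strip_regexp_comments("a(?#\\u)b")                  # => "ab"
```

In this model, an invalid escape such as `\u` is never examined when it sits inside a comment, because the scanner skips the comment text before any escape checking would apply.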

I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescapes during parsing instead of before parsing, so that the
current parsing state is already known.

Fixes [Bug #18294]

nobu

Should compare with end instead of NUL terminator, and consider multibytes in comments.

re.c

test/ruby/test_regexp.rb

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>

jeremyevans added 2 commits March 25, 2022 17:48

Handle character classes inside other character classes

27295bb

jeremyevans requested a review from nobu April 20, 2022 23:04

nobu requested changes May 19, 2022

Apply suggestions from code review

145a17f

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>

jeremyevans requested a review from nobu May 24, 2022 19:10

jeremyevans merged commit ec35422 into ruby:master Jun 6, 2022
