Inconsistency of IndexError vs nil for unknown capture group #139

@eregon

After matching with a String, StringScanner#[:unknown_capture_group] returns nil:

$ irb -rstrscan
irb(main):001> s=StringScanner.new("abc")
=> #<StringScanner 0/3 @ "abc">
irb(main):002> s.scan("a")
=> "a"
irb(main):003> s[:foo]
=> nil

After matching with a Regexp, StringScanner#[:unknown_capture_group] raises IndexError:
(using a new process to be sure to not be affected by #135)

$ irb -rstrscan
irb(main):001> s=StringScanner.new("abc")
=> #<StringScanner 0/3 @ "abc">
irb(main):002> s.scan(/./)
=> "a"
irb(main):003> s[:foo]
(irb):3:in `[]': undefined group name reference: foo (IndexError)

It would be best if the behavior were consistent.
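
For anyone who wants to check this on their own Ruby, here is a minimal script that runs both cases in one process, using a fresh scanner for each. The helper name `unknown_group_outcome` is made up for this sketch, and the result for the String case may depend on the strscan version installed, so no expected output is listed.

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): reports what StringScanner#[]
# does for an unknown capture group name after scanning with `pattern`.
def unknown_group_outcome(pattern)
  scanner = StringScanner.new("abc") # fresh scanner, no state from earlier matches
  scanner.scan(pattern)
  scanner[:foo].nil? ? :nil : :value
rescue IndexError
  :index_error
end

puts "strscan #{StringScanner::Version}"
puts "after String match: #{unknown_group_outcome("a")}"
puts "after Regexp match: #{unknown_group_outcome(/./)}"
```

If the two lines print different outcomes, the installed strscan has the inconsistency described above.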

There is a test for the nil case in https://github.com/ruby/strscan/blob/v3.1.2/test/strscan/test_stringscanner.rb#L830-L840
And there are some tests for the IndexError cases in https://github.com/ruby/strscan/blob/v3.1.2/test/strscan/test_stringscanner.rb#L443-L444 and in more places.

The implementation returns nil early when the last match was not made with a Regexp:

if (!RTEST(p->regex)) return Qnil;

This is proving problematic on TruffleRuby.
We would like to upstream our pure-Ruby implementation of StringScanner, but to do so we want to avoid exposing internals as much as possible.
Internals like p->regex are something I don't want to expose to code living outside the truffleruby repository.
Having to work around this by keeping extra state for whether the last match was with a String or a Regexp feels very messy and semantically wrong.
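
To make the concern concrete, here is a rough sketch of what that workaround could look like in a pure-Ruby scanner. This is not the TruffleRuby implementation: `TinyScanner` and `@last_pattern_was_string` are hypothetical names, and the class supports only `scan` and `[]`. It assumes the goal is to mirror the current behavior exactly (nil after a String match, IndexError after a Regexp match).

```ruby
# Hypothetical, heavily simplified scanner used only to illustrate the workaround.
class TinyScanner
  def initialize(string)
    @string = string
    @pos = 0
    @match = nil
    # Extra state that exists only to reproduce the String vs Regexp difference.
    @last_pattern_was_string = false
  end

  def scan(pattern)
    @last_pattern_was_string = pattern.is_a?(String)
    source  = @last_pattern_was_string ? Regexp.escape(pattern) : pattern.source
    options = @last_pattern_was_string ? 0 : pattern.options
    # \G anchors the match at the current scan position.
    anchored = Regexp.new("\\G(?:#{source})", options)
    @match = anchored.match(@string, @pos)
    return nil unless @match
    @pos = @match.end(0)
    @match[0]
  end

  def [](group)
    return nil unless @match
    # Mirrors `if (!RTEST(p->regex)) return Qnil;` for named groups.
    return nil if @last_pattern_was_string && !group.is_a?(Integer)
    @match[group] # MatchData#[] raises IndexError for an unknown group name
  end
end

s = TinyScanner.new("abc")
s.scan("a")
p s[:foo]
s.scan(/./)
begin
  s[:foo]
rescue IndexError => e
  puts e.message
end
```

Without the flag, `[]` could simply delegate to the stored match data, which is why a single unified behavior would be simpler to implement.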

Could the behavior be unified for both of these cases?
