Feature #18583
Open
Pattern-matching: API for custom unpacking strategies?
Description
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, using StringScanner in many complicated parsers involves some kind of branching.
In pseudocode, the "ideal API" would allow writing something like this:

    case <what next matches>
    in /regexp1/ => value_that_matched
      # use value_that_matched
    in /regexp2/ => value_that_matched
      # use value_that_matched
    # ...

Intuitively, it seems there should be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object that defines its own #=== and use it with pinning:

    case scanner
    in ^(Matcher.new(/regexp1/)) => value_that_matched
      # ...

But there is no API to tell how the match result will be unpacked; the whole StringScanner is simply put into value_that_matched.
So I thought it might be possible to define some kind of API for pattern-like objects: a method with a signature like try_match_pattern(value), which by default is implemented as return value if self === value, but can be redefined to return something different, such as a part of the object or the object transformed somehow.
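To make the idea concrete, here is a minimal sketch that emulates such a protocol in plain Ruby, since the language has no such hook today. Everything except StringScanner itself is hypothetical: the UnpackingMatch module, its try_match and dispatch helpers, and the ScanMatcher class are illustration-only names, and nil is assumed to mean "no match".

```ruby
require "strscan"

# Hypothetical emulation of the proposed protocol; not an existing Ruby API.
module UnpackingMatch
  # Returns the object to bind on success, or nil when nothing matched.
  # Patterns without try_match_pattern fall back to the default described
  # above: bind the value itself when pattern === value.
  def self.try_match(pattern, value)
    if pattern.respond_to?(:try_match_pattern)
      pattern.try_match_pattern(value)
    elsif pattern === value
      value
    end
  end

  # Emulates `case value in pattern => bound` over [pattern, handler] pairs.
  def self.dispatch(value, branches)
    branches.each do |pattern, handler|
      bound = try_match(pattern, value)
      return handler.call(bound) unless bound.nil?
    end
    raise ArgumentError, "no pattern matched"
  end
end

# Hypothetical StringScanner-specific matcher: unpacks to the scanned text
# instead of the scanner object.
class ScanMatcher
  def initialize(regexp)
    @regexp = regexp
  end

  def try_match_pattern(scanner)
    # StringScanner#scan returns the matched String (advancing the scanner)
    # or nil when the regexp does not match at the current position.
    scanner.scan(@regexp)
  end
end

scanner = StringScanner.new("42 apples")
result = UnpackingMatch.dispatch(scanner, [
  [ScanMatcher.new(/[a-z]+/), ->(word)   { [:word, word] }],
  [ScanMatcher.new(/\d+/),    ->(digits) { [:number, digits] }],
])
# result is [:number, "42"]
```

One design question this sketch leaves open is how a pattern would signal a successful match whose unpacked value is itself nil or false.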
This would open up some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like

    value => ^(type_caster(Integer)) => int_value

So... Just a discussion topic!
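A type_caster pattern object under the same hypothetical try_match_pattern hook might look roughly like this. The TypeCaster class and the type_caster helper are made-up names, and only Integer and Float are handled in this sketch; the only real calls are Kernel#Integer and Kernel#Float with exception: false.

```ruby
# Hypothetical pattern object: "matches" when the value can be converted,
# and unpacks to the converted value rather than the original one.
class TypeCaster
  def initialize(type)
    @type = type
  end

  def try_match_pattern(value)
    if @type == Integer
      Integer(value, exception: false) # nil when value is not a valid integer
    elsif @type == Float
      Float(value, exception: false)   # nil when value is not a valid float
    end
  end
end

def type_caster(type)
  TypeCaster.new(type)
end

int_value = type_caster(Integer).try_match_pattern("42")     # 42
no_match  = type_caster(Integer).try_match_pattern("apples") # nil
```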
Updated by zverok (Victor Shepelev) almost 4 years ago
A simpler example: matching something against regexps with capture groups is still quite annoying!

    case string
    when /{{(.+?)}}/
      content = Regexp.last_match[1] # looking into global value isn't exactly elegant, right?

We could probably bend it towards

    case string
    in /{{(.+?)}}/ => content # the matched group

This, though, raises the question of several match groups, at which point one starts to want more:

    case string
    in /{{(.+?): (.+?)}}/ => [key, value]
      # use key and value
    in /{{=(?<named>.+?)}}/ => {named:}
      # use named

...so... IDK.
Updated by hmdne (hmdne -) over 3 years ago
# looking into global value isn't exactly elegant, right?
It's not global, it's Fiber-local, and so are $1 and friends. This may not be communicated well enough in the documentation, though...

    [1] pry(main)> z = Fiber.new { /(.)/ =~ 'test' }
    => #<Fiber:0x00007f698a2897e0 (pry):1 (created)>
    [2] pry(main)> z.resume
    => 0
    [3] pry(main)> Regexp.last_match
    => nil
    [4] pry(main)>
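Independently of fibers, the match variables are also scoped to the method that performed the match, which is easy to check outside pry. A small sketch follows; the method names are made up for the illustration.

```ruby
def match_in_callee
  /(.)/ =~ "test"
  Regexp.last_match(1) # "t", visible only inside this method
end

def match_in_caller
  /b/ =~ "abc"
  before = Regexp.last_match(0) # "b"
  callee = match_in_callee      # runs another match in its own scope
  after  = Regexp.last_match(0) # the caller's match data is untouched
  [before, callee, after]
end

results = match_in_caller
# results is ["b", "t", "b"]
```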
Updated by palkan (Vladimir Dementyev) over 3 years ago
This, though, raises a question of several match groups, at which point one starts to want more:

    case string
    in /{{(.+?): (.+?)}}/ => [key, value]
      # use key and value
    in /{{=(?<named>.+?)}}/ => {named:}
      # use named

...so... IDK.
This one could be achieved via guards:

    case val
    in /(foo|bar)/ if $~ in [val]
      puts val
    in /(?<named>\d+)/ if $~ in {named: }
      puts named
    end

That would require adding MatchData#{deconstruct,deconstruct_keys}, though:

    refine MatchData do
      alias deconstruct captures

      def deconstruct_keys(*)
        named_captures.transform_keys(&:to_sym)
      end
    end

Regarding the original proposal (the unpacking API), I think it could bring more confusion than value. Adding one more implicit layer (in addition to #deconstruct and #deconstruct_keys, which could also be overridden) would make pattern matching even more magical in a bad sense.
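The guard-based approach can be tried out with a sketch like the one below. It assumes Ruby 3.0 or later (for `expr in pattern` as a boolean test). Instead of a refinement it reopens MatchData and defines the two hooks only where they are missing, because MatchData gained both methods natively in Ruby 3.2. The describe_tag method and its return strings are made up for the example.

```ruby
# Define the deconstruction hooks only on Rubies that lack them.
class MatchData
  alias_method :deconstruct, :captures unless method_defined?(:deconstruct)

  unless method_defined?(:deconstruct_keys)
    def deconstruct_keys(_keys)
      named_captures.transform_keys(&:to_sym)
    end
  end
end

# Hypothetical helper: the regexp pattern sets $~, and the guard then
# destructures that MatchData into local variables.
def describe_tag(string)
  case string
  in /\{\{=(?<named>.+?)\}\}/ if $~ in {named:}
    "named: #{named}"
  in /\{\{(.+?): (.+?)\}\}/ if $~ in [key, value]
    "pair: #{key} => #{value}"
  else
    "no match"
  end
end

named_result = describe_tag("{{=title}}") # "named: title"
pair_result  = describe_tag("{{a: b}}")   # "pair: a => b"
```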
Updated by ntl (Nathan Ladd) over 1 year ago · Edited
Could the match operator, =~, be used as a general complement to ===?
Example (following original sketch from @zverok (Victor Shepelev)):

    class Matcher
      def initialize(regexp)
        @regexp = regexp
      end

      def ===(obj)
        @regexp.match?(obj)
      end

      def =~(obj)
        @regexp.match(obj)
      end
    end

    case "some string"
    in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
      some_named_capture = match_data[:some_named_capture]
      puts "Match: #{some_named_capture}"
    end

The implementation of =~ would be optional in my view; not implementing it on whatever implements === would just cause Ruby to behave as it does now:

    class Matcher
      def initialize(regexp)
        @regexp = regexp
      end

      def ===(obj)
        @regexp.match?(obj)
      end
    end

    case "some string"
    in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_variable
      # match_variable is just "some string"
      puts match_variable.inspect
    end

This would add =~ to the pattern matching protocol, which currently comprises ===, deconstruct and deconstruct_keys. It would make === significantly more useful, and regular expressions provide a compelling example of why: when matching a string against a regular expression pattern, the string is already in lexical scope, but the match data provides new useful information that only comes into existence upon a successful match:

    subject = "some string"

    case subject
    in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
      # Capturing the match data variable instead of the original string
      # doesn't make the original string inaccessible:
      puts "Match subject: #{subject.inspect}"

      # match_data provides additional useful information:
      some_named_capture = match_data[:some_named_capture]
      puts "Match data: :#{some_named_capture}"
    end

I also suspect this could be embedded into the pattern matching syntax itself, which could allow for some highly useful possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:

    require "json"

    SomeStruct = Struct.new(:some_attr, :some_other_attr) do
      def self.===(data)
        data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
      end

      def self.=~(data)
        new(**data)
      end
    end

    some_json = <<~JSON
      {
        "some_attr": "some value",
        "some_other_attr": "some other value"
      }
    JSON

    # Parse JSON into raw (primitive) data
    some_data = JSON.parse(some_json, symbolize_names: true)

    case some_data
    in SomeStruct => some_struct
      # some_struct is a reified data structure (SomeStruct) built from some_data
      puts some_struct.inspect
    end
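Since case/in does not consult =~ today, the proposed behaviour can only be emulated. The sketch below does that with a hypothetical match_and_unpack helper, which is not part of Ruby: it checks === first, then binds the result of =~ when the pattern provides one, and otherwise binds the original value. keyword_init: true is added here as an assumption so that new(**data) also works on Rubies older than 3.2.

```ruby
# Hypothetical helper emulating "=== decides, =~ unpacks".
def match_and_unpack(pattern, value)
  return nil unless pattern === value

  unpacked = pattern.respond_to?(:=~) ? (pattern =~ value) : nil
  unpacked.nil? ? value : unpacked # no =~ result: behave as Ruby does now
end

class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    obj.is_a?(String) && @regexp.match?(obj)
  end

  def =~(obj)
    @regexp.match(obj)
  end
end

SomeStruct = Struct.new(:some_attr, :some_other_attr, keyword_init: true) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end

  def self.=~(data)
    new(**data)
  end
end

match_data  = match_and_unpack(Matcher.new(/(?<word>some) string/), "some string")
some_struct = match_and_unpack(SomeStruct, { some_attr: "a", some_other_attr: "b" })
plain       = match_and_unpack(Integer, 5) # Integer defines no useful =~, so 5 is bound
```

Here match_data is a MatchData, some_struct is a SomeStruct instance, and plain is simply 5, which mirrors the "optional =~" behaviour described above.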