Feature #13712
Closed
String#start_with? with regexp
Description
String#start_with? should accept a regexp.
When I write a parser, I want to check whether a string starts with a pattern.
It's the same thing as StringScanner#match?.
To do the same thing with ordinary String methods, I need to write something like /\A#{re}/.match(…).
But if re is an argument, this creates a new temporary regexp every time.
There is a workaround, shown below, but it's a bit tricky.
    "foo ".rindex(/fo+./, 0)
A patch follows:
diff --git a/re.c b/re.c
index d0aa2a792e..f672ba75ec 100644
--- a/re.c
+++ b/re.c
@@ -1588,6 +1588,84 @@ rb_reg_search(VALUE re, VALUE str, long pos, int reverse)
     return rb_reg_search0(re, str, pos, reverse, 1);
 }
 
+bool
+rb_reg_start_with_p(VALUE re, VALUE str)
+{
+    long pos = 0;
+    long result;
+    VALUE match;
+    struct re_registers regi, *regs = &regi;
+    regex_t *reg;
+    int tmpreg;
+    onig_errmsg_buffer err = "";
+
+    reg = rb_reg_prepare_re0(re, str, err);
+    tmpreg = reg != RREGEXP_PTR(re);
+    if (!tmpreg) RREGEXP(re)->usecnt++;
+
+    match = rb_backref_get();
+    if (!NIL_P(match)) {
+        if (FL_TEST(match, MATCH_BUSY)) {
+            match = Qnil;
+        }
+        else {
+            regs = RMATCH_REGS(match);
+        }
+    }
+    if (NIL_P(match)) {
+        MEMZERO(regs, struct re_registers, 1);
+    }
+    result = onig_match(reg,
+            (UChar*)(RSTRING_PTR(str)),
+            ((UChar*)(RSTRING_PTR(str)) + RSTRING_LEN(str)),
+            (UChar*)(RSTRING_PTR(str)),
+            regs, ONIG_OPTION_NONE);
+    if (!tmpreg) RREGEXP(re)->usecnt--;
+    if (tmpreg) {
+        if (RREGEXP(re)->usecnt) {
+            onig_free(reg);
+        }
+        else {
+            onig_free(RREGEXP_PTR(re));
+            RREGEXP_PTR(re) = reg;
+        }
+    }
+    if (result < 0) {
+        if (regs == &regi)
+            onig_region_free(regs, 0);
+        if (result == ONIG_MISMATCH) {
+            rb_backref_set(Qnil);
+            return false;
+        }
+        else {
+            onig_error_code_to_str((UChar*)err, (int)result);
+            rb_reg_raise(RREGEXP_SRC_PTR(re), RREGEXP_SRC_LEN(re), err, re);
+        }
+    }
+
+    if (NIL_P(match)) {
+        int err;
+        match = match_alloc(rb_cMatch);
+        err = rb_reg_region_copy(RMATCH_REGS(match), regs);
+        onig_region_free(regs, 0);
+        if (err) rb_memerror();
+    }
+    else {
+        FL_UNSET(match, FL_TAINT);
+    }
+
+    RMATCH(match)->str = rb_str_new4(str);
+    OBJ_INFECT(match, str);
+
+    RMATCH(match)->regexp = re;
+    RMATCH(match)->rmatch->char_offset_updated = 0;
+    rb_backref_set(match);
+
+    OBJ_INFECT(match, re);
+
+    return true;
+}
+
 VALUE
 rb_reg_nth_defined(int nth, VALUE match)
 {
diff --git a/string.c b/string.c
index 072f1329ee..6542a4acb1 100644
--- a/string.c
+++ b/string.c
@@ -9126,6 +9126,7 @@ rb_str_rpartition(VALUE str, VALUE sep)
                           RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
 }
 
+extern bool rb_reg_start_with_p(VALUE re, VALUE str);
 /*
  *  call-seq:
  *     str.start_with?([prefixes]+)   -> true or false
@@ -9146,11 +9147,20 @@ rb_str_start_with(int argc, VALUE *argv, VALUE str)
 
     for (i=0; i<argc; i++) {
         VALUE tmp = argv[i];
-        StringValue(tmp);
-        rb_enc_check(str, tmp);
-        if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
-        if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
-            return Qtrue;
+        switch (BUILTIN_TYPE(tmp)) {
+          case T_REGEXP:
+            {
+                bool r = rb_reg_start_with_p(tmp, str);
+                if (r) return Qtrue;
+            }
+            break;
+          default:
+            StringValue(tmp);
+            rb_enc_check(str, tmp);
+            if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
+            if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
+                return Qtrue;
+        }
     }
     return Qfalse;
 }
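For comparison, the three ways of anchoring a match at the start of a string that the description mentions can be written side by side. This is only a sketch, assuming Ruby 2.5 or later (where String#start_with? accepts a Regexp); the three helper method names are hypothetical and exist purely for illustration.

```ruby
# 1. Build an anchored regexp on every call. The interpolation allocates
#    a new temporary Regexp each time the method runs.
def prefix_via_interpolation?(str, re)
  !/\A#{re}/.match(str).nil?
end

# 2. The rindex workaround: a backwards search that starts at index 0
#    can only succeed if the pattern matches at index 0.
def prefix_via_rindex?(str, re)
  str.rindex(re, 0) == 0
end

# 3. The form requested in this ticket (assumes Ruby >= 2.5).
def prefix_via_start_with?(str, re)
  str.start_with?(re)
end

re = /fo+./
results_hit  = [prefix_via_interpolation?("foo ", re),
                prefix_via_rindex?("foo ", re),
                prefix_via_start_with?("foo ", re)]
results_miss = [prefix_via_interpolation?("xfoo ", re),
                prefix_via_rindex?("xfoo ", re),
                prefix_via_start_with?("xfoo ", re)]
```

All three agree on whether the pattern matches at the very beginning of the string; they differ only in readability and in whether a temporary Regexp has to be built.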
Updated by Eregon (Benoit Daloze) over 8 years ago
Agreed, this would be great and intuitive.
I wonder, could the symmetrical String#end_with? also work with a Regexp? (having the same effect as a trailing \z in the Regexp)
Updated by shevegen (Robert A. Heiler) over 8 years ago
I agree as well, would be nice. More than one way to do things. Should also be the same for .start_with? and .end_with?
Updated by shyouhei (Shyouhei Urabe) over 8 years ago
+1 for start_with? but I have no practical usage of end_with? so a bit negative about that part. Do people really need regexp version of .end_with?
Updated by phluid61 (Matthew Kerwin) over 8 years ago
shyouhei (Shyouhei Urabe) wrote:
+1 for start_with? but I have no practical usage of end_with? so a bit negative about that part. Do people really need regexp version of .end_with?
I've used regexen at different times to match final punctuation (e.g. /\?[!.]*/) and trailing whitespace (e.g. /\s/). I think it's more readable having str.end_with? /pattern/ instead of str =~ /pattern\z/
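The end-anchored checks mentioned here can be sketched with a trailing \z. The helper below, ends_with_pattern?, is a hypothetical name and not a built-in method; it only illustrates what a regexp-aware end_with? would be equivalent to, assuming Ruby 2.4 or later for Regexp#match?.

```ruby
# Hypothetical helper: true if +re+ matches a suffix of +str+.
# Interpolating a Regexp object wraps its source in (?-mix:...), so an
# alternation such as /e|f/ stays grouped before \z is appended.
def ends_with_pattern?(str, re)
  /#{re}\z/.match?(str)
end

final_punct  = ends_with_pattern?("Really?!", /\?[!.]*/)    # final punctuation
not_final    = ends_with_pattern?("Really? yes", /\?[!.]*/) # "?" is not at the end
trailing_ws  = ends_with_pattern?("trailing ", /\s/)        # trailing whitespace
alternation  = ends_with_pattern?("abc def", /e|f/)         # ends with "f"
```

The grouping detail matters: a hand-written /e|f\z/ would anchor only the second alternative, whereas interpolating the Regexp object anchors the whole pattern.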
Updated by shyouhei (Shyouhei Urabe) over 8 years ago
phluid61 (Matthew Kerwin) wrote:
I've used regexen at different times to match final punctuation (e.g. /\?[!.]*/) and trailing whitespace (e.g. /\s/). I think it's more readable having str.end_with? /pattern/ instead of str =~ /pattern\z/
I see. Thank you.
Updated by duerst (Martin Dürst) over 8 years ago
shyouhei (Shyouhei Urabe) wrote:
+1 for start_with? but I have no practical usage of end_with? so a bit negative about that part. Do people really need regexp version of .end_with?
In addition, even if we don't have a direct use case, it's very easy for somebody to try out, and then send a bug report here if it's not available. I know we don't add functionality just because "somebody eventually may need it", but in this case, it seems to be justified to streamline things.
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
Will you need $~ after start_with?(re)?
Updated by phluid61 (Matthew Kerwin) over 8 years ago
nobu (Nobuyoshi Nakada) wrote:
Will you need $~ after start_with?(re)?
Personally, I don't see that I'll ever need it. If people do want it, they can lodge a feature request in future?
Updated by Eregon (Benoit Daloze) over 8 years ago
nobu (Nobuyoshi Nakada) wrote:
Will you need $~ after start_with?(re)?
It might be quite useful when parsing, to avoid doing a second match just to get captures.
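As a rough illustration of that parsing use case, a tiny tokenizer can lean on the match data left behind by start_with?(re) rather than matching twice. This is a sketch assuming Ruby 2.5 or later; TOKEN_RULES and tokenize are made-up names, and the token grammar is invented for the example.

```ruby
# Hypothetical token rules, tried in order.
TOKEN_RULES = [
  [:number, /\d+/],
  [:ident,  /[a-z_]\w*/],
  [:space,  /\s+/]
].freeze

def tokenize(input)
  tokens = []
  rest = input
  until rest.empty?
    matched = false
    TOKEN_RULES.each do |name, re|
      next unless rest.start_with?(re)
      m = Regexp.last_match          # match data set by start_with?(re)
      tokens << [name, m[0]] unless name == :space
      rest = m.post_match            # continue after the matched prefix
      matched = true
      break
    end
    raise ArgumentError, "unexpected input: #{rest.inspect}" unless matched
  end
  tokens
end

tokens = tokenize("foo 42 bar_1")
```

Each rule is tested once per position, and the matched text and the remainder both come from the same match data.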
Updated by phluid61 (Matthew Kerwin) over 8 years ago
Eregon (Benoit Daloze) wrote:
It might be quite useful when parsing, to avoid doing a second match just to get captures.
That could depend on whether $&, $1, $2, etc. are set. I assumed @nobu (Nobuyoshi Nakada) was only asking about $~ because allocating a whole MatchData object is heavier than just allocating some strings.
Updated by Eregon (Benoit Daloze) over 8 years ago
phluid61 (Matthew Kerwin) wrote:
That could depend on whether $&, $1, $2, etc. are set. I assumed @nobu (Nobuyoshi Nakada) was only asking about $~ because allocating a whole MatchData object is heavier than just allocating some strings.
$&, $1, etc always just read from $~, so it's the same thing.
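A short sketch of that relationship, using an ordinary =~ match so that it does not depend on the feature under discussion; the variable names are only for illustration.

```ruby
"foo bar" =~ /(fo+)\s(\w+)/

md     = $~   # the MatchData for the last match
whole  = $&   # same as md[0]
first  = $1   # same as md[1]
second = $2   # same as md[2]

# Clearing $~ clears the derived variables too, since they read from it.
$~ = nil
whole_after_reset = $&
first_after_reset = $1
```

Because the derived variables are just views onto $~, setting them requires the same MatchData as setting $~ itself.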
Updated by shevegen (Robert A. Heiler) over 8 years ago
shyouhei (Shyouhei Urabe) wrote:
+1 for start_with? but I have no practical usage of end_with? so a bit negative about
that part. Do people really need regexp version of .end_with?
I do not know if the use case frequency is the same. Perhaps you are right that anchoring
or .start_with? is more frequent than .end_with?, via regexes.
But I think that, even when there is a much smaller use case for .end_with? (let's just
assume it for the moment), I think that both .start_with? and .end_with? should behave
the same. Otherwise people may then ask "why does .start_with? allow regex input but
.end_with? does not?". :)
I think it may be useful though?
    x = 'abc def'
    puts 'yep, ends with either e or f' if x.end_with? /e|f/
At the least to me it seems to be mostly symmetrical use cases, even if one may
be more prevalent than the others. I guess the point may be that it just gives people
more flexibility - in these cases, if they would rather want to use a regexp than
a string, they can do so.
Updated by matz (Yukihiro Matsumoto) about 8 years ago
Agreed. Need to update Regexp.last_match.
Matz.
Updated by naruse (Yui NARUSE) about 8 years ago
- Status changed from Open to Closed
Updated by mame (Yusuke Endoh) almost 8 years ago
- Related to Feature #3388: regexp support for start_with? and end_with? added