
RE2 Regular Expressions

RE2 is Google’s regular expression engine. It is used in some Google products such as Google BigQuery and Google Sheets, as well as a variety of other applications and programming languages.

One major difference between RE2 and all the other regex engines discussed on this website is that RE2 is a non-backtracking engine. While some think that this makes RE2 exceptionally fast, that is not the case. RE2 can be just as fast as, say, PCRE2, Boost, or std::regex, which are other regex engines that integrate easily with C and C++. But with complex regular expressions RE2 can be slower to find its matches. The key advantage of RE2 is that it can never suffer from catastrophic backtracking, which could cause the other engines to crash, run out of memory, or seemingly run forever on certain (poorly written) regular expressions. Catastrophic backtracking typically occurs when the regex cannot match the subject string. RE2 trades some matching speed for a guaranteed failure speed. If you’re building a server or cloud application that allows users to provide their own regular expressions then RE2 fully protects you against ReDoS attacks. The price you pay is that RE2 does not support commonly used features such as backreferences and lookaround that the other engines do support.

What is said about RE2 on this website applies to RE2 release 2017-05-01 and later. Unless mentioned otherwise, the RE2 syntax discussed in the regular expressions tutorial is its default syntax. The POSIX syntax simply disables some features. It is only useful if you’re using RE2 in a system that needs to be POSIX-compliant without any extensions.

RE2’s regex syntax has changed very little since release 2017-05-01. Release 2021-11-01 added named capturing groups using the Python syntax. Release 2023-09-01 added named capturing groups using the .NET syntax, but only the variant with angle brackets. Though RE2 does not support backreferences in the regular expression, it does allow the text matched by capturing groups to be retrieved after a match is found.

RE2::PartialMatch() and RE2::FullMatch()

If you want to check whether a regex can find a match in a single subject string, and you don’t need to set any options, then you can call one of RE2’s static member functions in a single line of code. RE2::PartialMatch(subject, regex) returns true if regex can match part of subject. RE2::FullMatch(subject, regex) returns true if regex can find a single match that spans all of subject. Both arguments have the StringPiece class as their declared types. But you can pass a const char* or a std::string to either of them, because the StringPiece class has non-explicit constructors that convert from const char* and std::string.

If you want to use the same regex more than once then you should construct an RE2 instance. Pass the regex as a string to the RE2() constructor and assign it to a variable. We’ll assume this variable is called re for the remainder of this topic. You can then pass this re variable as the second argument to RE2::PartialMatch() and RE2::FullMatch() to check different strings for matches of the same regex.

If your regular expression has capturing groups then you can pass additional arguments to RE2::PartialMatch() and RE2::FullMatch() to retrieve the text matched by capturing groups. These additional arguments can be of type string* or absl::string_view*. If the capturing group will match a string of digits then you can pass a pointer to a variable of an integer type to retrieve the group’s match converted to a number. You can pass NULL for groups that you don’t want to retrieve. You can pass as many additional arguments as the regex has capturing groups. You can pass fewer arguments if you’re only interested in the first few groups, or no additional arguments if you don’t want to retrieve any groups.

RE2::Options

If you want to change some matching modes from their defaults then you first need to declare a variable of type RE2::Options. The default constructor sets the default options. You can assign RE2::POSIX to the variable to use a set of options that make RE2 more POSIX-like. If you want to change individual options, call one or more of the member functions listed below. The description explains what happens when you pass true to the function.

Function	Default	`RE2::POSIX`	Description	Restriction
set_posix_syntax	false	true	Use a limited syntax similar to POSIX ERE.
set_longest_match	false	true	Find the leftmost longest match.
set_literal	false	false	Treat the regex as literal text.
set_never_nl	false	false	Do not allow the regex to match newline characters.	If `true` then `dot_nl` has no effect.
set_dot_nl	false	false	Allow the dot to match newline characters.
set_never_capture	false	false	All groups are non-capturing.
set_case_sensitive	true	true	Case sensitive matching.
set_perl_classes	n/a	false	Support shorthands `\d\s\w`.	Ignored and assumed as `true` when `posix_syntax == false`.
set_word_boundary	n/a	false	Support word boundary `\b`.	Ignored and assumed as `true` when `posix_syntax == false`.
set_one_line	n/a	false	Anchors match only at the start and end of the string.	Ignored and assumed as `true` when `posix_syntax == false`.

It’s unfortunate that set_one_line has no effect unless you’re using the POSIX syntax and that it’s effectively true by default. This means that when using RE2’s full regex flavor, ^ and $ only match at the start and end of the string unless you prefix the regular expression with the (?m) mode modifier. It would have been more useful to have set_one_line(false) as the fixed option because the full syntax already supports \A and \z to match the start and end of the string.

Starting the regex with (?i) is equivalent to set_case_sensitive(false) and (?s) is the same as set_dot_nl(true). Only these 3 options have modifier letters. Mode modifiers are not available when using the POSIX syntax.

RE2 Code Sample

This code snippet shows how you could use RE2 to match a line in the form of name=123 and capture the name and the number into separate variables:

std::string name;
int number;
re2::RE2::Options options;
options.set_case_sensitive(false);
re2::RE2 re("(?m)^([a-z]+)=([0-9]+)$", options);
if (re2::RE2::PartialMatch(subject, re, &name, &number)) {
  // Captured name and number
} else {
  // Match not found
}

RE2 Search And Replace

Call RE2::Replace(&subject, regex, replacement) to replace the first match of regex in subject with replacement. It returns true if a replacement was made and false if not. Call RE2::GlobalReplace(&subject, regex, replacement) to replace all matches of regex in subject with replacement. It returns the number of matches that were replaced. The regex argument can be an instance of the RE2 class that you can reuse for multiple operations. If you only need to use the regex once and don’t need to set any options then you can pass it as a string.

The replacement argument should be a string. RE2 supports a very limited replacement text syntax. \0 is the overall regex match and \1 to \9 are backreferences to the first 9 capturing groups. While RE2 does not support backreferences in regular expressions, it does support them in replacement strings as it can retrieve capturing groups after the regex has found a match. Double-digit backreferences are not supported. The second digit is a literal. All other backslashes must be escaped.

Regexes and Replacements as Literal Strings in C++

Both regular expressions and RE2’s replacement text syntax require backslashes to be escaped. C++ string literals also require backslashes to be escaped. So the regex \\ which matches a single backslash and the replacement \\ which replaces with a single backslash both need to be written as "\\\\" in your C++ code. Use C++ raw strings to avoid this double doubling of backslashes. You can use R"(\\)" to code this regex and replacement.

This code snippet turns all lines in the form of left=right into right=left:

std::string subject("a=b\nc=d");
re2::RE2::GlobalReplace(&subject, R"(([^\s=]+)=([^\s=]+))", R"(\2=\1)");

Invalid Backreferences in Replacement Strings

A backreference that specifies a higher number than the number of capturing groups in the regex is invalid. \9 is invalid if there are 8 or fewer capturing groups in the regex. Since the 2020-06-01 release of RE2, such invalid backreferences cause RE2::Replace() and RE2::GlobalReplace() to leave the subject string unchanged. RE2::Replace() returns false and RE2::GlobalReplace() returns 0 as if the regex didn’t find any matches at all, while the actual problem is with the replacement string. Older versions of RE2 removed invalid backreferences from the replacement string and then allowed the search-and-replace to proceed normally. If the replacement string consisted entirely of invalid backreferences then this would end up deleting all regex matches.