Is the bash operator =~ equivalent to a perl invocation?

    filename="test-33.csv"
    regex="([^.]+)(-\d{1,5})(\.csv)"

With bash test:

    if [[ "$filename" =~ $regex ]]; then echo "it matches"; else echo "doesn't match"; fi
    # doesn't match
    if [[ "$filename" =~ ([^.]+)(-\d{1,5})(\.csv) ]]; then echo "matches"; else echo "doesn't match"; fi
    # doesn't match

With perl:

    if perl -e "if ('$filename' =~ /$regex/) { exit 0; } else { exit 1; }"; then
        echo "it matches"
    else
        echo "doesn't match"
    fi
    # it matches

Is there anything I am missing for the bash =~ operator? Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?

3 Answers

There are several different types of Regular Expression, each one adding more operators (and therefore requiring more characters to be escaped if they are to be considered literals).

The =~ operator is described in the documentation (see man bash on your system or online) like this,

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly

An Extended Regular Expression (ERE) can be matched with grep -E (formerly egrep). Your example is a Perl Compatible Regular Expression (PCRE), which is a superset of the ERE and will not work with =~. However, it can be trivially adapted by replacing \d with [[:digit:]]:

    echo abc-123.csv | grep -E '([^.]+)(-\d{1,5})(\.csv)'            # ERE fails
    echo abc-123.csv | grep -P '([^.]+)(-\d{1,5})(\.csv)'            # PCRE matches with GNU grep
    echo abc-123.csv | grep -E '([^.]+)(-[[:digit:]]{1,5})(\.csv)'   # ERE matches modified expression

So, given that grep -E is equivalent to =~ we can therefore write this,

    if [[ "$filename" =~ ([^.]+)(-[[:digit:]]{1,5})(\.csv) ]]
    then
        echo "matches"
    else
        echo "doesn't match"
    fi
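The adapted expression also works when it is stored in a variable, as in the first test in the question, provided the variable is expanded unquoted on the right of =~. A minimal sketch, assuming bash 3.2 or later; the function name describe_match is only illustrative, and BASH_REMATCH is the array bash fills with the captured groups:

```bash
#!/usr/bin/env bash
# Sketch only: describe_match is a hypothetical helper name.
describe_match() {
    local filename=$1
    # ERE version of the question's pattern: \d replaced by [[:digit:]]
    local regex='([^.]+)(-[[:digit:]]{1,5})(\.csv)'
    # $regex is left unquoted; a quoted right-hand side is matched literally
    if [[ $filename =~ $regex ]]; then
        printf 'stem=%s number=%s ext=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        echo "doesn't match"
    fi
}

describe_match "test-33.csv"   # prints: stem=test number=-33 ext=.csv
```

Keeping the pattern in a single-quoted variable also sidesteps the shell's own quoting rules for characters such as backslashes and parentheses.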

Note that your ERE should probably be prefixed with ^ and suffixed with $, and the [^.]+ adapted to [^-.]+ to ensure that you can't match strings such as abc-def-12345678-123.csv.txt:

^[^-.]+-[[:digit:]]{1,5}\.csv$ 
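As a rough check of the anchored expression, the sketch below runs it against a few sample names. The helper name is_numbered_csv and the sample list are assumptions for illustration only:

```bash
#!/usr/bin/env bash
# Sketch only: is_numbered_csv is a hypothetical helper name.
is_numbered_csv() {
    # Anchored ERE: stem without hyphens or dots, 1 to 5 digits, then .csv
    local regex='^[^-.]+-[[:digit:]]{1,5}\.csv$'
    [[ $1 =~ $regex ]]
}

for name in test-33.csv abc-def-12345678-123.csv.txt test-123456.csv; do
    if is_numbered_csv "$name"; then
        echo "$name: matches"
    else
        echo "$name: doesn't match"
    fi
done
```

Only test-33.csv is accepted here; the other two names are rejected by the anchors and the five-digit limit.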

If you're absolutely set on using a PCRE rather than an ERE you will have to use an external tool such as the GNU implementation of grep to perform the match. But this is less efficient, and the same advice about bounding applies here as is given above:

    if echo "$filename" | grep -qP '([^.]+)(-\d{1,5})(\.csv)'
    then
        echo "matches"
    else
        echo "doesn't match"
    fi
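If this test is needed in several places, it could be wrapped in a small function. This is only a sketch: it assumes a grep that accepts -P (GNU grep built with PCRE support), the name pcre_match is made up for illustration, and the pattern shown is the bounded variant suggested above:

```bash
#!/usr/bin/env bash
# Sketch only: pcre_match is a hypothetical helper name.
# Assumes a grep that supports -P (GNU grep built with PCRE support).
pcre_match() {
    local string=$1 regex=$2
    # printf avoids echo's handling of options and escapes;
    # -- stops grep from treating a pattern starting with - as an option
    printf '%s\n' "$string" | grep -qP -- "$regex"
}

if pcre_match "test-33.csv" '^[^-.]+-\d{1,5}\.csv$'; then
    echo "matches"
else
    echo "doesn't match"
fi
```

The function returns grep's exit status, so it can be used directly as an if condition.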

The POSIX reference for basic REs (RE or BRE) and EREs is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, and the reference for Perl REs (PCRE) is at https://www.pcre.org/original/doc/html/pcrepattern.html. Be warned that neither is the easiest of documentation to understand.

Finally, you ask,

Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?

That isn't a greedy/non-greedy iterator. [^.]+ is greedy and means "one or more of anything except a dot (.)". EREs do not have non-greedy operators. In a PCRE, a quantifier such as * or + can be made non-greedy by following it with ?. For example, contrast a* and a*?: the first will match as many a characters as possible and the second will match as few as possible.

The ( … ) bracket is a grouping, not a greediness indicator.
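The greedy behaviour of an ERE can be observed in bash through BASH_REMATCH. A small sketch, with a made-up helper name show_group and sample strings chosen only for illustration:

```bash
#!/usr/bin/env bash
# Sketch only: show_group is a hypothetical helper name.
# Prints capture group N (default 0, the whole match) if the ERE matches.
show_group() {
    local string=$1 regex=$2 group=${3:-0}
    [[ $string =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[group]}"
}

# On its own, [^.]+ takes every non-dot character it can.
show_group "test-33.csv" '[^.]+'                                # prints: test-33

# Inside the full pattern, group 1 leaves "-33" for group 2
# so that the overall match can succeed.
show_group "test-33.csv" '([^.]+)(-[[:digit:]]{1,5})(\.csv)' 1  # prints: test

# a* matches as many a characters as possible.
show_group "aaa" 'a*'                                           # prints: aaa
```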

  • "each one adding more operators (and therefore requiring more characters to be escaped" -- except that BRE has some characters that are special exactly when "escaped" by backslashes, mainly \{/\} and \(/\) (but not \+ or \|). Commented Jan 23, 2024 at 18:13
  • @ilkkachu yes I know, but I'm trying to keep a complex subject as simple as I dare. Commented Jan 23, 2024 at 18:14

The =~ operator in Bash uses the same kind of regular expression as the grep -E command (a POSIX extended regular expression). Perl regexes are not recognized by it. For an equivalent, you need to do something like:

    ~$ [ $(echo "$filename" | grep -Po "$regex") ] && echo "it matches" || echo "does not match"
    it matches

About the grep options used:

    -o, --only-matching   show only the part of a line matching PATTERN
    -P, --perl-regexp     PATTERN is a Perl regular expression

With your original form this looks like:

    if [[ $(echo "$filename" | grep -Po "$regex") ]]; then echo "it matches"; else echo "does not match"; fi

This works too:

    if [ $(echo "$filename" | grep -Po "$regex") ]; then echo "it matches"; else echo "does not match"; fi

You can also do:

    yyy@xxx:~$ filename="test-33.csv"
    yyy@xxx:~$ regex="([^.]+)(-\d{1,5})(\.csv)"
    yyy@xxx:~$ result=$(echo "$filename" | grep -Po "$regex")
    yyy@xxx:~$ if [[ $result ]]; then echo "it matches"; else echo "does not match"; fi
    it matches
    yyy@xxx:~$
  • Instead of the [ testing non-empty output from grep (unquoted is bad), use grep -q: echo "$filename" | grep -qP "$regex" && echo match || echo no match Commented Jan 22, 2024 at 22:22
  • Would select this as the best answer. Could you please align your answer with the question... the form is if [[ expression ]]; then ...; else ...; fi Commented Jan 23, 2024 at 10:08
  • Thank you! It's done. Commented Jan 23, 2024 at 10:38
  • Thanks @hidigoudi. The other answer (edited) offers a more complete explanation. What I like about yours, though, is that it goes straight to the target result. Here is some reference on the bash if statements that explains the PCRE solution of the other answer. By the way, the -q (quiet) option lets the exit status of grep drive the if condition. Commented Jan 23, 2024 at 11:44
  • No problem, I understand, thank you for the upvote! Commented Jan 23, 2024 at 12:47

Bash has extended glob patterns which get closer to regular expressions. Within [[...]] the == operator does glob-style pattern matching.

    filename=test-33.csv
    # one or more non-dots, a hyphen, a digit, optionally 4 more digits, the extension
    pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'
    [[ $filename == $pattern ]] && echo Y || echo N

If you're using the regex to filter a list of filenames, use the glob pattern in a for loop instead.

    shopt -s extglob
    for file in $pattern; do
        # do something with the file.
        echo "$file"
    done

Notes

  • the shopt command: extended glob is automatically enabled within [[...]] but not otherwise.
  • $pattern is specifically unquoted in these code snippets so that it gets handled as a pattern not a literal string.
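To see which names the extended glob accepts without touching the filesystem, a sketch along these lines may help. The helper name glob_matches and the sample names are assumptions for illustration:

```bash
#!/usr/bin/env bash
shopt -s extglob   # required outside [[ ... ]]; harmless inside it

# Sketch only: glob_matches is a hypothetical helper name.
glob_matches() {
    local pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'
    # $pattern is unquoted so that it is treated as a pattern, not a literal
    [[ $1 == $pattern ]]
}

for name in test-33.csv test-123456.csv test-33.csv.txt; do
    if glob_matches "$name"; then echo "$name: Y"; else echo "$name: N"; fi
done
```

Unlike an unanchored regex, a glob has to match the whole string, so names with a sixth digit or a trailing .txt are rejected without explicit anchors.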
