RE2 regular expression syntax reference
---------------------------------------

Single characters:
.	any character, possibly including newline (s=true)
[xyz]	character class
[^xyz]	negated character class
\d	Perl character class
\D	negated Perl character class
[[:alpha:]]	ASCII character class
[[:^alpha:]]	negated ASCII character class
\pN	Unicode character class (one-letter name)
\p{Greek}	Unicode character class
\PN	negated Unicode character class (one-letter name)
\P{Greek}	negated Unicode character class

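A minimal sketch of the single-character forms, using Go's regexp package as a stand-in for RE2 (Go documents its syntax as RE2's, except for «\C») and assuming the two agree on these forms. The helper matches is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
// MustCompile panics if the pattern is not valid syntax.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(matches(`a.b`, "a\nb"))             // false: . excludes \n by default
	fmt.Println(matches(`(?s)a.b`, "a\nb"))         // true: s=true lets . match \n
	fmt.Println(matches(`^[^xyz]$`, "w"))           // true: w is not in the class
	fmt.Println(matches(`^\D$`, "7"))               // false: 7 is a digit
	fmt.Println(matches(`^[[:alpha:]]$`, "\u00e9")) // false: the ASCII class excludes é
	fmt.Println(matches(`^\p{Greek}$`, "λ"))        // true
	fmt.Println(matches(`^\PN$`, "x"))              // true: x is not a number
}
```
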
Composites:
xy	«x» followed by «y»
x|y	«x» or «y» (prefer «x»)

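The "prefer «x»" note describes leftmost-first matching: the earlier alternative wins even when a later one would match more text. A sketch, using Go's regexp package as a stand-in for RE2 and contrasting it with Go's leftmost-longest regexp.CompilePOSIX mode (both helper names are hypothetical):

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmostFirst uses Go's default (Perl-like) semantics,
// in which the earlier alternative is preferred.
func leftmostFirst(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// leftmostLongest uses POSIX semantics, which prefer the longest match.
func leftmostLongest(pattern, s string) string {
	return regexp.MustCompilePOSIX(pattern).FindString(s)
}

func main() {
	fmt.Println(leftmostFirst(`a|ab`, "ab"))   // a
	fmt.Println(leftmostLongest(`a|ab`, "ab")) // ab
	fmt.Println(leftmostFirst(`ab|cd`, "xcd")) // cd
}
```
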
Repetitions:
x*	zero or more «x», prefer more
x+	one or more «x», prefer more
x?	zero or one «x», prefer one
x{n,m}	«n» or «n»+1 or ... or «m» «x», prefer more
x{n,}	«n» or more «x», prefer more
x{n}	exactly «n» «x»
x*?	zero or more «x», prefer fewer
x+?	one or more «x», prefer fewer
x??	zero or one «x», prefer zero
x{n,m}?	«n» or «n»+1 or ... or «m» «x», prefer fewer
x{n,}?	«n» or more «x», prefer fewer
x{n}?	exactly «n» «x»
x{}	(== x*) NOT SUPPORTED vim
x{-}	(== x*?) NOT SUPPORTED vim
x{-n}	(== x{n}?) NOT SUPPORTED vim
x=	(== x?) NOT SUPPORTED vim

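"Prefer more" and "prefer fewer" only change which match is reported, not whether one exists. A small sketch, assuming Go's regexp package behaves like RE2 for these forms; the helper first is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s ("" if none).
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Printf("%q\n", first(`a+`, "aaa"))       // "aaa": prefer more
	fmt.Printf("%q\n", first(`a+?`, "aaa"))      // "a": prefer fewer
	fmt.Printf("%q\n", first(`a{2,3}`, "aaaa"))  // "aaa"
	fmt.Printf("%q\n", first(`a{2,3}?`, "aaaa")) // "aa"
	fmt.Printf("%q\n", first(`<.+>`, "<a><b>"))  // "<a><b>"
	fmt.Printf("%q\n", first(`<.+?>`, "<a><b>")) // "<a>"
}
```
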
Implementation restriction: The counting forms «x{n,m}», «x{n,}», and «x{n}»
reject forms that create a minimum or maximum repetition count above 1000.
Unlimited repetitions are not subject to this restriction.

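The 1000 limit shows up as a compile-time error. A sketch, assuming Go's regexp package applies the same limit (its parser rejects counts above 1000); the helper compiles is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether regexp.Compile accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, p := range []string{`a{1000}`, `a{1001}`, `a{2,1001}`, `a{1001,}`, `a*`} {
		// Only a{1000} and the unlimited a* should be accepted.
		fmt.Printf("%-10s %v\n", p, compiles(p))
	}
}
```
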
Possessive repetitions:
x*+	zero or more «x», possessive NOT SUPPORTED
x++	one or more «x», possessive NOT SUPPORTED
x?+	zero or one «x», possessive NOT SUPPORTED
x{n,m}+	«n» or ... or «m» «x», possessive NOT SUPPORTED
x{n,}+	«n» or more «x», possessive NOT SUPPORTED
x{n}+	exactly «n» «x», possessive NOT SUPPORTED

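Because stacked repetition operators are a syntax error, the possessive forms are rejected outright rather than silently reinterpreted. A sketch, using Go's regexp package as a stand-in for RE2; the helper compiles is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether regexp.Compile accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// a*+ is rejected, not read as a possessive quantifier.
	fmt.Println(compiles(`a*+`))   // false
	fmt.Println(compiles(`a{2}+`)) // false
	fmt.Println(compiles(`(?>a)`)) // false: atomic groups are unsupported too
	// Grouping first makes + apply to the whole ordinary subexpression.
	fmt.Println(compiles(`(?:a*)+`)) // true
}
```
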
Grouping:
(re)	numbered capturing group (submatch)
(?P<name>re)	named & numbered capturing group (submatch)
(?<name>re)	named & numbered capturing group (submatch) NOT SUPPORTED
(?'name're)	named & numbered capturing group (submatch) NOT SUPPORTED
(?:re)	non-capturing group
(?flags)	set flags within current group; non-capturing
(?flags:re)	set flags during «re»; non-capturing
(?#text)	comment NOT SUPPORTED
(?|x|y|z)	branch numbering reset NOT SUPPORTED
(?>re)	possessive match of «re» NOT SUPPORTED
re@>	possessive match of «re» NOT SUPPORTED vim
%(re)	non-capturing group NOT SUPPORTED vim

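Named groups are still numbered, and «(?:re)» groups without taking a number. A sketch using Go's regexp package (Go 1.15 or later, for SubexpIndex) as a stand-in for RE2; the variable dateRE is hypothetical. Recent Go releases also accept «(?<name>re)», so that NOT SUPPORTED row may not carry over to Go.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE uses (?P<name>re) for two named groups and (?:re) for an
// optional day that does not get a submatch number.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?`)

func main() {
	m := dateRE.FindStringSubmatch("released 2014-10-06")
	fmt.Println(dateRE.NumSubexp())   // 2
	fmt.Println(dateRE.SubexpNames()) // [ year month]
	fmt.Println(m[dateRE.SubexpIndex("year")], m[dateRE.SubexpIndex("month")]) // 2014 10
}
```
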
Flags:
i	case-insensitive (default false)
m	multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false)
s	let «.» match «\n» (default false)
U	ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc. (default false)
Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»).

Empty strings:
^	at beginning of text or line («m»=true)
$	at end of text (like «\z» not «\Z») or line («m»=true)
\A	at beginning of text
\b	at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\B	not at ASCII word boundary
\G	at beginning of subtext being searched NOT SUPPORTED pcre
\G	at end of last match NOT SUPPORTED perl
\Z	at end of text, or before newline at end of text NOT SUPPORTED
\z	at end of text
(?=re)	before text matching «re» NOT SUPPORTED
(?!re)	before text not matching «re» NOT SUPPORTED
(?<=re)	after text matching «re» NOT SUPPORTED
(?<!re)	after text not matching «re» NOT SUPPORTED
re&	before text matching «re» NOT SUPPORTED vim
re@=	before text matching «re» NOT SUPPORTED vim
re@!	before text not matching «re» NOT SUPPORTED vim
re@<=	after text matching «re» NOT SUPPORTED vim
re@<!	after text not matching «re» NOT SUPPORTED vim
\zs	sets start of match (= \K) NOT SUPPORTED vim
\ze	sets end of match NOT SUPPORTED vim
\%^	beginning of file NOT SUPPORTED vim
\%$	end of file NOT SUPPORTED vim
\%V	on screen NOT SUPPORTED vim
\%#	cursor position NOT SUPPORTED vim
\%'m	mark «m» position NOT SUPPORTED vim
\%23l	in line 23 NOT SUPPORTED vim
\%23c	in column 23 NOT SUPPORTED vim
\%23v	in virtual column 23 NOT SUPPORTED vim

Escape sequences:
\a	bell (== \007)
\f	form feed (== \014)
\t	horizontal tab (== \011)
\n	newline (== \012)
\r	carriage return (== \015)
\v	vertical tab character (== \013)
\*	literal «*», for any punctuation character «*»
\123	octal character code (up to three digits)
\x7F	hex character code (exactly two digits)
\x{10FFFF}	hex character code
\C	match a single byte even in UTF-8 mode
\Q...\E	literal text «...» even if «...» has punctuation

\1	backreference NOT SUPPORTED
\b	backspace NOT SUPPORTED (use «\010»)
\cK	control char ^K NOT SUPPORTED (use «\013», the octal code for ^K)
\e	escape NOT SUPPORTED (use «\033»)
\g1	backreference NOT SUPPORTED
\g{1}	backreference NOT SUPPORTED
\g{+1}	backreference NOT SUPPORTED
\g{-1}	backreference NOT SUPPORTED
\g{name}	named backreference NOT SUPPORTED
\g<name>	subroutine call NOT SUPPORTED
\g'name'	subroutine call NOT SUPPORTED
\k<name>	named backreference NOT SUPPORTED
\k'name'	named backreference NOT SUPPORTED
\lX	lowercase «X» NOT SUPPORTED
\ux	uppercase «x» NOT SUPPORTED
\L...\E	lowercase text «...» NOT SUPPORTED
\K	reset beginning of «$0» NOT SUPPORTED
\N{name}	named Unicode character NOT SUPPORTED
\R	line break NOT SUPPORTED
\U...\E	uppercase text «...» NOT SUPPORTED
\X	extended Unicode sequence NOT SUPPORTED

\%d123	decimal character 123 NOT SUPPORTED vim
\%xFF	hex character FF NOT SUPPORTED vim
\%o123	octal character 123 NOT SUPPORTED vim
\%u1234	Unicode character 0x1234 NOT SUPPORTED vim
\%U12345678	Unicode character 0x12345678 NOT SUPPORTED vim

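A sketch of the supported escapes next to two unsupported ones, using Go's regexp package as a stand-in for RE2. «\C» is left out because it is the one construct Go documents as not supported. Both helper names are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// compiles reports whether regexp.Compile accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(matches(`^\Q1+1=2\E$`, "1+1=2"))    // true: + is literal inside \Q...\E
	fmt.Println(matches(`^\101\x42\x{43}$`, "ABC")) // true: octal, two-digit hex, braced hex
	fmt.Println(matches(`^\x{263A}$`, "\u263a"))    // true: ☺
	fmt.Println(matches(`^\*$`, "*"))               // true: escaped punctuation
	fmt.Println(matches(`^\033$`, "\x1b"))          // true: the suggested stand-in for \e
	fmt.Println(compiles(`(a)\1`))                  // false: no backreferences
	fmt.Println(compiles(`\e`))                     // false
}
```
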
Character class elements:
x	single character
A-Z	character range (inclusive)
\d	Perl character class
[:foo:]	ASCII character class «foo»
\p{Foo}	Unicode character class «Foo»
\pF	Unicode character class «F» (one-letter name)

Named character classes as character class elements:
[\d]	digits (== \d)
[^\d]	not digits (== \D)
[\D]	not digits (== \D)
[^\D]	not not digits (== \d)
[[:name:]]	named ASCII class inside character class (== [:name:])
[^[:name:]]	named ASCII class inside negated character class (== [:^name:])
[\p{Name}]	named Unicode property inside character class (== \p{Name})
[^\p{Name}]	named Unicode property inside negated character class (== \P{Name})

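Elements can be mixed freely inside one bracket expression, and negating a negated class gives back the original. A sketch using Go's regexp package as a stand-in for RE2; the helper matches is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(matches(`^[a-c\d]+$`, "ab12c"))     // true: range plus Perl class
	fmt.Println(matches(`^[^\d]+$`, "abc"))         // true: same as \D
	fmt.Println(matches(`^[^\D]+$`, "123"))         // true: same as \d
	fmt.Println(matches(`^[[:upper:]_]+$`, "AB_C")) // true: ASCII class as one element
	fmt.Println(matches(`^[^[:digit:]]$`, "7"))     // false
	fmt.Println(matches(`^[\p{Greek}x]+$`, "xλx"))  // true: Unicode class as one element
}
```
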
Perl character classes (all ASCII-only):
\d	digits (== [0-9])
\D	not digits (== [^0-9])
\s	whitespace (== [\t\n\f\r ])
\S	not whitespace (== [^\t\n\f\r ])
\w	word characters (== [0-9A-Za-z_])
\W	not word characters (== [^0-9A-Za-z_])

\h	horizontal space NOT SUPPORTED
\H	not horizontal space NOT SUPPORTED
\v	vertical space NOT SUPPORTED
\V	not vertical space NOT SUPPORTED

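"ASCII-only" has two practical consequences: non-ASCII digits and letters fall outside «\d» and «\w», and «\s» omits the vertical tab that «[[:space:]]» includes. A sketch using Go's regexp package as a stand-in for RE2; the helper matches is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	const arabicThree = "\u0663" // ARABIC-INDIC DIGIT THREE
	fmt.Println(matches(`^\d$`, arabicThree))     // false: \d is [0-9] only
	fmt.Println(matches(`^\p{Nd}$`, arabicThree)) // true: Unicode decimal digit
	fmt.Println(matches(`^\w$`, "\u00e9"))        // false: é is not in [0-9A-Za-z_]
	fmt.Println(matches(`^\s$`, "\v"))            // false: \s omits vertical tab
	fmt.Println(matches(`^[[:space:]]$`, "\v"))   // true: [[:space:]] includes it
}
```
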
ASCII character classes:
[[:alnum:]]	alphanumeric (== [0-9A-Za-z])
[[:alpha:]]	alphabetic (== [A-Za-z])
[[:ascii:]]	ASCII (== [\x00-\x7F])
[[:blank:]]	blank (== [\t ])
[[:cntrl:]]	control (== [\x00-\x1F\x7F])
[[:digit:]]	digits (== [0-9])
[[:graph:]]	graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]	lower case (== [a-z])
[[:print:]]	printable (== [ -~] == [ [:graph:]])
[[:punct:]]	punctuation (== [!-/:-@[-`{-~])
[[:space:]]	whitespace (== [\t\n\v\f\r ])
[[:upper:]]	upper case (== [A-Z])
[[:word:]]	word characters (== [0-9A-Za-z_])
[[:xdigit:]]	hex digit (== [0-9A-Fa-f])

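A sketch of a few ASCII classes and a negated one, using Go's regexp package as a stand-in for RE2; the helper first is hypothetical. Note that the ASCII classes stop at the first non-ASCII letter:

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s ("" if none).
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(first(`[[:alpha:]]+`, "abc123"))      // abc
	fmt.Println(first(`[[:^alpha:]]+`, "abc123"))     // 123
	fmt.Println(first(`[[:xdigit:]]+`, "cafe!"))      // cafe
	fmt.Println(first(`[[:punct:]]+`, "hi, you!"))    // ,
	fmt.Println(first(`[[:alpha:]]+`, "na\u00efve")) // na: ï is not ASCII
}
```
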
Unicode character class names--general category:
C	other
Cc	control
Cf	format
Cn	unassigned code points NOT SUPPORTED
Co	private use
Cs	surrogate
L	letter
LC	cased letter NOT SUPPORTED
L&	cased letter NOT SUPPORTED
Ll	lowercase letter
Lm	modifier letter
Lo	other letter
Lt	titlecase letter
Lu	uppercase letter
M	mark
Mc	spacing mark
Me	enclosing mark
Mn	non-spacing mark
N	number
Nd	decimal number
Nl	letter number
No	other number
P	punctuation
Pc	connector punctuation
Pd	dash punctuation
Pe	close punctuation
Pf	final punctuation
Pi	initial punctuation
Po	other punctuation
Ps	open punctuation
S	symbol
Sc	currency symbol
Sk	modifier symbol
Sm	math symbol
So	other symbol
Z	separator
Zl	line separator
Zp	paragraph separator
Zs	space separator

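One-letter categories work with «\pN» and «\PN»; two-letter ones need braces. A sketch using Go's regexp package as a stand-in for RE2 (Go's category support may differ in recent releases, so the NOT SUPPORTED rows are not exercised); the helper first is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s ("" if none).
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(first(`\pL+`, "42 \u00c9mile!"))  // Émile
	fmt.Println(first(`\p{Lu}`, "abc\u00c9"))     // É
	fmt.Println(first(`\PL+`, "abc123!"))         // 123!
	fmt.Println(first(`\p{Sc}`, "cost: 5\u20ac")) // €
}
```
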
Unicode character class names--scripts:
Arabic	Arabic
Armenian	Armenian
Balinese	Balinese
Bamum	Bamum
Batak	Batak
Bengali	Bengali
Bopomofo	Bopomofo
Brahmi	Brahmi
Braille	Braille
Buginese	Buginese
Buhid	Buhid
Canadian_Aboriginal	Canadian Aboriginal
Carian	Carian
Chakma	Chakma
Cham	Cham
Cherokee	Cherokee
Common	characters not specific to one script
Coptic	Coptic
Cuneiform	Cuneiform
Cypriot	Cypriot
Cyrillic	Cyrillic
Deseret	Deseret
Devanagari	Devanagari
Egyptian_Hieroglyphs	Egyptian Hieroglyphs
Ethiopic	Ethiopic
Georgian	Georgian
Glagolitic	Glagolitic
Gothic	Gothic
Greek	Greek
Gujarati	Gujarati
Gurmukhi	Gurmukhi
Han	Han
Hangul	Hangul
Hanunoo	Hanunoo
Hebrew	Hebrew
Hiragana	Hiragana
Imperial_Aramaic	Imperial Aramaic
Inherited	inherit script from previous character
Inscriptional_Pahlavi	Inscriptional Pahlavi
Inscriptional_Parthian	Inscriptional Parthian
Javanese	Javanese
Kaithi	Kaithi
Kannada	Kannada
Katakana	Katakana
Kayah_Li	Kayah Li
Kharoshthi	Kharoshthi
Khmer	Khmer
Lao	Lao
Latin	Latin
Lepcha	Lepcha
Limbu	Limbu
Linear_B	Linear B
Lycian	Lycian
Lydian	Lydian
Malayalam	Malayalam
Mandaic	Mandaic
Meetei_Mayek	Meetei Mayek
Meroitic_Cursive	Meroitic Cursive
Meroitic_Hieroglyphs	Meroitic Hieroglyphs
Miao	Miao
Mongolian	Mongolian
Myanmar	Myanmar
New_Tai_Lue	New Tai Lue (aka Simplified Tai Lue)
Nko	Nko
Ogham	Ogham
Ol_Chiki	Ol Chiki
Old_Italic	Old Italic
Old_Persian	Old Persian
Old_South_Arabian	Old South Arabian
Old_Turkic	Old Turkic
Oriya	Oriya
Osmanya	Osmanya
Phags_Pa	'Phags Pa
Phoenician	Phoenician
Rejang	Rejang
Runic	Runic
Saurashtra	Saurashtra
Sharada	Sharada
Shavian	Shavian
Sinhala	Sinhala
Sora_Sompeng	Sora Sompeng
Sundanese	Sundanese
Syloti_Nagri	Syloti Nagri
Syriac	Syriac
Tagalog	Tagalog
Tagbanwa	Tagbanwa
Tai_Le	Tai Le
Tai_Tham	Tai Tham
Tai_Viet	Tai Viet
Takri	Takri
Tamil	Tamil
Telugu	Telugu
Thaana	Thaana
Thai	Thai
Tibetan	Tibetan
Tifinagh	Tifinagh
Ugaritic	Ugaritic
Vai	Vai
Yi	Yi

Vim character classes:
\i	identifier character NOT SUPPORTED vim
\I	«\i» except digits NOT SUPPORTED vim
\k	keyword character NOT SUPPORTED vim
\K	«\k» except digits NOT SUPPORTED vim
\f	file name character NOT SUPPORTED vim
\F	«\f» except digits NOT SUPPORTED vim
\p	printable character NOT SUPPORTED vim
\P	«\p» except digits NOT SUPPORTED vim
\s	whitespace character (== [ \t]) NOT SUPPORTED vim
\S	non-white space character (== [^ \t]) NOT SUPPORTED vim
\d	digits (== [0-9]) vim
\D	not «\d» vim
\x	hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
\X	not «\x» NOT SUPPORTED vim
\o	octal digits (== [0-7]) NOT SUPPORTED vim
\O	not «\o» NOT SUPPORTED vim
\w	word character vim
\W	not «\w» vim
\h	head of word character NOT SUPPORTED vim
\H	not «\h» NOT SUPPORTED vim
\a	alphabetic NOT SUPPORTED vim
\A	not «\a» NOT SUPPORTED vim
\l	lowercase NOT SUPPORTED vim
\L	not lowercase NOT SUPPORTED vim
\u	uppercase NOT SUPPORTED vim
\U	not uppercase NOT SUPPORTED vim
\_x	«\x» plus newline, for any «x» NOT SUPPORTED vim

Vim flags:
\c	ignore case NOT SUPPORTED vim
\C	match case NOT SUPPORTED vim
\m	magic NOT SUPPORTED vim
\M	nomagic NOT SUPPORTED vim
\v	verymagic NOT SUPPORTED vim
\V	verynomagic NOT SUPPORTED vim
\Z	ignore differences in Unicode combining characters NOT SUPPORTED vim

Magic:
(?{code})	arbitrary Perl code NOT SUPPORTED perl
(??{code})	postponed arbitrary Perl code NOT SUPPORTED perl
(?n)	recursive call to regexp capturing group «n» NOT SUPPORTED
(?+n)	recursive call to relative group «+n» NOT SUPPORTED
(?-n)	recursive call to relative group «-n» NOT SUPPORTED
(?C)	PCRE callout NOT SUPPORTED pcre
(?R)	recursive call to entire regexp (== (?0)) NOT SUPPORTED
(?&name)	recursive call to named group NOT SUPPORTED
(?P=name)	named backreference NOT SUPPORTED
(?P>name)	recursive call to named group NOT SUPPORTED
(?(cond)true|false)	conditional branch NOT SUPPORTED
(?(cond)true)	conditional branch NOT SUPPORTED
(*ACCEPT)	make regexps more like Prolog NOT SUPPORTED
(*COMMIT)	NOT SUPPORTED
(*F)	NOT SUPPORTED
(*FAIL)	NOT SUPPORTED
(*MARK)	NOT SUPPORTED
(*PRUNE)	NOT SUPPORTED
(*SKIP)	NOT SUPPORTED
(*THEN)	NOT SUPPORTED
(*ANY)	set newline convention NOT SUPPORTED
(*ANYCRLF)	NOT SUPPORTED
(*CR)	NOT SUPPORTED
(*CRLF)	NOT SUPPORTED
(*LF)	NOT SUPPORTED
(*BSR_ANYCRLF)	set \R convention NOT SUPPORTED pcre
(*BSR_UNICODE)	NOT SUPPORTED pcre

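Most NOT SUPPORTED constructs fail at compile time rather than being silently reinterpreted. A sketch using Go's regexp package as a stand-in for RE2; the helper rejected is hypothetical. Not every vim row fails, though: «\zs», for example, parses as «\z» followed by a literal «s», which compiles but can never match.

```go
package main

import (
	"fmt"
	"regexp"
)

// rejected reports whether regexp.Compile refuses pattern.
func rejected(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err != nil
}

func main() {
	for _, p := range []string{
		`(?R)`,      // recursion
		`(?P=name)`, // named backreference
		`(?(1)a|b)`, // conditional branch
		`(*ACCEPT)`, // backtracking control verb
		`\g{1}`,     // backreference
		`\i`,        // vim identifier class
	} {
		fmt.Printf("%-10s rejected=%v\n", p, rejected(p))
	}
}
```
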