
Regular Expression Unicode Character and Property Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of them except `\X` can also be used inside character classes, where each token adds the characters that it would normally match to the class.
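
The table lists Python's built-in `re` module as accepting `\uFFFF` and `\U0010FFFF` since version 3.3. That is enough to try the code point tokens both outside and inside a character class. This is only an illustrative sketch, and the variable names and sample strings are arbitrary.

```python
import re

precomposed = "\u00E0"   # à stored as the single code point U+00E0
decomposed = "a\u0300"   # à stored as U+0061 + U+0300 COMBINING GRAVE ACCENT

# Raw strings (r"...") hand the escapes to the regex engine itself,
# not to Python's string-literal parser.
pre_ok = re.fullmatch(r"\u00E0", precomposed) is not None
dec_ok = re.fullmatch(r"\u00E0", decomposed) is not None

# \U with 8 hex digits reaches code points beyond the BMP, such as U+1D400.
astral_ok = re.fullmatch(r"\U0001D400", "\U0001D400") is not None

# Inside a character class, each token adds the character it would match.
in_class = re.findall(r"[\u00E0\u00A9]", "voil\u00E0 \u00A9 2024")

print(pre_ok, dec_ok, astral_ok)  # True False True
print(in_class)                   # ['à', '©']
```

`\u00E0` does not match the decomposed form U+0061 U+0300, even though both display as à.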

The word Property in the Syntax column of the table below must be replaced with one of the Unicode properties listed on the reference pages for Unicode categories, scripts, blocks, and binary properties, or with one value from a property set.
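
Python's built-in `re` module is one of the flavors without `\p{Property}`, so the sketch below emulates only the general category subset of `\p` and `\P` with `unicodedata.category()`. The helpers `has_category`, `find_p`, and `find_P` are hypothetical names for this illustration, and scripts, blocks, and binary properties are not handled. The third-party `regex` package does support `\p{...}`, but it is not used here.

```python
import re
import unicodedata

# Python's built-in re module has no \p{...} support: since Python 3.7 an
# unknown backslash escape of an ASCII letter is a compile error.
try:
    re.compile(r"\p{L}")
    p_supported = True
except re.error:
    p_supported = False


def has_category(ch: str, category: str) -> bool:
    # Hypothetical helper: a one-letter category such as "L" matches every
    # two-letter subcategory (Lu, Ll, Lt, Lm, Lo); "Lu" matches only Lu.
    # Scripts, blocks, and binary properties are NOT covered by this sketch.
    return unicodedata.category(ch).startswith(category)


def find_p(category: str, text: str) -> list[str]:
    # Rough stand-in for collecting every \p{category} match in text.
    return [ch for ch in text if has_category(ch, category)]


def find_P(category: str, text: str) -> list[str]:
    # Rough stand-in for collecting every \P{category} match in text.
    return [ch for ch in text if not has_category(ch, category)]


sample = "a\u00A9\U0001D400 1"   # a, ©, 𝐀 (U+1D400), space, 1
print(p_supported)               # False
print(find_p("L", sample))       # ['a', '𝐀']
print(find_P("L", sample))       # ['©', ' ', '1']
```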

Feature	Syntax	Description	Example	JGsoft	Python	JavaScript	VBScript	XRegExp	.NET	Java	ICU	RE2	Perl	PCRE	PCRE2	PHP	Delphi	R	Ruby	std::regex	Boost	Tcl	POSIX	GNU	Oracle	XML	XPath
Unicode property	`\p{Property}`	Matches a single Unicode code point that does have the specified property.	`\p{L}` matches `a`	YES	no	with /u	no	YES	YES	YES	YES	default	YES	5.0	YES	YES	YES	YES	1.9	no	ECMA extended egrep awk	no	no	no	no	YES	YES
Negated Unicode property	`\P{Property}`	Matches a single Unicode code point that does not have the specified property.	`\P{L}` matches `©`	YES	no	with /u	no	YES	YES	YES	YES	default	YES	5.0	YES	YES	YES	YES	1.9	no	ECMA extended egrep awk	no	no	no	no	YES	YES
Negated Unicode property	`\p{^Property}`	Matches a single Unicode code point that does not have the specified property.	`\p{^L}` matches `©`	YES	no	no	no	YES	no	no	no	default	YES	5.0	YES	YES	YES	YES	1.9	no	no	no	no	no	no	no	no
Unicode property	`\P{^Property}`	Matches a single Unicode code point that does have the specified property. Double negative is taken as positive.	`\P{^L}` matches `q`	V2	no	no	no	no	no	no	no	default	YES	5.0	YES	YES	YES	YES	1.9	no	no	no	no	no	no	no	no
Code point	`\uFFFF` where FFFF are 4 hexadecimal digits	Matches a specific Unicode code point.	`\u00E0` matches `à` encoded as U+00E0 only. `\u00A9` matches `©`	YES	3.3 2.4 string	YES	YES	YES	YES	YES	YES	no	no	no	no	no	no	string	1.9	ECMA	no	YES	no	no	no	no	no
Code point	`\u{FFFF}` where FFFF are 1 to 4 hexadecimal digits	Matches a specific Unicode code point.	`\u{E0}` matches `à` encoded as U+00E0 only. `\u{A9}` matches `©`	V2	no	with /u	no	YES	no	no	no	no	no	no	no	7.0.0 string	no	string	1.9	no	no	no	no	no	no	no	no
Code point	`\u{10FFFF}` where 10FFFF is a hexadecimal value between 0 and 10FFFF	Matches a specific Unicode code point.	`\u{1D400}` matches `𝐀`	no	no	with /u	no	string	no	no	no	no	no	no	no	7.0.0 string	no	no	1.9	no	no	no	no	no	no	no	no
Code point	`\xFFFF` where FFFF are 4 hexadecimal digits	Matches a specific Unicode code point.	`\x00E0` matches `à` encoded as U+00E0 only. `\x00A9` matches `©`	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	string	no	8.4–8.5	no	no	no	no	no
Code point	`\x{FFFF}` where FFFF are 1 to 4 hexadecimal digits	Matches a specific Unicode code point.	`\x{E0}` matches `à` encoded as U+00E0 only. `\x{A9}` matches `©`	YES	no	no	no	no	no	7	YES	YES	YES	YES	YES	YES	YES	YES	no	no	ECMA extended egrep awk	no	no	no	no	no	no
Code point	`\x{10FFFF}` where 10FFFF is a hexadecimal value between 0 and 10FFFF	Matches a specific Unicode code point.	`\x{1D400}` matches `𝐀`	error	no	no	no	no	no	7	YES	YES	YES	YES	YES	YES	YES	YES	no	no	ECMA extended egrep awk 1.42 error	no	no	no	no	no	no
Code point	`\U0010FFFF` where 0010FFFF are 8 hexadecimal digits between 00000000 and 0010FFFF	Matches a specific Unicode code point.	`\U0001D400` matches `𝐀`	no	3.3 2.4 string	no	no	no	no	no	YES	no	no	no	string	no	no	string	no	string	no	YES	no	no	no	no	no
Code point	`\U10FFFF` where 10FFFF are 1 to 7 hexadecimal digits between 0 and 010FFFF	Matches a specific Unicode code point.	`\U1D400`, `\U01D400`, and `\U001D400` match `𝐀`	no	no	no	no	no	no	no	no	no	no	no	no	no	no	string	no	no	no	8.6	no	no	no	no	no
Grapheme	`\X`	Matches an extended grapheme cluster as defined by UAX #29.	`\X` matches `à` (U+0061 U+0300), `คู` (U+0E04 U+0E39), `अः` (U+0905 U+0903), and `ｶﾞ` (U+FF76 U+FF9E).	no	no	no	no	no	no	9	67	no	YES	8.32	YES	YES	XE7	2.15.3	2.4	no	no	no	no	no	no	no	no
Grapheme	`\X`	Matches a legacy grapheme cluster as defined by UAX #29.	`\X` matches `à` (U+0061 U+0300), `คู` (U+0E04 U+0E39), and `ｶﾞ` (U+FF76 U+FF9E), whereas `\X\X` matches `अः` (U+0905 U+0903).	no	no	no	no	no	no	no	55–66	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no
Grapheme	`\X`	Matches a single character that is not in the Mark category and all following characters that are in the Mark category (if any).	`\X` matches `à` (U+0061 U+0300) and `คู` (U+0E04 U+0E39), but matches only `ｶ` (U+FF76) in `ｶﾞ` (U+FF76 U+FF9E).	YES	no	no	no	no	no	no	no	no	no	5.0–8.31	no	no	XE–XE6	2.14.0–2.15.2	2.0–2.3	no	no	no	no	no	no	no	no
Grapheme	`\X`	Matches any single character and all following characters that Boost treats as combining characters.	`\X` matches `à` (U+0061 U+0300), whereas `\X\X` matches `คู` (U+0E04 U+0E39), `अः` (U+0905 U+0903), and `ｶﾞ` (U+FF76 U+FF9E).	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	ECMA extended egrep awk	no	no	no	no	no	no
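
Python's `re` module has no `\X`, but the Mark-based definition from the table can be approximated by walking the string and attaching every code point in a Mark category (Mn, Mc, Me) to the preceding non-Mark code point, roughly the pattern `\PM\pM*` applied repeatedly. `legacy_x` is a hypothetical helper written for this sketch. Skipping Marks that have no preceding base follows from that pattern, but individual engines may treat this edge case differently.

```python
import unicodedata


def legacy_x(text: str) -> list[str]:
    # Hypothetical helper mirroring the Mark-based \X definition, roughly
    # \PM\pM*: one code point outside the Mark (M*) categories, followed by
    # every Mark code point that comes directly after it.
    clusters: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            if clusters:
                clusters[-1] += ch  # greedy \pM*: attach to the current unit
            # A Mark with no preceding base cannot start a match; skip it.
        else:
            clusters.append(ch)     # \PM: start a new unit
    return clusters


# Print the length of each unit so combining sequences are easy to see.
for sample in ["a\u0300", "\u0E04\u0E39", "\u0905\u0903", "\uFF76\uFF9E"]:
    print([len(unit) for unit in legacy_x(sample)])
# [2]
# [2]
# [2]
# [1, 1]
```

The last sample splits because U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK is in category Lm, not in a Mark category. The extended definition keeps the pair together because U+FF9E has the Grapheme_Extend property, which this sketch does not consult.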