Regular Expression Unicode Character and Property Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of these except \X can also be used inside character classes. Inside a character class, these tokens add the characters that they normally match to the character class.
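
As a small illustration of the difference between using a token on its own and inside a character class, the sketch below uses Python's re module, which the table on this page lists as supporting the \uFFFF code point syntax from version 3.3. The sample text and variable names are assumptions made for the example.

```python
import re

# Outside a character class, the token matches exactly one code point.
single = re.compile(r"\u00E0")

# Inside a character class, each token adds its code point to the class.
either = re.compile(r"[\u00E0\u00A9]")

# Sample text containing a precomposed à (U+00E0) and © (U+00A9).
text = "voil\u00E0 \u00A9 2024"

single_hits = single.findall(text)  # the precomposed à only
either_hits = either.findall(text)  # both à and ©

# A decomposed à (a followed by U+0300) is a different code point
# sequence, so the single-code-point token does not match it.
decomposed_hit = single.search("a\u0300")
```

Here single_hits holds one match, either_hits holds two, and decomposed_hit is None, which mirrors the table's note that \u00E0 matches à encoded as U+00E0 only.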

In the Syntax column of the table below, replace the word Property with one of the Unicode properties listed on the reference pages for categories, scripts, blocks, and binary properties, or with one value from a property set.
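
To make the substitution concrete: with the category L (letter) in place of Property, \p{L} matches any letter and \P{L} matches anything that is not a letter. The table lists Python's re module as not supporting \p{Property}, so the sketch below does not use a regex at all. It approximates the general category case with the standard unicodedata module. The helper names has_property and find_property are hypothetical, and scripts, blocks, and binary properties are not covered.

```python
import unicodedata


def has_property(ch, prop):
    r"""Approximate \p{prop} for general categories such as L, Lu, or Nd."""
    # unicodedata.category returns a two-letter code like "Ll" or "So";
    # a one-letter prop such as "L" therefore covers Lu, Ll, Lt, Lm, and Lo.
    return unicodedata.category(ch).startswith(prop)


def find_property(text, prop, negate=False):
    r"""Collect code points as \p{prop} would, or as \P{prop} when negate is True."""
    return [ch for ch in text if has_property(ch, prop) != negate]


sample = "a\u00A9q1"  # a, ©, q, 1

letters = find_property(sample, "L")                    # like \p{L}
non_letters = find_property(sample, "L", negate=True)   # like \P{L}
```

With this sample, letters contains a and q, while non_letters contains © and 1, consistent with the table's examples that \p{L} matches a and \P{L} matches ©.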

Each entry lists the feature, syntax, description, and an example, followed by support in these flavors: JGsoft, Python, JavaScript, VBScript, XRegExp, .NET, Java, ICU, RE2, Perl, PCRE, PCRE2, PHP, Delphi, R, Ruby, std::regex, Boost, Tcl, POSIX, GNU, Oracle, XML, XPath.
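Unicode property: \p{Property}
  Description: Matches a single Unicode code point that has the specified property.
  Example: \p{L} matches a
  Support: JGsoft YES; Python no; JavaScript with /u; VBScript no; XRegExp YES; .NET YES; Java YES; ICU YES; RE2 default; Perl YES; PCRE 5.0; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby 1.9; std::regex no; Boost ECMA, extended, egrep, awk; Tcl no; POSIX no; GNU no; Oracle no; XML YES; XPath YES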
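Negated Unicode property: \P{Property}
  Description: Matches a single Unicode code point that does not have the specified property.
  Example: \P{L} matches ©
  Support: JGsoft YES; Python no; JavaScript with /u; VBScript no; XRegExp YES; .NET YES; Java YES; ICU YES; RE2 default; Perl YES; PCRE 5.0; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby 1.9; std::regex no; Boost ECMA, extended, egrep, awk; Tcl no; POSIX no; GNU no; Oracle no; XML YES; XPath YES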
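Negated Unicode property: \p{^Property}
  Description: Matches a single Unicode code point that does not have the specified property.
  Example: \p{^L} matches ©
  Support: JGsoft YES; Python no; JavaScript no; VBScript no; XRegExp YES; .NET no; Java no; ICU no; RE2 default; Perl YES; PCRE 5.0; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby 1.9; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no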
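Unicode property: \P{^Property}
  Description: Matches a single Unicode code point that has the specified property. The double negative is taken as a positive.
  Example: \P{^L} matches q
  Support: JGsoft V2; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU no; RE2 default; Perl YES; PCRE 5.0; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby 1.9; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no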
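Code point: \uFFFF where FFFF are 4 hexadecimal digits
  Description: Matches a specific Unicode code point.
  Example: \u00E0 matches à encoded as U+00E0 only. \u00A9 matches ©
  Support: JGsoft YES; Python 3.3, 2.4 string; JavaScript YES; VBScript YES; XRegExp YES; .NET YES; Java YES; ICU YES; RE2 no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R string; Ruby 1.9; std::regex ECMA; Boost no; Tcl YES; POSIX no; GNU no; Oracle no; XML no; XPath no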
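Code point: \u{FFFF} where FFFF are 1 to 4 hexadecimal digits
  Description: Matches a specific Unicode code point.
  Example: \u{E0} matches à encoded as U+00E0 only. \u{A9} matches ©
  Support: JGsoft V2; Python no; JavaScript with /u; VBScript no; XRegExp YES; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE no; PCRE2 no; PHP 7.0.0 string; Delphi no; R string; Ruby 1.9; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no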
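Code point: \u{10FFFF} where 10FFFF is a hexadecimal value between 0 and 10FFFF
  Description: Matches a specific Unicode code point.
  Example: \u{1D400} matches 𝐀
  Support: JGsoft no; Python no; JavaScript with /u; VBScript no; XRegExp string; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE no; PCRE2 no; PHP 7.0.0 string; Delphi no; R no; Ruby 1.9; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no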
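Code point: \xFFFF where FFFF are 4 hexadecimal digits
  Description: Matches a specific Unicode code point.
  Example: \x00E0 matches à encoded as U+00E0 only. \x00A9 matches ©
  Support: JGsoft no; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R no; Ruby no; std::regex string; Boost no; Tcl 8.4–8.5; POSIX no; GNU no; Oracle no; XML no; XPath no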
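Code point: \x{FFFF} where FFFF are 1 to 4 hexadecimal digits
  Description: Matches a specific Unicode code point.
  Example: \x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©
  Support: JGsoft YES; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java 7; ICU YES; RE2 YES; Perl YES; PCRE YES; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby no; std::regex no; Boost ECMA, extended, egrep, awk; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no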
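Code point: \x{10FFFF} where 10FFFF is a hexadecimal value between 0 and 10FFFF
  Description: Matches a specific Unicode code point.
  Example: \x{1D400} matches 𝐀
  Support: JGsoft error; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java 7; ICU YES; RE2 YES; Perl YES; PCRE YES; PCRE2 YES; PHP YES; Delphi YES; R YES; Ruby no; std::regex no; Boost ECMA, extended, egrep, awk; 1.42 error; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no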
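Code point: \U0010FFFF where 0010FFFF are 8 hexadecimal digits between 00000000 and 0010FFFF
  Description: Matches a specific Unicode code point.
  Example: \U0001D400 matches 𝐀
  Support: JGsoft no; Python 3.3, 2.4 string; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU YES; RE2 no; Perl no; PCRE no; PCRE2 string; PHP no; Delphi no; R string; Ruby no; std::regex string; Boost no; Tcl YES; POSIX no; GNU no; Oracle no; XML no; XPath no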
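Code point: \U10FFFF where 10FFFF are 1 to 7 hexadecimal digits between 0 and 010FFFF
  Description: Matches a specific Unicode code point.
  Example: \U1D400, \U01D400, and \U001D400 match 𝐀
  Support: JGsoft no; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R string; Ruby no; std::regex no; Boost no; Tcl 8.6; POSIX no; GNU no; Oracle no; XML no; XPath no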
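Grapheme: \X
  Description: Matches an extended grapheme cluster, as delimited by the grapheme cluster break rules in UAX 29.
  Example: \X matches à (U+0061 U+0300), คู (U+0E0F U+0E39), अः (U+0905 U+0903), and ガ (U+FF76 U+FF9F).
  Support: JGsoft no; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java 9; ICU 67; RE2 no; Perl YES; PCRE 8.32; PCRE2 YES; PHP YES; Delphi XE7; R 2.15.3; Ruby 2.4; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no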
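Grapheme: \X
  Description: Matches a legacy grapheme cluster, as delimited by the grapheme cluster break rules in UAX 29.
  Example: \X matches à (U+0061 U+0300), คู (U+0E0F U+0E39), and ガ (U+FF76 U+FF9F), whereas \X\X matches अः (U+0905 U+0903).
  Support: JGsoft no; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU 55–66; RE2 no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R no; Ruby no; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no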
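Grapheme: \X
  Description: Matches a single character that is not in the Mark category and all following characters that are in the Mark category (if any).
  Example: \X matches à (U+0061 U+0300) and คู (U+0E0F U+0E39), but matches only the first code point (U+FF76) in ガ (U+FF76 U+FF9F).
  Support: JGsoft YES; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE 5.0–8.31; PCRE2 no; PHP no; Delphi XE–XE6; R 2.14.0–2.15.2; Ruby 2.0–2.3; std::regex no; Boost no; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no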
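Grapheme: \X
  Description: Matches any single character and all following characters that Boost treats as combining characters.
  Example: \X matches à (U+0061 U+0300), whereas \X\X matches คู (U+0E0F U+0E39), अः (U+0905 U+0903), and ガ (U+FF76 U+FF9F).
  Support: JGsoft no; Python no; JavaScript no; VBScript no; XRegExp no; .NET no; Java no; ICU no; RE2 no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R no; Ruby no; std::regex no; Boost ECMA, extended, egrep, awk; Tcl no; POSIX no; GNU no; Oracle no; XML no; XPath no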
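
The simplest of the \X definitions above, a character that is not in the Mark category followed by any characters that are, can be sketched without regex support for \X. The example below is a rough approximation using Python's unicodedata module. The function name mark_based_clusters is hypothetical, and it does not implement the UAX 29 legacy or extended rules. As a simplification, a Mark character at the very start of the text begins its own cluster.

```python
import unicodedata


def mark_based_clusters(text):
    """Split text into clusters: one character plus any Mark characters after it."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch   # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# a + combining grave accent: U+0300 is a Mark, so this is one cluster.
a_grave = mark_based_clusters("a\u0300")

# U+0E0F + U+0E39: the vowel sign is a Mark, so this is one cluster.
thai = mark_based_clusters("\u0E0F\u0E39")

# U+FF76 + U+FF9F: the halfwidth sound mark is a modifier letter, not a
# Mark, so this definition splits the pair into two clusters.
katakana = mark_based_clusters("\uFF76\uFF9F")
```

The results line up with the table's example for this definition: the first two samples each form a single cluster, while the halfwidth katakana pair is split after U+FF76.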