This reference page explains what the Unicode tokens do when used outside character classes. All of these except \X can also be used inside character classes. Inside a character class, these tokens add the characters that they normally match to the character class.
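As a minimal sketch of that difference, the Python snippet below uses the `\uFFFF` and `\U0010FFFF` code point escapes, which the table lists as supported by Python 3.3 and later, both on their own and inside a character class. The variable names are illustrative only, and the sample strings are written with escapes so that the precomposed and decomposed forms of à stay distinguishable.

```python
import re

# U+00E0 is the precomposed à; "a" followed by U+0300 is the decomposed form.
precomposed = "\u00E0"
decomposed = "a\u0300"

# Outside a character class, the token matches exactly one specific code point.
single = re.compile(r"\u00E0")

# Inside a character class, each token adds its code point to the class.
char_class = re.compile(r"[\u00E0\u00A9\U0001D400]")

print(single.fullmatch(precomposed) is not None)  # True
print(single.fullmatch(decomposed) is not None)   # False

# Finds à, © and 𝐀 but skips the plain letter q.
print(char_class.findall("\u00E0 \u00A9 \U0001D400 q"))
```

The raw string prefix matters here: it passes the backslash escapes to the regex engine instead of letting the string literal resolve them first, which is the case the table marks as distinct from "string".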
The word Property in the Syntax column of the table below must be replaced with the name of a Unicode property: a category, script, block, or binary property, or a value from a property set. These names are listed on the corresponding reference pages.
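The table below lists Python's `re` module as not supporting `\p{Property}`. As a rough stand-in, the sketch below approximates `\p{L}` and `\P{L}` with the standard `unicodedata` module. The helper names `has_category` and `find_by_category` are hypothetical, and the approach covers general categories only, not scripts, blocks, or binary properties.

```python
import unicodedata


def has_category(ch: str, prefix: str) -> bool:
    """Hypothetical helper: True if ch's general category starts with prefix.

    A prefix of "L" covers Lu, Ll, Lt, Lm and Lo, much like \\p{L} would.
    """
    return unicodedata.category(ch).startswith(prefix)


def find_by_category(text: str, prefix: str, negate: bool = False) -> list[str]:
    """Collect code points with the category (or without it, if negate is set)."""
    return [ch for ch in text if has_category(ch, prefix) != negate]


sample = "a\u00A9q"  # a, ©, q

print(find_by_category(sample, "L"))               # ['a', 'q']
print(find_by_category(sample, "L", negate=True))  # ['©']
```

This filters code points one at a time, so it is not a drop-in replacement for a regex token that can be combined with quantifiers or other pattern elements.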
Feature | Syntax | Description | Example | JGsoft | Python | JavaScript | VBScript | XRegExp | .NET | Java | ICU | RE2 | Perl | PCRE | PCRE2 | PHP | Delphi | R | Ruby | std::regex | Boost | Tcl | POSIX | GNU | Oracle | XML | XPath |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Unicode property | \p{Property} | Matches a single Unicode code point that does have the specified property. | \p{L} matches a | YES | no | with /u | no | YES | YES | YES | YES | default | YES | 5.0 | YES | YES | YES | YES | 1.9 | no | ECMA extended egrep awk | no | no | no | no | YES | YES |
Negated Unicode property | \P{Property} | Matches a single Unicode code point that does not have the specified property. | \P{L} matches © | YES | no | with /u | no | YES | YES | YES | YES | default | YES | 5.0 | YES | YES | YES | YES | 1.9 | no | ECMA extended egrep awk | no | no | no | no | YES | YES |
Negated Unicode property | \p{^Property} | Matches a single Unicode code point that does not have the specified property. | \p{^L} matches © | YES | no | no | no | YES | no | no | no | default | YES | 5.0 | YES | YES | YES | YES | 1.9 | no | no | no | no | no | no | no | no |
Unicode property | \P{^Property} | Matches a single Unicode code point that does have the specified property. Double negative is taken as positive. | \P{^L} matches q | V2 | no | no | no | no | no | no | no | default | YES | 5.0 | YES | YES | YES | YES | 1.9 | no | no | no | no | no | no | no | no |
Code point | \uFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \u00E0 matches à encoded as U+00E0 only. \u00A9 matches © | YES | 3.3 2.4 string | YES | YES | YES | YES | YES | YES | no | no | no | no | no | no | string | 1.9 | ECMA | no | YES | no | no | no | no | no |
Code point | \u{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \u{E0} matches à encoded as U+00E0 only. \u{A9} matches © | V2 | no | with /u | no | YES | no | no | no | no | no | no | no | 7.0.0 string | no | string | 1.9 | no | no | no | no | no | no | no | no |
Code point | \u{10FFFF} where 10FFFF is a hexadecimal value between 0 and 10FFFF | Matches a specific Unicode code point. | \u{1D400} matches 𝐀 | no | no | with /u | no | string | no | no | no | no | no | no | no | 7.0.0 string | no | no | 1.9 | no | no | no | no | no | no | no | no |
Code point | \xFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \x00E0 matches à encoded as U+00E0 only. \x00A9 matches © | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | string | no | no | no | no | no | no | |
Code point | \x{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \x{E0} matches à encoded as U+00E0 only. \x{A9} matches © | YES | no | no | no | no | no | 7 | YES | YES | YES | YES | YES | YES | YES | YES | no | no | ECMA extended egrep awk | no | no | no | no | no | no |
Code point | \x{10FFFF} where 10FFFF is a hexadecimal value between 0 and 10FFFF | Matches a specific Unicode code point. | \x{1D400} matches 𝐀 | error | no | no | no | no | no | 7 | YES | YES | YES | YES | YES | YES | YES | YES | no | no | ECMA extended egrep awk 1.42 error | no | no | no | no | no | no |
Code point | \U0010FFFF where 0010FFFF are 8 hexadecimal digits between 00000000 and 0010FFFF | Matches a specific Unicode code point. | \U0001D400 matches 𝐀 | no | 3.3 2.4 string | no | no | no | no | no | YES | no | no | no | string | no | no | string | no | string | no | YES | no | no | no | no | no |
Code point | \U10FFFF where 10FFFF are 1 to 7 hexadecimal digits between 0 and 010FFFF | Matches a specific Unicode code point. | \U1D400, \U01D400, and \U001D400 match 𝐀 | no | no | no | no | no | no | no | no | no | no | no | no | no | no | string | no | no | no | 8.6 | no | no | no | no | no |
Grapheme | \X | Matches an extended grapheme cluster according to UAX 29. | \X matches | no | no | no | no | no | no | 9 | 67 | no | YES | 8.32 | YES | YES | XE7 | 2.15.3 | 2.4 | no | no | no | no | no | no | no | no |
Grapheme | \X | Matches a legacy grapheme cluster according to UAX 29. | \X matches | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | |
Grapheme | \X | Matches a single character that is not in the Mark category and all following characters that are in the Mark category (if any). | \X matches | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | | | | |
Grapheme | \X | Matches any single character and all following characters that Boost treats as combining characters. | \X matches | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | ECMA extended egrep awk | no | no | no | no | no | no |
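The simplest `\X` definition in the table, a character outside the Mark category followed by any characters in the Mark category, can be approximated in Python, which the table lists as having no `\X` support. The sketch below is an illustration under that definition only: it does not implement the UAX 29 rules, the function names are hypothetical, and a Mark character at the very start of the text is kept as its own item, which the regex token would not match.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True if ch is in a Mark general category (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def simple_graphemes(text: str) -> list[str]:
    """Split text into a base character plus any Mark characters that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and is_mark(ch):
            # Attach the combining mark to the preceding base character.
            clusters[-1] += ch
        else:
            # Start a new cluster (also used for a leading mark, as an assumption).
            clusters.append(ch)
    return clusters


# "a" + U+0300 and "e" + U+0301 each form one cluster; "x" stands alone.
parts = simple_graphemes("a\u0300e\u0301x")
print(len(parts))  # 3
```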
Page URL: https://www.regular-expressions.info/refunicode.html
Page last updated: 18 August 2025
Site last updated: 14 October 2025
Copyright © 2003-2025 Jan Goyvaerts. All rights reserved.