Regular Expressions Basics
What and Why
Common search versus pattern search
Common replace versus pattern replace
Regular expressions (also referred to as regexp or regex) give you the power to search, match, extract and replace strings of text
Flexible patterns
Simple patterns
Advanced expressions
Available in most languages, scripts, editing tools, shells
Originated in the UNIX tools ed and grep
Different RegExp engines exist and are not fully compatible
Perl is the most popular
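The difference between common and pattern search/replace can be sketched in JavaScript as follows. The sample text and variable names are made up for illustration only.

```javascript
// Sample text; the content is invented for this sketch.
const text = "Order 66 shipped on 2024-03-15, order 67 on 2024-04-02.";

// Common search: finds one fixed substring.
const plainIndex = text.indexOf("2024-03-15");

// Pattern search: finds anything shaped like a date (YYYY-MM-DD).
const datePattern = /\d{4}-\d{2}-\d{2}/g;
const dates = text.match(datePattern);

// Common replace: only the first exact (case-sensitive) occurrence.
const plainReplaced = text.replace("order", "item");

// Pattern replace: every date, whatever its value.
const masked = text.replace(datePattern, "<date>");

console.log(plainIndex);    // 20
console.log(dates);         // [ '2024-03-15', '2024-04-02' ]
console.log(plainReplaced); // Order 66 shipped on 2024-03-15, item 67 on 2024-04-02.
console.log(masked);        // Order 66 shipped on <date>, order 67 on <date>.
```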
Basic Syntax
Regular expressions are Strings themselves: "1 word"
Regular expressions are case-sensitive unless specified otherwise
JavaScript also accepts non-string (literal) expressions: /^[a-z]+$/gi
Expressions have symbols: \n\b([{}])+?*
Symbols need to be escaped when used as plain characters: \{
Symbols can be grouped in types: position, literal, character classes, repetition, alternation and grouping, back references, pattern switches
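A minimal sketch of these syntax rules in JavaScript. The variable names are illustrative; note that when a pattern is written as a string, each backslash has to be doubled so that it survives string parsing.

```javascript
// The same pattern written as a literal and as a string.
const fromLiteral = /^[a-z]+$/i;
const fromString = new RegExp("^[a-z]+$", "i");

console.log(fromLiteral.test("Hello")); // true
console.log(fromString.test("Hello"));  // true

// Case-sensitive unless a switch says otherwise.
console.log(/hello/.test("Hello"));     // false

// A symbol used as a plain character must be escaped.
const literalBrace = /\{/;
console.log(literalBrace.test("a{b"));  // true

// Inside a string, the backslash itself is escaped: "\\b" becomes \b.
const wholeWord = new RegExp("\\bcat\\b");
console.log(wholeWord.test("a cat sat")); // true
console.log(wholeWord.test("concat"));    // false
```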
Position Symbols
^   Matches the beginning of a string
$   Matches the end of a string
\b  Matches a word boundary
\B  Matches a non-word boundary
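The position symbols can be tried out with a short sketch like the one below. The word "cat" and the test strings are arbitrary examples.

```javascript
const startsWithCat = /^cat/;    // "cat" at the beginning of the string
const endsWithCat = /cat$/;      // "cat" at the end of the string
const wholeWordCat = /\bcat\b/;  // "cat" as a separate word
const insideWordCat = /\Bcat\B/; // "cat" surrounded by word characters

console.log(startsWithCat.test("catalog"));     // true
console.log(startsWithCat.test("bobcat"));      // false
console.log(endsWithCat.test("bobcat"));        // true
console.log(wholeWordCat.test("the cat sat"));  // true
console.log(wholeWordCat.test("concatenate"));  // false
console.log(insideWordCat.test("concatenate")); // true
console.log(insideWordCat.test("the cat sat")); // false
```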
Literal Symbols
alphanumeric  All alphabetical and numeric characters match themselves (e.g. /2 apples/)
\n      New line
\r      Carriage return
\t      Horizontal tab
\f      Form feed
\v      Vertical tab
\xdd    Character with the hexadecimal code dd
\uXXXX  Unicode representation of a character
\       Escapes special characters used as symbols: \$
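A brief sketch of literal symbols in use. The test strings are invented, and the results are collected in an object only to keep the output compact.

```javascript
const literalResults = {
  // Alphanumeric characters (and the space) match themselves.
  plainText: /2 apples/.test("I ate 2 apples"),
  // Escape sequences for control characters.
  tab: /\t/.test("a\tb"),
  windowsLineEnd: /\r\n/.test("line1\r\nline2"),
  // \x41 is the character with hex code 41, i.e. "A".
  hexCode: /\x41/.test("A"),
  // \u00e9 is the Unicode escape for "é".
  unicode: /\u00e9/.test("caf\u00e9"),
  // An escaped $ is a plain dollar sign, not the end-of-string symbol.
  dollar: /\$5/.test("cost: $5"),
};

console.log(literalResults); // every property is true
```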
Character Classes
[] delimit a character class
Accepts ranges with dash: [a-f] [2-5]
Can be negated: [^abc]
.   (any) Matches any character except new line terminators
\w  (word) Matches alphanumeric characters, equivalent to [a-zA-Z0-9_]
\W  (non-word) Matches non-word characters, equivalent to [^a-zA-Z0-9_]
\d  Matches any digit, equivalent to [0-9]
\D  Matches any non-digit, equivalent to [^0-9]
\s  Matches any white space, equivalent to [ \t\r\n\v\f] (JavaScript also includes other Unicode white space)
\S  Matches any non-space, equivalent to [^ \t\r\n\v\f]
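The character classes above could be exercised as in this sketch. All input strings are arbitrary samples.

```javascript
// Range: only the letters a to f.
const hexLetters = "b4d f00d".match(/[a-f]/g);

// Negated class: everything that is not a, b or c becomes "-".
const onlyAbc = "abcdef".replace(/[^abc]/g, "-");

// \w covers letters, digits and the underscore.
const wordChars = "a1_b2".match(/\w/g);

// \W is the opposite of \w.
const nonWordChars = "x-y".match(/\W/g);

// \d picks out the digits.
const digits = "Call 555-0199".match(/\d/g).join("");

// \s matches spaces and tabs alike.
const parts = "a b\tc".split(/\s/);

// The dot does not match a new line.
const dotMatchesNewline = /./.test("\n");

console.log(hexLetters);        // [ 'b', 'd', 'f', 'd' ]
console.log(onlyAbc);           // abc---
console.log(wordChars);         // [ 'a', '1', '_', 'b', '2' ]
console.log(nonWordChars);      // [ '-' ]
console.log(digits);            // 5550199
console.log(parts);             // [ 'a', 'b', 'c' ]
console.log(dotMatchesNewline); // false
```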
Repetition and Quantification
Repetition symbols follow other symbols or patterns
{x}    Matches exactly x occurrences
{x,}   Matches x or more occurrences
{x,y}  Matches x to y (inclusive) occurrences
*      Matches 0 or more occurrences, equivalent to {0,}
+      Matches 1 or more occurrences, equivalent to {1,}
?      Matches 0 or 1 occurrences, equivalent to {0,1}
Examples: \w{3} \d{6,9} [a-z]+ [A-Z]* L?evi
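The example patterns above can be checked with a sketch like this one. The ^ and $ anchors are added here (an assumption of this sketch) so that each pattern has to cover the whole string.

```javascript
const threeWordChars = /^\w{3}$/;   // exactly 3 word characters
const sixToNineDigits = /^\d{6,9}$/; // between 6 and 9 digits
const optionalL = /^L?evi$/;         // "evi" with an optional leading "L"
const lowerOneOrMore = /^[a-z]+$/;   // at least one lowercase letter
const upperZeroOrMore = /^[A-Z]*$/;  // any number of uppercase letters, even none

console.log(threeWordChars.test("abc"));      // true
console.log(threeWordChars.test("abcd"));     // false
console.log(sixToNineDigits.test("12345"));   // false (too short)
console.log(sixToNineDigits.test("1234567")); // true
console.log(optionalL.test("Levi"));          // true
console.log(optionalL.test("evi"));           // true
console.log(optionalL.test("LLevi"));         // false
console.log(lowerOneOrMore.test(""));         // false (+ needs at least one)
console.log(upperZeroOrMore.test(""));        // true  (* accepts zero)
```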
Alternation, Grouping and Backreferences
() are used to group characters together: (hubba\s)+
| is used as OR operator to define an alternation: (ab)|(cd)|(de) (a|c)
\n where n is 1 to 9, matches the text captured by a previous group, counted from the left: (\w+)\s+(\d+)\s+\2\1
Groups are also useful for extracting matched portions of a pattern
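A small sketch of grouping, alternation, back references and extraction in JavaScript. The input strings are invented; the patterns are the ones from this section, with anchors added in two places so the whole string must match.

```javascript
// Grouping: the whole group "hubba " may repeat.
const repeatedGroup = /^(hubba\s)+$/;
console.log(repeatedGroup.test("hubba hubba ")); // true

// Alternation: any one of the three alternatives.
const alternatives = /^(ab|cd|de)$/;
console.log(alternatives.test("cd")); // true
console.log(alternatives.test("ac")); // false

// Back references: \2 and \1 must repeat what groups 2 and 1 captured.
const backRef = /(\w+)\s+(\d+)\s+\2\1/;
console.log(backRef.test("abc 123 123abc")); // true
console.log(backRef.test("abc 123 456abc")); // false

// Extraction: index 0 is the full match, 1..n are the groups.
const extracted = "John 42".match(/(\w+)\s+(\d+)/);
console.log(extracted[0]); // John 42
console.log(extracted[1]); // John
console.log(extracted[2]); // 42
```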
Pattern Switches (Flags)
i : ignore case
    makes the expression case-insensitive
g : global search
    searches for all occurrences of the pattern, not just the first
m : multiline
    changes the meaning of ^ and $ from matching the beginning and end of the string to matching the beginning and end of each line
Pattern switches are used as parameters when constructing a RegExp object or at the end of the literal expression:
new RegExp("JavaScript", "gi")
/JavaScript/gi
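The effect of each switch can be seen in a sketch like the following. The sample strings are made up.

```javascript
// i: ignore case
console.log(/javascript/.test("JavaScript"));  // false
console.log(/javascript/i.test("JavaScript")); // true

// g: all occurrences instead of only the first
const firstOnly = "a-b-c".replace(/-/, "+");
const allOfThem = "a-b-c".replace(/-/g, "+");
console.log(firstOnly); // a+b-c
console.log(allOfThem); // a+b+c

// m: ^ and $ also match at line breaks
const twoLines = "first\nsecond";
console.log(/^second$/.test(twoLines));  // false
console.log(/^second$/m.test(twoLines)); // true

// Switches passed as the second constructor parameter.
const both = new RegExp("JavaScript", "gi");
const found = "JavaScript and javascript".match(both);
console.log(found); // [ 'JavaScript', 'javascript' ]
```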
JavaScript Usage
String methods:
match( RegExp ): Array
replace( RegExp, String ): String
split( RegExp ): Array
search( RegExp ): Number
RegExp methods:
test( String ): Boolean
exec( String ): Array
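Each of the methods listed above, called once in a minimal sketch. The sentence and variable names are illustrative only.

```javascript
const sentence = "Apples 3, pears 12, plums 7";

// String methods
const numbers = sentence.match(/\d+/g);          // all matches as an array
const hidden = sentence.replace(/\d+/g, "#");    // new string with replacements
const items = sentence.split(/,\s*/);            // pieces between the separators
const position = sentence.search(/pears/);       // index of first match, -1 if none

// RegExp methods
const hasPlums = /plums/.test(sentence);         // true or false
const firstPair = /(\w+) (\d+)/.exec(sentence);  // match details, or null if none
const noMatch = /xyz/.exec(sentence);

console.log(numbers);      // [ '3', '12', '7' ]
console.log(hidden);       // Apples #, pears #, plums #
console.log(items);        // [ 'Apples 3', 'pears 12', 'plums 7' ]
console.log(position);     // 10
console.log(hasPlums);     // true
console.log(firstPair[1]); // Apples
console.log(firstPair[2]); // 3
console.log(noMatch);      // null
```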
Examples and Practice