REGULAR EXPRESSIONS
Intro to regex
Pattern Matching
A powerful and widely applicable technique used across programming languages. Everything can be written using characters ⇒ use regex to search for a pattern of characters!
Use cases:
● validate user input (emails, passwords)
● search text/code (VS Code), replace/rename
● query databases
● extract info from text (incl. web scraping)
● massage raw data
● rename files
● web directives (Apache)
● interact with the Unix shell
● refactor code
Why it can be intimidating:
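Regex Basics
/regex/gim  the first / opens the pattern, the second / closes it, and any flags follow
g  after the regex: make the search global (don't stop at the first match!)
i  after the regex: make the search case-insensitive
m  after the regex: perform multiline matching
/unicorn/  literal, exact match
Escape special characters with a backslash: \ ^ . [ $ ( ) | * + ? {
Escape sequences: \t (tab), \n (newline), \r (carriage return)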
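A minimal sketch of the flags and of escaping in JavaScript. The sample text and the `escapeRegExp` helper are assumptions made up for illustration; the helper is not a built-in.

```javascript
// Sample text invented for this example.
const text = "Unicorn one\nunicorn two";

// No flags: case-sensitive, stops at the first match.
const first = text.match(/unicorn/);
// g + i: every match, ignoring case.
const all = text.match(/unicorn/gi);
// m: ^ also matches at the start of each line.
const lineStarts = text.match(/^unicorn/gim);

// Hypothetical helper: backslash-escape every special character
// so a literal string can be used safely inside a pattern.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const price = new RegExp(escapeRegExp("$5.00"));

console.log(first.index);               // 12
console.log(all);                       // [ 'Unicorn', 'unicorn' ]
console.log(lineStarts);                // [ 'Unicorn', 'unicorn' ]
console.log(price.test("cost: $5.00")); // true
console.log(price.test("cost: $5x00")); // false, the dot is literal
```

Without the escaping step, `$` and `.` would be read as an anchor and a wildcard rather than as the characters themselves.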
Metacharacters
.   any character except newline
\d  digit character
\D  non-digit character
\w  “word” character (alphanumeric or _)
\W  non-word character
\s  whitespace character (space, tab, newline, carriage return...)
\S  non-whitespace character
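The metacharacters can be tried out as follows; the sample string is an assumption chosen to contain digits, an underscore, a tab and spaces.

```javascript
// Sample string invented for this example (note the tab before "on").
const sample = "Order_42 shipped\ton 2024-01-15";

const digits = sample.match(/\d+/g);   // runs of digit characters
const words = sample.match(/\w+/g);    // runs of word characters (letters, digits, _)
const parts = sample.split(/\s+/);     // split on any run of whitespace
const dotOnLetter = /a.c/.test("abc");   // . matches an ordinary character
const dotOnNewline = /a.c/.test("a\nc"); // . does not match a newline

console.log(digits); // [ '42', '2024', '01', '15' ]
console.log(words);  // [ 'Order_42', 'shipped', 'on', '2024', '01', '15' ]
console.log(parts);  // [ 'Order_42', 'shipped', 'on', '2024-01-15' ]
console.log(dotOnLetter, dotOnNewline); // true false
```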
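Anchors: ^text$
^  start of the string (start of each line with the multiline flag)
$  end of the string (end of each line with the multiline flag)
/^The end$/gim
^The  matches any string that starts with "The"
end$  matches any string that ends with "end"
^The end$  matches a string that starts and ends with "The end", i.e. is exactly "The end"
Boundaries: \b
\b  word boundary
\B  not a word boundary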
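A short sketch of anchors and word boundaries in JavaScript, using made-up sample strings. It shows how the multiline flag changes what ^ and $ mean.

```javascript
// Sample strings invented for this example.
const poem = "The start\nin the middle\nThe end";

// Without m, ^ and $ refer to the whole string.
const wholeString = /^The end$/.test(poem);
// With m, they refer to each line.
const anyLine = /^The end$/m.test(poem);
const linesStartingWithThe = poem.match(/^The/gm);

const animals = "cat concat cats";
const wholeWord = animals.match(/\bcat\b/g); // "cat" as a standalone word
const insideIndex = animals.search(/\Bcat/); // "cat" not at the start of a word

console.log(wholeString, anyLine);  // false true
console.log(linesStartingWithThe);  // [ 'The', 'The' ]
console.log(wholeWord);             // [ 'cat' ]
console.log(insideIndex);           // 7 (the "cat" in "concat")
```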
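Quantifiers: *, +, ? and {}
*  0 or more times (optional)
+  1 or more times (at least 1 required)
?  0 or 1 times (optional); placed after another quantifier it makes that quantifier lazy (e.g. *?)
{#}  exactly # times
{2,}  2 or more times
{2,5}  2 to 5 times
Lookarounds
0(?=abc)  match 0 only if followed by “abc”, vs 0(?!abc): only if not followed by “abc”
(?<=abc)0  match 0 only if preceded by “abc”, vs (?<!abc)0: only if not preceded by “abc”
Examples
/hel{2}o/i  matches hello; does not match Helo or helllo
/hel{2,4}o/i  matches hello, helllo and hellllo; does not match Helo
/hel{2,}o/i  (2+) matches hello and helllllo; does not match helo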
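The quantifier examples above can be checked with a sketch like this one. The ^ and $ anchors are an addition so that each word is tested as a whole; the word list and the HTML snippet are made up for illustration.

```javascript
// Word list based on the slide's examples.
const candidates = ["hello", "Helo", "helllo", "hellllo", "helllllo"];

const exactlyTwo = candidates.filter((w) => /^hel{2}o$/i.test(w));
const twoToFour = candidates.filter((w) => /^hel{2,4}o$/i.test(w));
const twoOrMore = candidates.filter((w) => /^hel{2,}o$/i.test(w));

// Greedy vs lazy: * takes as much as it can, *? as little as it can.
const html = "<b>one</b><b>two</b>";
const greedy = html.match(/<b>.*<\/b>/)[0];
const lazy = html.match(/<b>.*?<\/b>/)[0];

// Lookahead and lookbehind check the context without consuming it.
const followedByAbc = "0xyz 0abc".search(/0(?=abc)/);
const notFollowedByAbc = "0abc 0xyz".search(/0(?!abc)/);
const precededByAbc = "xyz0 abc0".search(/(?<=abc)0/);
const notPrecededByAbc = "abc0 xyz0".search(/(?<!abc)0/);

console.log(exactlyTwo); // [ 'hello' ]
console.log(twoToFour);  // [ 'hello', 'helllo', 'hellllo' ]
console.log(twoOrMore);  // [ 'hello', 'helllo', 'hellllo', 'helllllo' ]
console.log(greedy);     // <b>one</b><b>two</b>
console.log(lazy);       // <b>one</b>
console.log(followedByAbc, notFollowedByAbc, precededByAbc, notPrecededByAbc); // 5 5 8 8
```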
Character Sets: []
[A-Z]  uppercase characters (in range A-Z)
[a-z]  lowercase characters (in range a-z)
[A-Za-z]  any letter
[aeiou]  either a, e, i, o or u
[0-9]  any digit 0-9 (use a narrower range such as [1-3] for specific digits, vs \d for any digit)
/gr[ae]y/i ⇒ third character must be “a” or “e”
[^aeiou]  any character that is not a, e, i, o or u
[^2-4]  any character that is not 2, 3 or 4 (this includes non-digits)
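A small sketch of character sets and negated sets, with sample strings invented for the purpose. Note that a negated set matches any character outside the set, not only digits or letters.

```javascript
// Sample strings invented for this example.
const spellings = "gray Grey grUy".match(/gr[ae]y/gi);
const vowelsOnly = "regex".replace(/[^aeiou]/g, ""); // drop everything that is not a vowel
const oneToThree = "a1b2c3d4".match(/[1-3]/g);
const notTwoToFour = "a1b2c3d4".match(/[^2-4]/g);

console.log(spellings);    // [ 'gray', 'Grey' ]
console.log(vowelsOnly);   // ee
console.log(oneToThree);   // [ '1', '2', '3' ]
console.log(notTwoToFour); // [ 'a', '1', 'b', 'c', 'd' ]
```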
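Grouping: ()
a(bc){2,5}  parens create a capturing group of “bc” ⇒ matches “a” followed by 2-5 “bc”s
|  or ⇒ (x|y) and [xy] both match x or y (the group also captures what it matched)
(?:ab)  match “ab” but don’t remember (“capture”) it
(demo|example)[0-9]+  in "demo1 demoexample2 example4 demo 9" matches demo1, example2 (inside demoexample2) and example4; "demo 9" does not match
([0-9]x){3}  in "3x3x3x 3x3x 4x4x4x 3x3x3x3x" matches 3x3x3x, 4x4x4x and the first 3x3x3x of 3x3x3x3x; 3x3x does not match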
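The grouping examples can be run as in the sketch below. The test strings come from the slide; the ^ and $ anchors on the first pattern are an addition so that the whole string is checked.

```javascript
// A quantifier after a group repeats the whole group.
const repeated = /^a(bc){2,5}$/;
const twoGroups = repeated.test("abcbc");
const oneGroup = repeated.test("abc");

// Capturing vs non-capturing: only (…) adds an entry to the result array.
const capturing = "abab".match(/(ab)+/);
const nonCapturing = "abab".match(/(?:ab)+/);

// Alternation inside a group.
const demos = "demo1 demoexample2 example4 demo 9".match(/(demo|example)[0-9]+/g);
const triples = "3x3x3x 3x3x 4x4x4x 3x3x3x3x".match(/([0-9]x){3}/g);

console.log(twoGroups, oneGroup);                  // true false
console.log(capturing.length, capturing[1]);       // 2 ab
console.log(nonCapturing.length);                  // 1
console.log(demos);   // [ 'demo1', 'example2', 'example4' ]
console.log(triples); // [ '3x3x3x', '4x4x4x', '3x3x3x' ]
```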
Common Patterns
URL
(https?:\/\/)(www\.)?(?<domain>[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6})(?<path>\/[-a-zA-Z0-9@:%_\/+.~#?&=]*)?
CREDIT CARDS
/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|622((12[6-9]|1[3-9][0-9])|([2-8][0-9][0-9])|(9(([0-1][0-9])|(2[0-5]))))[0-9]{10}|64[4-9][0-9]{13}|65[0-9]{14}|3(?:0[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13})*$/
USERNAMES
/^[a-z0-9_-]{3,16}$/
PASSWORDS
/^[a-z0-9_-]{6,18}$/
IP ADDRESSES
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
EMAIL
    function isValidEmail(input) {
      // One or more characters that are not "@" or whitespace, then "@",
      // the same again, a literal dot, and a 2-6 character ending.
      const regex = /^[^@\s]+@[^@\s]+\.\w{2,6}$/;
      // test() returns true/false directly; the g flag is not needed here.
      return regex.test(input);
    }

    const tests = [
      "test.test@gmail.com",     // Valid
      "test.test",               // Invalid
      "@invalid@test.com",       // Invalid
      "this is a test@test.com", // Invalid
    ];
    console.log(tests.map(isValidEmail)); // [ true, false, false, false ]
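A sketch of how the URL pattern's named groups and the IP address pattern might be used in JavaScript, assuming the slashes and dots in the patterns are backslash-escaped (`\/`, `\.`). The sample URLs and addresses are made up for illustration.

```javascript
// URL pattern from the slide, with / and . escaped.
const urlPattern =
  /(https?:\/\/)(www\.)?(?<domain>[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6})(?<path>\/[-a-zA-Z0-9@:%_\/+.~#?&=]*)?/;

// Named groups are available on the match result's `groups` property.
const withPath = urlPattern.exec("https://www.example.com/docs/intro?x=1");
const withoutPath = urlPattern.exec("http://example.org");

// IP address pattern from the slide, with the dot escaped.
const ipv4 =
  /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;

console.log(withPath.groups.domain);    // example.com
console.log(withPath.groups.path);      // /docs/intro?x=1
console.log(withoutPath.groups.domain); // example.org
console.log(withoutPath.groups.path);   // undefined (optional group did not match)
console.log(ipv4.test("192.168.0.1"));  // true
console.log(ipv4.test("256.1.1.1"));    // false, 256 is out of range
```

An unescaped dot would match any character, so an input such as "192x168x0x1" would pass the IP check; that is why the escapes matter.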
Regex in JavaScript
RegExp methods
exec()  returns an array with the matched text and any captured text (or null)
test()  returns true/false depending on whether there is a match
String methods
match()  returns a result array (or null)
search()  returns the index of the first match (or -1 if not found)
replace()  returns a new string where matches of a pattern are replaced
split()  returns an array of substrings broken up using the regex/string
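The six methods side by side, as a minimal sketch. The date pattern and the log line are assumptions invented for the example.

```javascript
// Pattern and sample text invented for this example.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const log = "Released 2024-01-15, patched 2024-02-03";

const execResult = datePattern.exec(log);          // first match plus captured groups
const hasDate = datePattern.test(log);             // boolean
const allDates = log.match(/\d{4}-\d{2}-\d{2}/g);  // every match (g flag)
const firstIndex = log.search(datePattern);        // index of the first match
const noIndex = log.search(/\d{5}/);               // -1 when nothing matches
// $1, $2, $3 refer to the captured groups in the replacement string.
const reordered = log.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");
const pieces = "a, b;c".split(/[,;]\s*/);

console.log(Array.from(execResult)); // [ '2024-01-15', '2024', '01', '15' ]
console.log(hasDate);                // true
console.log(allDates);               // [ '2024-01-15', '2024-02-03' ]
console.log(firstIndex, noIndex);    // 9 -1
console.log(reordered);              // Released 15/01/2024, patched 03/02/2024
console.log(pieces);                 // [ 'a', 'b', 'c' ]
```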

