Introduction to regular expressions
Regular Expressions in Python
Maria Eugenia Inzaugarat, Data Scientist
What is a regular expression?

A REGular EXpression, or regex, is a string containing a combination of normal characters and special metacharacters that describes patterns to find text or positions within a text.

- Normal characters match themselves (for example, st matches the text "st").
- Metacharacters represent types of characters (\d, \s, \w) or ideas such as repetition ({3,10}).
- Pattern: a sequence of characters that maps to words or punctuation.

Pattern matching is used to:
- Find and replace text
- Validate strings

Regular expressions are a very powerful and usually fast tool for both tasks.
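A minimal sketch of the difference between normal characters and metacharacters, using Python's standard re module; the sample sentence is invented for illustration.

```python
import re

text = "Order 66 shipped on day 3; order 1024 is still pending."

# Normal characters match themselves: "order" only finds the lowercase word.
literal = re.findall(r"order", text)

# \d is a metacharacter for "any digit"; {2,4} means "between 2 and 4 of them".
numbers = re.findall(r"\d{2,4}", text)

print(literal)  # ['order']
print(numbers)  # ['66', '1024']
```

The single digit 3 is skipped because {2,4} requires at least two digits in a row.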
The re module

    >>> import re

The r prefix creates a raw string, so backslashes in patterns such as \d reach the regex engine unchanged.

Find all matches of a pattern with re.findall():

    >>> re.findall(r"#movies", "Love #movies! I had fun yesterday going to the #movies")
    ['#movies', '#movies']

Split a string at each match with re.split():

    >>> re.split(r"!", "Nice Place to eat! I'll come back! Excellent meat!")
    ['Nice Place to eat', " I'll come back", ' Excellent meat', '']

Replace one or many matches with a string with re.sub():

    >>> re.sub(r"yellow", "nice", "I have a yellow car and a yellow house in a yellow neighborhood")
    'I have a nice car and a nice house in a nice neighborhood'
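re.split() and re.sub() also accept an optional limit: maxsplit caps the number of splits and count caps the number of replacements. A short sketch reusing the strings from the split and replace examples:

```python
import re

review = "Nice Place to eat! I'll come back! Excellent meat!"
sentence = "I have a yellow car and a yellow house in a yellow neighborhood"

# Split only at the first "!"; the rest of the string stays in one piece.
first_split = re.split(r"!", review, maxsplit=1)

# Replace only the first two occurrences of "yellow".
partial = re.sub(r"yellow", "nice", sentence, count=2)

print(first_split)  # ['Nice Place to eat', " I'll come back! Excellent meat!"]
print(partial)      # I have a nice car and a nice house in a yellow neighborhood
```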
Supported metacharacters

\d matches a digit and \D any character that is not a digit:

    >>> re.findall(r"User\d", "The winners are: User9, UserN, User8")
    ['User9', 'User8']
    >>> re.findall(r"User\D", "The winners are: User9, UserN, User8")
    ['UserN']

\w matches a word character (letter, digit, or underscore) and \W any non-word character:

    >>> re.findall(r"User\w", "The winners are: User9, UserN, User8")
    ['User9', 'UserN', 'User8']
    >>> re.findall(r"\W\d", "This skirt is on sale, only $5 today!")
    ['$5']

\s matches a whitespace character and \S any non-whitespace character:

    >>> re.findall(r"Data\sScience", "I enjoy learning Data Science")
    ['Data Science']
    >>> re.sub(r"ice\Scream", "ice cream", "I really like ice-cream")
    'I really like ice cream'
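To see the six character classes side by side, a small sketch can apply each one to the same invented string and collect the results in a dictionary:

```python
import re

sample = "Room 42_B, floor 3!"

classes = [r"\d", r"\D", r"\w", r"\W", r"\s", r"\S"]

# Map each pattern to the list of single characters it matches in sample.
results = {pattern: re.findall(pattern, sample) for pattern in classes}

for pattern, found in results.items():
    print(pattern, found)
```

Each uppercase class is the complement of its lowercase partner, so together a pair covers every character of the string exactly once. Note that the underscore counts as a word character.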
Repetitions
Repeated characters

Validate the following string:

    >>> import re
    >>> password = "password1234"
    >>> re.search(r"\w\w\w\w\w\w\w\w\d\d\d\d", password)
    <re.Match object; span=(0, 12), match='password1234'>

Writing out every repeated character class is tedious; a quantifier expresses the same pattern compactly:

    >>> re.search(r"\w{8}\d{4}", password)
    <re.Match object; span=(0, 12), match='password1234'>

Quantifier: a metacharacter that tells the regex engine how many times to match the character immediately to its left.
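re.search() also succeeds when the pattern matches only part of a longer string, which is usually not what a validation check wants. A minimal sketch, assuming the rule is "exactly 8 word characters followed by exactly 4 digits", uses re.fullmatch(), which requires the whole string to match; the function name is_valid is hypothetical.

```python
import re

def is_valid(password: str) -> bool:
    """Hypothetical rule: exactly 8 word characters, then exactly 4 digits."""
    return re.fullmatch(r"\w{8}\d{4}", password) is not None

print(is_valid("password1234"))      # True
print(is_valid("mypassword1234!!"))  # False, although re.search would find a match
print(is_valid("pass12"))            # False: too short
```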
Quantifiers

Once or more: +

    >>> text = "Date of start: 4-3. Date of registration: 10-04."
    >>> re.findall(r"\d+-\d+", text)
    ['4-3', '10-04']
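Because + accepts any number of repetitions, the same idea extracts numbers of any length. A short sketch with an invented sentence:

```python
import re

text = "Flights 7, 42 and 1008 leave from gates 3 and 12."

# \d+ matches one or more consecutive digits, so each whole number is returned.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['7', '42', '1008', '3', '12']

# Converting the matches to integers makes them usable in arithmetic.
print(sum(int(n) for n in numbers))  # 1072
```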
Zero times or more: *

    >>> my_string = "The concert was amazing! @ameli!a @joh&&n @mary90"
    >>> re.findall(r"@\w+\W*\w+", my_string)
    ['@ameli!a', '@joh&&n', '@mary90']
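The difference between * and + shows up when the repeated character may be absent. In this sketch with an invented string, go*al accepts zero o characters, while go+al needs at least one:

```python
import re

text = "gal goal gooooal"

zero_or_more = re.findall(r"go*al", text)  # o may appear 0 or more times
one_or_more = re.findall(r"go+al", text)   # o must appear at least once

print(zero_or_more)  # ['gal', 'goal', 'gooooal']
print(one_or_more)   # ['goal', 'gooooal']
```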
Zero times or once: ?

    >>> text = "The color of this image is amazing. However, the colour blue could be brighter."
    >>> re.findall(r"colou?r", text)
    ['color', 'colour']
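The ? quantifier is also handy for an optional separator such as a hyphen. A sketch on an invented sentence:

```python
import re

text = "Send an email or an e-mail, but not an e--mail."

# -? makes the hyphen optional: zero or one "-" between "e" and "mail".
matches = re.findall(r"e-?mail", text)
print(matches)  # ['email', 'e-mail']
```

The double hyphen in "e--mail" is not matched, because ? allows at most one repetition.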
At least n times, at most m times: {n,m}

    >>> phone_number = "John: 1-966-847-3131 Michelle: 54-908-42-42424"
    >>> re.findall(r"\d{1,2}-\d{3}-\d{2,3}-\d{4,}", phone_number)
    ['1-966-847-3131', '54-908-42-42424']

Leaving out m, as in {4,}, means at least n times with no upper limit.
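Besides {n,m}, braces accept an exact count {n} and an open upper bound {n,}. In this sketch on an invented string, note that findall returns non-overlapping matches scanned left to right, so \d{5} also picks out the first five digits of a longer number:

```python
import re

codes = "Zip 90210, ext 42, id 1234567"

exactly_five = re.findall(r"\d{5}", codes)     # exactly 5 digits per match
at_least_three = re.findall(r"\d{3,}", codes)  # 3 or more digits

print(exactly_five)    # ['90210', '12345']
print(at_least_three)  # ['90210', '1234567']
```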
A quantifier applies only to the character immediately to its left: in r"apple+", the + applies to e and not to apple.
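A quick sketch on an invented string confirms that the + in apple+ repeats only the final e:

```python
import re

text = "apple appleee appleapple"

# + applies to "e" alone, so extra e's are absorbed but "apple" is not repeated.
matches = re.findall(r"apple+", text)
print(matches)  # ['apple', 'appleee', 'apple', 'apple']
```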
Regex metacharacters
Looking for patterns

There are two different operations to find a match. re.search() scans the whole string for the first match, while re.match() only succeeds if the match starts at the beginning of the string:

    >>> re.search(r"\d{4}", "4506 people attend the show")
    <re.Match object; span=(0, 4), match='4506'>
    >>> re.search(r"\d+", "Yesterday, I saw 3 shows")
    <re.Match object; span=(17, 18), match='3'>
    >>> re.match(r"\d{4}", "4506 people attend the show")
    <re.Match object; span=(0, 4), match='4506'>
    >>> print(re.match(r"\d+", "Yesterday, I saw 3 shows"))
    None
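The match objects returned by both functions can be inspected with their .group() and .span() methods. A minimal sketch, where the helper first_number is hypothetical, contrasts the two functions on the strings above:

```python
import re

def first_number(text: str):
    """Return (search result, match result) for the first run of digits."""
    found = re.search(r"\d+", text)    # looks anywhere in the string
    at_start = re.match(r"\d+", text)  # looks only at position 0
    return (
        found.group() if found else None,
        at_start.group() if at_start else None,
    )

print(first_number("4506 people attend the show"))  # ('4506', '4506')
print(first_number("Yesterday, I saw 3 shows"))     # ('3', None)
print(re.search(r"\d+", "Yesterday, I saw 3 shows").span())  # (17, 18)
```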
Special characters

Match any character (except newline): .

    >>> my_links = "Just check out this link: www.amazingpics.com. It has amazing photos!"
    >>> re.findall(r"www.+com", my_links)
    ['www.amazingpics.com']
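Because . stops at newlines by default, a pattern that should cross line breaks needs the re.DOTALL flag. A sketch on an invented string containing newlines:

```python
import re

text = "abc\na\nc"

default = re.findall(r"a.c", text)                  # . does not match "\n"
dotall = re.findall(r"a.c", text, flags=re.DOTALL)  # now . matches "\n" too

print(default)  # ['abc']
print(dotall)   # ['abc', 'a\nc']
```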
Start of the string: ^

    >>> my_string = "the 80s music was much better than the 90s"
    >>> re.findall(r"the\s\d+s", my_string)
    ['the 80s', 'the 90s']
    >>> re.findall(r"^the\s\d+s", my_string)
    ['the 80s']
End of the string: $

    >>> my_string = "the 80s music hits were much better than the 90s"
    >>> re.findall(r"the\s\d+s$", my_string)
    ['the 90s']
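Combining ^ and $ anchors a pattern at both ends, which turns a search into a whole-string check. A minimal sketch; the function name is_year and the four-digit rule are assumptions for illustration:

```python
import re

def is_year(text: str) -> bool:
    """Hypothetical check: the entire string is exactly four digits."""
    return re.search(r"^\d{4}$", text) is not None

print(is_year("1999"))     # True
print(is_year("in 1999"))  # False: ^ fails, the text does not start with a digit
print(is_year("19999"))    # False: $ fails after four digits
```

One caveat: $ also matches just before a final newline, so is_year("1999\n") returns True; re.fullmatch(r"\d{4}", text) rejects that input.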
Escape special characters: \

    >>> my_string = "I love the music of Mr.Go. However, the sound was too loud."
    >>> print(re.split(r".\s", my_string))
    ['', 'lov', 'th', 'musi', 'o', 'Mr.Go', 'However', 'th', 'soun', 'wa', 'to', 'loud.']
    >>> print(re.split(r"\.\s", my_string))
    ['I love the music of Mr.Go', 'However, the sound was too loud.']

Unescaped, . matches any character, so the first split cuts at every character followed by whitespace; \. matches only a literal period.
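When the text to match comes from elsewhere, re.escape() inserts the necessary backslashes automatically. A sketch on an invented sentence showing what goes wrong without escaping:

```python
import re

text = "pi is 3.14, not 3x14"

unescaped = re.findall(r"3.14", text)          # . matches any character
escaped = re.findall(re.escape("3.14"), text)  # \. matches only a period

print(unescaped)           # ['3.14', '3x14']
print(escaped)             # ['3.14']
print(re.escape("Mr.Go"))  # Mr\.Go
```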
OR operator

Character: |

    >>> my_string = "Elephants are the world's largest land animal! I would love to see an elephant one day"
    >>> re.findall(r"Elephant|elephant", my_string)
    ['Elephant', 'elephant']
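Alternatives are tried from left to right, and the first one that matches at a given position wins. A sketch with invented names shows why the order of alternatives can matter:

```python
import re

text = "Mrs Smith met Mr Jones"

short_first = re.findall(r"Mr|Mrs", text)  # "Mr" is tried first and succeeds
long_first = re.findall(r"Mrs|Mr", text)   # "Mrs" gets the first chance

print(short_first)  # ['Mr', 'Mr']
print(long_first)   # ['Mrs', 'Mr']
```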
Set of characters: [ ]

    >>> my_string = "Yesterday I spent my afternoon with my friends: MaryJohn2 Clary3"
    >>> re.findall(r"[a-zA-Z]+\d", my_string)
    ['MaryJohn2', 'Clary3']

    >>> my_string = "My&name&is#John Smith. I%live$in#London."
    >>> re.sub(r"[#$%&]", " ", my_string)
    'My name is John Smith. I live in London.'
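A set is often a shorter way to write a single-character alternative: [Ee]lephant is equivalent to Elephant|elephant. Inside a set most metacharacters lose their special meaning, which is why $ needed no escaping in [#$%&]. A sketch reusing the sentence from the OR operator example:

```python
import re

my_string = ("Elephants are the world's largest land animal! "
             "I would love to see an elephant one day")

with_or = re.findall(r"Elephant|elephant", my_string)
with_set = re.findall(r"[Ee]lephant", my_string)

print(with_set)             # ['Elephant', 'elephant']
print(with_or == with_set)  # True

# Inside [ ], "." is a literal period rather than "any character".
print(re.findall(r"[.]", "a.b,c"))  # ['.']
```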
A ^ at the start of a set negates it, so the set matches any character that is not listed:

    >>> my_links = "Bad website: www.99.com. Favorite site: www.hola.com"
    >>> re.findall(r"www[^0-9]+com", my_links)
    ['www.hola.com']
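A negated set is a common way to match everything up to a delimiter. In this sketch on an invented sentence, [^"]* consumes any run of characters that are not double quotes, so each quoted phrase is matched separately:

```python
import re

text = 'She said "hello" and then "goodbye".'

# Opening quote, then any non-quote characters, then the closing quote.
quoted = re.findall(r'"[^"]*"', text)
print(quoted)  # ['"hello"', '"goodbye"']
```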
Greedy vs. non-greedy matching
There are two types of matching methods:
- Greedy
- Non-greedy, or lazy

Standard quantifiers are greedy by default: *, +, ?, {n,m}
Greedy matching

Greedy: match as many characters as possible, returning the longest match.

    >>> import re
    >>> re.match(r"\d+", "12345bcada")
    <re.Match object; span=(0, 5), match='12345'>
A greedy quantifier backtracks when it has matched too many characters, giving them up one at a time until the rest of the pattern can match. Here .* first consumes the whole string, then gives characters back until "hello" matches:

    >>> re.match(r".*hello", "xhelloxxxxxx")
    <re.Match object; span=(0, 6), match='xhello'>
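Greediness matters most when the same delimiter occurs more than once. In this sketch on an invented HTML fragment, .+ runs to the end of the string and then backtracks only as far as the last >:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# The greedy match spans from the first "<" to the very last ">".
greedy = re.match(r"<.+>", html)
print(greedy.group())  # <b>bold</b> and <i>italic</i>
```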
Non-greedy matching

Lazy: match as few characters as needed, returning the shortest match. Append ? to a greedy quantifier to make it lazy: *?, +?, ??, {n,m}?

    >>> re.match(r"\d+?", "12345bcada")
    <re.Match object; span=(0, 1), match='1'>
A lazy quantifier backtracks when it has matched too few characters, expanding one character at a time. Here .*? first tries to match nothing, then expands to "x", after which "hello" matches:

    >>> re.match(r".*?hello", "xhelloxxxxxx")
    <re.Match object; span=(0, 6), match='xhello'>
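Making a quantifier lazy stops each match at the first delimiter it can. In this sketch on an invented HTML fragment, findall with the greedy pattern returns one long match, while the lazy pattern returns every tag separately:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy_tags = re.findall(r"<.+>", html)  # one long match
lazy_tags = re.findall(r"<.+?>", html)   # one match per tag

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```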