PROCESSING TEXT WITH REGEX
WHY IS REGEX NECESSARY?  Question: What does the following script do? def isPhoneNUmber(text): if len(text) != 12: return False for i in range(0, 3): if not text[i].isdecimal(): return False if text[3] != ‘-’: return False for i in range(4, 7): if not text[i].isdecimal(): return False if text[7] != ‘-’: return False for i in range(8, 12): if not text[i].isdecimal(): return False return True
WHY IS REGEX NECESSARY?  Question: What about this one? message = raw_input(“Enter a string”) for I in range(len(message)): chunk = message[i: i+12] if (isPhoneNumber(chunk): print “Phone number found: “ + chunk Print “Done”
ARE THEY THAT IMPORTANT?  Regular expressions, as we have previously discussed, are dynamic, descriptive patterns designed for searching (pattern recognition).  Without regular expressions, you are hard-coding fixed values to search for.  Ex. an exact comparison (=) vs. a pattern match (like)
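The difference between a hard-coded value and a pattern can be sketched roughly as follows. This is only an illustration: the sample messages and variable names are made up, and it borrows the re module that the next slide introduces.

```python
import re

# Hypothetical sample data, not from the slides.
messages = [
    "Call 415-555-1234 today",
    "Call 212-555-9876 today",
    "No number here",
]

# Hard-coded search: only finds the one exact value that was typed in.
fixed_hits = [m for m in messages if "415-555-1234" in m]

# Pattern search: finds any text shaped like ddd-ddd-dddd.
phone_pattern = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
pattern_hits = [m for m in messages if phone_pattern.search(m)]

print(fixed_hits)
print(pattern_hits)
```

The fixed string matches one message, while the pattern matches every message that contains a phone-number-shaped substring.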
INCORPORATING REGEX IN PYTHON  Python once again makes life simpler by having a prebuilt module to simplify incorporating the code into your scripts.  Enter the re module  Code: import re  There are 2 benefits to using the re module: 1) Predefined Functions: compile(), search(), findall() 2) The RegEx syntax is almost identical to Perl
PYTHON’S REGEX CHEAT SHEET
COMPILING A REGEX  A pattern written as a plain string has to be interpreted before Python can use it.  If you search an entire document line by line with the same string, that work (or at least a lookup of a cached copy) is repeated for every line, which can increase execution time.  The re module has a function that compiles the pattern once into an object you can reuse.  Code: varName = re.compile(REGEX)  Ex. phoneNumRegEx = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
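A minimal sketch of reusing one compiled pattern across many lines might look like this. The list `lines` is a hypothetical stand-in for the lines of a document, and the names are illustrative only.

```python
import re

# Compile once, outside the loop, so every line reuses the same pattern object.
phone_num_regex = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

# Hypothetical stand-in for the lines of a document.
lines = [
    "Office: 415-555-1234",
    "No phone number on this line",
    "Fax: 415-555-9999",
]

# Keep only the lines that contain something shaped like a phone number.
matching_lines = [line for line in lines if phone_num_regex.search(line)]
print(matching_lines)
```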
THE SEARCH FUNCTION  The search() function will search a document for the first occurrence of the pattern.  It will return a True or False value depending on if there was a match to the pattern.  Code: compExpVar.search(TEXT)  Ex. phNumRegEx = re.compile(“ddd-ddd-dddd”) mo = phNumRegEx.search(“Here is 444-343-3243”) print mo print mo.group()
LET’S FIND EVERYTHING  In addition to the search() function, the re module also has a findall() function.  findall() returns a list of all the non-overlapping strings that match the pattern.  Code: compExpVar.findall(TEXT)  Ex.

    phNumRegEx = re.compile(r"\d\d\d")
    mo = phNumRegEx.findall("Here is 444-343-3243")
    print(mo)  # ['444', '343', '324']
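As a further sketch, findall() can be used with the full phone-number pattern, and its return value changes shape when the pattern contains groups. The sample text and variable names below are made up for illustration.

```python
import re

# Hypothetical sample text.
text = "Home: 444-343-3243, work: 555-123-4567"

# No groups in the pattern: findall() returns a list of matched strings.
whole_numbers = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d").findall(text)

# Groups in the pattern: findall() returns a list of tuples, one item per group.
split_numbers = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)").findall(text)

print(whole_numbers)  # ['444-343-3243', '555-123-4567']
print(split_numbers)  # [('444', '343-3243'), ('555', '123-4567')]
```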
