Python tip 29: negative lookarounds
Lookarounds help to create custom anchors and add conditions within a regex definition. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions. The syntax for negative lookarounds is shown below:
(?!pat) negative lookahead assertion
(?<!pat) negative lookbehind assertion
Here are some examples:
>>> import re

# change 'cat' only if it is not followed by a digit character
# note that the end of string satisfies the given assertion
# 'catcat' has two matches as the assertion doesn't consume characters
>>> re.sub(r'cat(?!\d)', 'dog', 'hey cats! cat42 cat_5 catcat')
'hey dogs! cat42 dog_5 dogdog'

# change 'cat' only if it is not preceded by _
# note how 'cat' at the start of string is matched as well
>>> re.sub(r'(?<!_)cat', 'dog', 'cat _cat 42catcat')
'dog _cat 42dogdog'

# change whole word only if it is not preceded by : or -
>>> re.sub(r'(?<![:-])\b\w+', 'X', ':cart <apple: -rest ;tea')
':cart <X: -rest ;X'
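The first example notes that the end of the string satisfies the assertion and that 'catcat' gives two matches. As a rough sketch of why that matters, the snippet below contrasts the negative lookahead with a negated character class, which has to consume one character. The variable names are only for illustration.

```python
import re

s = 'hey cats! cat42 cat_5 catcat'

# negative lookahead: zero-width, nothing after 'cat' is consumed
with_lookahead = re.sub(r'cat(?!\d)', 'dog', s)

# negated character class: must match (and consume) one non-digit character,
# so the character after 'cat' is replaced too and 'cat' at the end of the
# string cannot match at all
with_negated_class = re.sub(r'cat[^\d]', 'dog', s)

print(with_lookahead)      # hey dogs! cat42 dog_5 dogdog
print(with_negated_class)  # hey dog! cat42 dog5 dogat
```

The second result loses the 's' and '_' characters and leaves the trailing 'at' untouched, which is usually not what you want when the intent is only to test a condition.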
Lookarounds can be placed anywhere and multiple lookarounds can be combined in any order. They do not consume characters nor do they play a role in matched portions. They just let you know whether the condition you want to test is satisfied from the current location in the input string.
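To see that the assertions are not part of the matched portion, here is a small sketch that combines a lookbehind and a lookahead and inspects the match spans. The input string and the second pattern (which moves the lookbehind after 'cat') are made up for this illustration.

```python
import re

s = 'cat _cat cat42 42catcat'

# 'cat' not preceded by _ and not followed by a digit
pat = re.compile(r'(?<!_)cat(?!\d)')
spans = [m.span() for m in pat.finditer(s)]
print(spans)  # [(0, 3), (17, 20), (20, 23)]

# every span is exactly 3 characters wide: only 'cat' is matched,
# the characters tested by the assertions are not included

# the lookbehind can also be placed after 'cat', provided it then
# accounts for the 'cat' characters that come before that location
alt = re.compile(r'cat(?!\d)(?<!_cat)')
alt_spans = [m.span() for m in alt.finditer(s)]
print(alt_spans == spans)  # True
```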
# extract all whole words that do not start with a/n
>>> ip = 'a_t row on Urn e note Dust n end a2-e|u'
>>> re.findall(r'(?![an])\b\w+', ip)
['row', 'on', 'Urn', 'e', 'Dust', 'end', 'e', 'u']

# since the three assertions used here are all zero-width,
# all of the 6 possible combinations will be equivalent
>>> re.sub(r'(?!\Z)\b(?<!\A)', ' ', 'output=num1+35*42/num2')
'output = num1 + 35 * 42 / num2'
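The claim about the 6 equivalent orderings can be checked mechanically. The following is a minimal sketch that uses itertools.permutations to build every ordering of the three assertions and collects the distinct results in a set. The names assertions, orderings and results are just illustrative.

```python
import re
from itertools import permutations

s = 'output=num1+35*42/num2'
assertions = [r'(?!\Z)', r'\b', r'(?<!\A)']

# build a pattern for every ordering of the three zero-width assertions
orderings = [''.join(p) for p in permutations(assertions)]
print(len(orderings))  # 6

# a set keeps only distinct results, so equivalent patterns collapse to one
results = {re.sub(pat, ' ', s) for pat in orderings}
print(results)  # {'output = num1 + 35 * 42 / num2'}
```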
See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.