How to filter rows in Pandas by regex?



A regular expression (regex) is a sequence of characters that defines a search pattern. To filter rows in Pandas by regex, we can use the str.match() method, which tests whether each string matches the pattern starting at its first character.
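As a minimal sketch of that anchoring behaviour, the snippet below compares str.match() with str.contains(), which searches anywhere in the string. The Series named names and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
names = pd.Series(['John', 'Jacob', 'Tom', 'RJ'])

# str.match() succeeds only when the pattern matches at the start of the string.
starts_with_j = names.str.match('J')

# str.contains() succeeds when the pattern matches anywhere in the string.
contains_j = names.str.contains('J')

print(starts_with_j.tolist())  # [True, True, False, False]
print(contains_j.tolist())     # [True, True, False, True]
```

'RJ' is picked up by str.contains() but not by str.match(), because its 'J' is not at the start.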

Steps

  • Create a DataFrame, df, which is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.
  • Print the input DataFrame, df.
  • Initialize a variable regex with the pattern as a string. For example, 'J.*' selects all the entries that start with the letter 'J'.
  • Use df[df.column_name.str.match(regex)] to keep only the rows whose value in the given column matches the supplied regex.
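The steps above can be sketched as follows, with the intermediate result made explicit: str.match() returns a boolean Series (a mask), and indexing df with that mask keeps the rows where it is True. The small DataFrame here is hypothetical, and the na=False argument is an addition not mentioned in the steps; it treats missing values as non-matches so that the mask contains only True and False.

```python
import pandas as pd

# Hypothetical data with one missing name, used only for illustration.
df = pd.DataFrame({
    'name': ['John', 'Jacob', None, 'Tim'],
    'marks': [89, 23, 100, 56],
})

regex = 'J.*'

# Boolean mask: True where the name matches the regex from its first character.
# na=False (an assumption added here) marks missing names as non-matching.
mask = df['name'].str.match(regex, na=False)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Without na=False, the missing name would produce a missing value in the mask, which boolean indexing may reject.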

Example

    import pandas as pd

    df = pd.DataFrame(
        dict(
            name=['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
            marks=[89, 23, 100, 56, 90],
            subjects=["Math", "Physics", "Chemistry", "Biology", "English"]
        )
    )
    print("Input DataFrame is:")
    print(df)

    # Keep the rows whose name starts with 'J'
    regex = 'J.*'
    print("After applying", regex, "DataFrame is:")
    print(df[df.name.str.match(regex)])

    # Keep the rows whose name starts with 'A'
    regex = 'A.*'
    print("After applying", regex, "DataFrame is:")
    print(df[df.name.str.match(regex)])

Output

    Input DataFrame is:
        name  marks   subjects
    0   John     89       Math
    1  Jacob     23    Physics
    2    Tom    100  Chemistry
    3    Tim     56    Biology
    4   Ally     90    English
    After applying J.* DataFrame is:
        name  marks  subjects
    0   John     89      Math
    1  Jacob     23   Physics
    After applying A.* DataFrame is:
       name  marks  subjects
    4  Ally     90   English

Updated on: 2021-09-14T13:51:18+05:30
