Split a String into columns using regex in pandas DataFrame

To split a string into columns using regex in a pandas DataFrame, you can use the str.extract method, which allows you to extract groups from strings based on a regular expression pattern.

Here's a step-by-step guide:

  • First, you'll need to have pandas installed:
pip install pandas 
  • Now, let's see how to split a string into columns using regex:
import pandas as pd

# Sample DataFrame
data = {
    'text': [
        'John_25_NewYork',
        'Alice_30_LosAngeles',
        'Bob_22_SanFrancisco'
    ]
}
df = pd.DataFrame(data)

# Split the 'text' column into three new columns using regex
pattern = r'(?P<Name>\w+)_(?P<Age>\d+)_(?P<City>\w+)'
df_extracted = df['text'].str.extract(pattern)

print(df_extracted)

In the example above:

  • We have a sample DataFrame df with a column named 'text' containing strings.
  • We want to split these strings into three new columns: 'Name', 'Age', and 'City'.
  • The regular expression pattern (?P<Name>\w+)_(?P<Age>\d+)_(?P<City>\w+) is used to match and extract the desired components of the string. The ?P<column_name> syntax in the pattern is used to name the extracted groups.
  • We then call str.extract, which applies the pattern to each row and returns a new DataFrame with one column per named group. The original df is left unchanged.

The output will be:

    Name Age          City
0   John  25       NewYork
1  Alice  30    LosAngeles
2    Bob  22  SanFrancisco
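
The extracted values are all strings, including Age. If you want the new columns alongside the original text with a numeric Age, something like the following should work. This is a minimal sketch that assumes every row matches the pattern; the names extracted and df_combined are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['John_25_NewYork', 'Alice_30_LosAngeles', 'Bob_22_SanFrancisco']
})

pattern = r'(?P<Name>\w+)_(?P<Age>\d+)_(?P<City>\w+)'
extracted = df['text'].str.extract(pattern)

# Extracted values are strings; convert Age to integers.
# Assumption: every row matched, so there are no missing values to convert.
extracted['Age'] = extracted['Age'].astype(int)

# Attach the new columns to the original DataFrame (rows align on the index).
df_combined = df.join(extracted)

print(df_combined.dtypes)
```

If some rows might not match, pd.to_numeric(extracted['Age'], errors='coerce') is a safer conversion, because astype(int) raises an error on missing values.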

You can adjust the regular expression pattern to match the specific structure of your strings and extract the desired components.
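
As an illustration of adjusting the pattern, the sketch below assumes a hypothetical variant of the data in which fields are separated by hyphens and the city may contain spaces. It also includes one row that does not fit the format, to show that str.extract fills non-matching rows with NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data: hyphen-delimited, with one malformed row.
df = pd.DataFrame({
    'text': ['John-25-New York', 'Alice-30-Los Angeles', 'no match here']
})

# Letters for the name, digits for the age, and everything after the
# second hyphen for the city (so spaces are allowed).
pattern = r'^(?P<Name>[A-Za-z]+)-(?P<Age>\d+)-(?P<City>.+)$'
result = df['text'].str.extract(pattern)

# Rows that do not match the pattern contain NaN in every column.
print(result)
```

You can drop unmatched rows afterwards with result.dropna() if they are not needed.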

