Exploding multiple columns

Mormolyce · (This post was last modified: Sep-03-2025, 05:21 AM by Mormolyce.)

I'm importing an Excel file into Python using Pandas, this is a test dataset I'm using to get the code working (the actual data I'm trying to read in is very large):

[Image: o6ajuTu.png]

Basically, my input data will have certain columns (Items, Type and Price) which contain lists of items separated by pipes. What I want to do is explode this data to get a result like this:

[Image: x4w62nB.png]

I've read the data into a dataframe (df) and this is the code I'm using to reformat and explode the data:

explodecolumns = ['Items', 'Type', 'Price'] df[explodecolumns] = df[explodecolumns].apply(lambda x: x.str.replace(r'\s*\|\s*', '|', regex=True).str.split('|')) dfexploded = df.explode(explodecolumns).reset_index(drop=True)

This works fine if I input the first two lines, where the lists are perfectly aligned, but of course it throws an error if I include the bottom two which are the more general case. My source data won't necessarily have all the data populated (eg the third line only has three entries in the "Items" column but four elements in the "Type" column) and might have no data at all in some cells.

I've been trying to figure out if there's a simple way to do this, does anyone know?

P.S. I think this code should create my test dataset if that helps:

data = {'Cart_ID': [923328, 923549, 924028, 901983], 'Date': ['2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00'], 'Items': ['Apples', 'Bran | Soup', '| | Peas', 'Carrots | | Sauce'], 'Type': ['Fruit', 'Cereal | Cans', 'Pots | | | Fruit', float("nan")], 'Bag': ['Plastic', 'Paper', 'Paper', 'Plastic'], 'Weight': [2.65, 8.4, 21.31, 10.2], 'Price': ['3.8', '4.1 | 2.15', '33.4 | 1.1 | 4.2 | 0.43', '0.67 | 9.4 | 3.21']}

Pedroski55 · Sep-04-2025, 12:57 AM

Using the data you supplied, if we split date['Items'] or data['Type'] on '|' we get a list of 6 elements.

But if we split data['Price'] we get a list of 10 elements. This will cause problems: 6 items with 10 prices!

And why you have float("nan") in there God only knows!

import pandas as pd data = {'Cart_ID': [923328, 923549, 924028, 901983], 'Date': ['2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00'], 'Items': ['Apples', 'Bran | Soup', '| | Peas', 'Carrots | | Sauce'], 'Type': ['Fruit', 'Cereal | Cans', 'Pots | | | Fruit', float("nan")], 'Bag': ['Plastic', 'Paper', 'Paper', 'Plastic'], 'Weight': [2.65, 8.4, 21.31, 10.2], 'Price': ['3.8', '4.1 | 2.15', '33.4 | 1.1 | 4.2 | 0.43', '0.67 | 9.4 | 3.21']} # all the lists for data[key] are 4 elements long for key in data.keys(): print(len(data[key])) keys = ['Items', 'Type', 'Price'] # if we split e.g. 'Bran | Soup' we need to associate the correct which is messy # data['Cart_ID'], data['Date'], data['Bag'], data['Weight'] for each item in the sublist a new_data = {} keys = ['Items', 'Type', 'Price'] for key in keys: newlist = [] for i in range(4): print(f'key = {key}, i = {i} ... ') # float("nan") could be a spanner in these works! if type(data[key][i]) == float: newlist.append(data[key][i]) #data[key] = newlist elif not '|' in data[key][i]: newlist.append(data[key][i]) elif '|' in data[key][i]: a = data[key][i].split() for b in a: if not b == '|': print(f'b = {b}') newlist.append(b) new_data['new' + key] = newlist

Try this:

# this will not work because all the lists must be the same length partdf = pd.DataFrame.from_dict(new_data, orient='columns') # this will work, but there are 3 prices associated with None and Sauce has NaN as Type partdf = pd.DataFrame.from_dict(new_data, orient='index')

I get this:

Output:
 0 1 2 3 4 5 6 7 8 9 newItems Apples Bran Soup Peas Carrots Sauce None None None None newType Fruit Cereal Cans Pots Fruit NaN None None None None newPrice 3.8 4.1 2.15 33.4 1.1 4.2 0.43 0.67 9.4 3.21

I think your data is faulty!

I amended the data then it works:

import pandas as pd # data2['Price'] has 10 elements so all lists must contain 10 items, even if the list is technically shorter data2 = {'Cart_ID': [923328, 923549, 924028, 901983, 901984, 901985, 901986, 901987, 901988, 901989], 'Date': ['2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00', '2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00'], 'Items': ['Apples', 'Bran | Soup', '| | Peas', 'Carrots | | Tomato-Ketchup', '包子 | 饺子 | 卷饼 | 油条'], 'Type': ['Fruit', 'Cereal | Cans', 'Pots | | | Fruit', 'Sauces', '早饭 | 午饭 | 晚饭 | 随便'], 'Bag': ['Plastic', 'Paper', 'Paper', 'Plastic', 'Plastic', 'Paper', 'Paper', 'Plastic', 'Silicon', 'Hemp'], 'Weight': [2.65, 8.4, 21.31, 10.2, 2.65, 8.4, 21.31, 10.2, 12, 13], 'Price': ['3.8', '4.1 | 2.15', '33.4 | 1.1 | 4.2 | 0.43', '0.67 | 9.4', '| 3.21']} keys = ['Items', 'Type', 'Price'] # the lists for data2[key] are not all the same length # but after splitting elements we will have 10 elements in each list for key in data.keys(): print(key, len(data[key])) keys = ['Items', 'Type', 'Price'] new_data = {} for key in keys: newlist = [] for i in range(5): print(f'key = {key}, i = {i} ... ') # float("nan") could be a spanner in these works! if type(data2[key][i]) == float: newlist.append(data2[key][i]) #data[key] = newlist elif not '|' in data2[key][i]: newlist.append(data2[key][i]) elif '|' in data2[key][i]: a = data2[key][i].split() for b in a: if not b == '|': print(f'b = {b}') newlist.append(b) new_data['new' + key] = newlist # now the lists are all 10 elements long for key in new_data.keys(): print(key, len(new_data[key]))

Now these both work:

# this will works because all the lists are the same length partdf = pd.DataFrame.from_dict(new_data, orient='columns')

Gives:

Output:
 newItems newType newPrice 0 Apples Fruit 3.8 1 Bran Cereal 4.1 2 Soup Cans 2.15 3 Peas Pots 33.4 4 Carrots Fruit 1.1 5 Tomato-Ketchup Sauces 4.2 6 包子 早饭 0.43 7 饺子 午饭 0.67 8 卷饼 晚饭 9.4 9 油条 随便 3.21

And this:

# now each Price is associated with an item partdf = pd.DataFrame.from_dict(new_data, orient='index')

Gives:

Output:
 0 1 2 3 ... 6 7 8 9 newItems Apples Bran Soup Peas ... 包子 饺子 卷饼 油条 newType Fruit Cereal Cans Pots ... 早饭 午饭 晚饭 随便 newPrice 3.8 4.1 2.15 33.4 ... 0.43 0.67 9.4 3.21

And all prices are associated with an item. Customers may dislike paying something for nothing!

You could also post your Excel files for reference.

Mormolyce · Sep-07-2025, 02:13 PM

If it helps anyone I eventually designed code that works in the general case:

df = dfimport['Sheet6'].copy().fillna('') explodecolumns = ['Items', 'Type', 'Price'] df[explodecolumns] = df[explodecolumns].apply(lambda x: x.str.replace(r'\s*\|\s*', '|', regex=True).str.split('|')) # Padding lists in rows for the explode function df['Length'] = df[explodecolumns].map(len).max(axis=1) def pad_list(lst, pad_amount, filler): if len(lst) < pad_amount: return lst + [filler] * (pad_amount - len(lst)) return lst for col in explodecolumns: df[col] = df.apply(lambda row: pad_list(row[col], row['Length'], ''), axis=1) df.drop(['Length'], axis=1, inplace=True) dfexploded = df.copy().explode(explodecolumns) dfexploded['ItemNum'] = dfexploded.groupby(dfexploded.index).cumcount() + 1 dfexploded.reset_index(drop=True, inplace=True)

Might not be the most efficient way but it works.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	How to check multiple columns value within range	SamLiu	2	3,016	Mar-13-2023, 09:32 AM Last Post: SamLiu
	How to move multiple columns to initial position	SriRajesh	4	3,822	Jul-02-2022, 10:34 AM Last Post: deanhystad
	df column aggregate and group by multiple columns	SriRajesh	0	2,120	May-06-2022, 02:26 PM Last Post: SriRajesh
	Split single column to multiple columns	SriRajesh	1	2,422	Jan-07-2022, 06:43 PM Last Post: jefsummers
	Apply fillna to multiple columns in dataframe	rraillon	2	5,133	Aug-05-2021, 01:11 PM Last Post: rraillon
	Pandas: how to split one row of data to multiple rows and columns in Python	GerardMoussendo	4	11,715	Feb-22-2021, 06:51 PM Last Post: eddywinch82
	How to fill parameter with data from multiple columns CSV file	greenpine	1	3,288	Dec-21-2020, 06:50 PM Last Post: Larz60+
	convert string into multiple columns in python3	VAN	2	3,981	Sep-26-2020, 11:14 PM Last Post: scidam
	How to melt dataframe multiple columns to one column	Mekala	1	4,618	Sep-24-2020, 08:32 PM Last Post: scidam
	Dividing certain columns in multiple CSV files	NatesM	4	7,242	Dec-20-2019, 07:59 PM Last Post: ichabod801

Exploding multiple columns

User Panel Messages

Announcements