I'm importing an Excel file into Python using pandas. This is a test dataset I'm using to get the code working (the actual data I'm trying to read in is very large):
![[Image: o6ajuTu.png]](https://i.imgur.com/o6ajuTu.png)
Basically, my input data will have certain columns (Items, Type and Price) which contain lists of items separated by pipes. What I want to do is explode this data to get a result like this:
![[Image: x4w62nB.png]](https://i.imgur.com/x4w62nB.png)
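As an aside, the replace-then-split step in the code below can probably be done in one call, because `Series.str.split` accepts a regular expression (its `regex` parameter needs pandas 1.4 or later). This is a minimal sketch on three sample cells from the test data. Note that leading pipes still produce empty strings, and missing cells stay `NaN`, so it does not by itself fix the alignment problem.

```python
import pandas as pd

# Three sample cells taken from the test dataset, including a missing value
cells = pd.Series(["Bran | Soup", "| | Peas", float("nan")])

# Split on a pipe together with any surrounding whitespace in a single pass
parts = cells.str.split(r"\s*\|\s*", regex=True)
print(parts.tolist())
```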
I've read the data into a dataframe (df) and this is the code I'm using to reformat and explode the data:
```python
explodecolumns = ['Items', 'Type', 'Price']
# Normalize " | " separators to "|" and split each cell into a list
df[explodecolumns] = df[explodecolumns].apply(
    lambda x: x.str.replace(r'\s*\|\s*', '|', regex=True).str.split('|')
)
# Explode all three columns together (requires matching list lengths per row)
dfexploded = df.explode(explodecolumns).reset_index(drop=True)
```

This works fine if I only input the first two rows, where the lists are perfectly aligned, but it throws an error once I include the bottom two, which are the more general case. My source data won't necessarily have every element populated (e.g. the third row has only three entries in the "Items" column but four in the "Type" column), and some cells may have no data at all.

I've been trying to figure out if there's a simple way to do this. Does anyone know?
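A minimal sketch of why the bottom rows fail, assuming pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns): on every row, each exploded column must hold the same number of elements. Counting the elements per cell shows which rows break that rule. The two-row frame below is made up for illustration and assumes every cell is already a list (a `NaN` cell would need handling before `len`).

```python
import pandas as pd

# Made-up two-row frame: row 0 is aligned, row 1 has 2 items but 3 prices
df = pd.DataFrame({
    "Items": [["Apples"], ["Bran", "Soup"]],
    "Price": [["3.8"], ["4.1", "2.15", "0.5"]],
})

# Number of elements in each cell
counts = df.apply(lambda col: col.map(len))
print(counts)

# A row is a problem if its columns disagree on the element count
mismatched = counts.nunique(axis=1) > 1
print(mismatched)

# Exploding both columns together therefore raises ValueError
error_message = None
try:
    df.explode(["Items", "Price"])
except ValueError as err:
    error_message = str(err)
print(error_message)
```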
P.S. I think this code should create my test dataset if that helps:
```python
import pandas as pd

data = {
    'Cart_ID': [923328, 923549, 924028, 901983],
    'Date': ['2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00'],
    'Items': ['Apples', 'Bran | Soup', '| | Peas', 'Carrots | | Sauce'],
    'Type': ['Fruit', 'Cereal | Cans', 'Pots | | | Fruit', float("nan")],
    'Bag': ['Plastic', 'Paper', 'Paper', 'Plastic'],
    'Weight': [2.65, 8.4, 21.31, 10.2],
    'Price': ['3.8', '4.1 | 2.15', '33.4 | 1.1 | 4.2 | 0.43', '0.67 | 9.4 | 3.21'],
}
df = pd.DataFrame(data)
```
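One possible approach, sketched under stated assumptions rather than offered as the definitive fix: split each cell into a list, pad the shorter lists in each row with `None` up to that row's longest list, and only then explode. This assumes pandas 1.3 or later and, importantly, that any missing elements sit at the *end* of a cell, because the padding is added on the right; a value missing from the middle of a cell without a pipe placeholder cannot be realigned this way. Empty slots such as those in `'| | Peas'` are turned into `None`. The names `explode_cols` and `split_cell` are hypothetical.

```python
import pandas as pd

# Test dataset from the question
data = {
    'Cart_ID': [923328, 923549, 924028, 901983],
    'Date': ['2023-07-04 00:00:00', '2023-06-22 00:00:00', '2024-11-13 00:00:00', '2021-03-09 00:00:00'],
    'Items': ['Apples', 'Bran | Soup', '| | Peas', 'Carrots | | Sauce'],
    'Type': ['Fruit', 'Cereal | Cans', 'Pots | | | Fruit', float("nan")],
    'Bag': ['Plastic', 'Paper', 'Paper', 'Plastic'],
    'Weight': [2.65, 8.4, 21.31, 10.2],
    'Price': ['3.8', '4.1 | 2.15', '33.4 | 1.1 | 4.2 | 0.43', '0.67 | 9.4 | 3.21'],
}
df = pd.DataFrame(data)

# Hypothetical name: the pipe-separated columns to explode together
explode_cols = ["Items", "Type", "Price"]


def split_cell(value):
    """Split one pipe-separated cell into a list of stripped strings.

    Missing cells (NaN/None) become an empty list; empty slots become None.
    Non-string cells (e.g. a single price read from Excel as a float) are
    converted with str() first.
    """
    if pd.isna(value):
        return []
    return [part.strip() or None for part in str(value).split("|")]


# One Series of lists per column to be exploded
split = {col: df[col].map(split_cell) for col in explode_cols}

# Longest list in each row (at least 1, so rows with no data are kept)
widths = pd.concat(
    {col: s.map(len) for col, s in split.items()}, axis=1
).max(axis=1).clip(lower=1)

# Pad every list on the right with None up to its row's width
padded = df.copy()
for col, s in split.items():
    padded[col] = pd.Series(
        [values + [None] * (width - len(values)) for values, width in zip(s, widths)],
        index=df.index,
        dtype=object,
    )

# Every exploded column now has matching lengths on each row
result = padded.explode(explode_cols, ignore_index=True)
print(result)
```

The padded values in `Price` are still strings; converting them afterwards (for example with `pd.to_numeric`) is left out of the sketch.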