
wrighter

Posted on • Originally published at wrighters.io on

Basic Pandas: moving columns

Sometimes we want to manipulate a DataFrame’s columns by changing the column ordering. There are a few ways to do this, depending on what state your DataFrame is in.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(5,5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df['max'] = df.max(axis=1)
>>>
>>> df
          a         b         c         d         e       max
0  0.067423  0.058920  0.999309  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.138881  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.223959  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.913800  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.596375  0.631442  0.907743  0.907743

First, let’s just review the basics. Without moving or dropping anything, we can view any column we want just by selecting it.

>>> df['max']
0    0.999309
1    0.764242
2    0.903363
3    0.988267
4    0.907743
Name: max, dtype: float64

We can also select any set of columns, in any order, including the same column more than once.

>>> df[['d', 'a', 'max', 'b', 'd']]
          d         a       max         b         d
0  0.440547  0.067423  0.999309  0.058920  0.440547
1  0.764242  0.384196  0.764242  0.732857  0.764242
2  0.903363  0.900311  0.903363  0.662776  0.903363
3  0.106388  0.988267  0.988267  0.852733  0.106388
4  0.631442  0.830644  0.907743  0.647775  0.631442

Assigning a selection back to our variable makes the new ordering permanent. Any column left out of the list is dropped, so the assignment here also removes column c.

>>> df = df[['d', 'a', 'b', 'max', 'e']]
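Typing out every name gets tedious on a wide frame. As a minimal sketch, the column list can be built from the existing columns instead. The frame `demo` and its values are made up for illustration and are not the article's data.

```python
import pandas as pd

# Hypothetical small frame standing in for the article's df.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "max": [3, 4]})

# Put 'max' first, then every other column in its existing order.
front = ["max"]
new_order = front + [c for c in demo.columns if c not in front]
demo = demo[new_order]

print(list(demo.columns))  # ['max', 'a', 'b']
```

Because the list is derived from `demo.columns`, no column is dropped by accident.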

Since the columns are just an Index, they can be converted to a list and manipulated. You can also use the reindex method to change the column ordering. Note that you don’t want to just assign the sorted names to columns: this won’t move them, it will rename them!

>>> df.reindex(columns=sorted(df.columns))
          a         b         d         e       max
0  0.067423  0.058920  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.631442  0.907743  0.907743
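The difference between moving and renaming is easier to see on a tiny frame. The sketch below uses a made-up frame `demo` (not the article's data) and compares reindex with assigning directly to the columns attribute.

```python
import pandas as pd

# Small deterministic frame (hypothetical values, chosen so the effect is easy to see).
demo = pd.DataFrame({"d": [1, 2], "a": [3, 4], "b": [5, 6]})

# reindex moves whole columns: each label keeps its own data.
moved = demo.reindex(columns=sorted(demo.columns))

# Assigning to .columns only relabels positions: the data stays where it was.
renamed = demo.copy()
renamed.columns = sorted(renamed.columns)

print(moved["a"].tolist())    # [3, 4]
print(renamed["a"].tolist())  # [1, 2]
```

In `renamed`, the label a now sits on what was originally column d, which is almost never what you want.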

Also, when you are first creating a column, you can use the insert method to place it at the position where you want it to appear. By default, adding a column using the [] operator will put it at the end.

>>> df.insert(3, "min", df.min(axis=1))
>>> df
          d         a         b       min       max         e
0  0.440547  0.067423  0.058920  0.058920  0.999309  0.572163
1  0.764242  0.384196  0.732857  0.096347  0.764242  0.096347
2  0.903363  0.900311  0.662776  0.349328  0.903363  0.349328
3  0.106388  0.988267  0.852733  0.106388  0.988267  0.864908
4  0.631442  0.830644  0.647775  0.631442  0.907743  0.907743
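A hard-coded position such as 3 breaks as soon as the frame's layout changes. One possible alternative is to look the position up by name with the Index method get_loc. The frame `demo` and the column name `row_min` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; the values are arbitrary.
demo = pd.DataFrame({"d": [4, 1], "a": [2, 5], "b": [3, 6]})

# Find where 'a' sits, then insert the new column in the slot right after it.
loc = demo.columns.get_loc("a") + 1
demo.insert(loc, "row_min", demo.min(axis=1))

print(list(demo.columns))  # ['d', 'a', 'row_min', 'b']
```

Note that insert modifies the frame in place and, by default, raises a ValueError if a column with that name already exists.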

Finally, you can pop the column, then re-insert it. Popping a column removes it and returns it, as you’d expect.

>>> col_e = df.pop("e")
>>> df.insert(3, "e", col_e)
>>> df
          d         a         b         e       min       max
0  0.440547  0.067423  0.058920  0.572163  0.058920  0.999309
1  0.764242  0.384196  0.732857  0.096347  0.096347  0.764242
2  0.903363  0.900311  0.662776  0.349328  0.349328  0.903363
3  0.106388  0.988267  0.852733  0.864908  0.106388  0.988267
4  0.631442  0.830644  0.647775  0.907743  0.631442  0.907743
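If you move columns often, the pop-then-insert pattern can be wrapped in a small helper. This is only a sketch: `move_column` is a hypothetical name, not a pandas function, and the frame `demo` is made up.

```python
import pandas as pd

def move_column(frame, name, position):
    """Move column `name` to integer `position`, modifying `frame` in place."""
    col = frame.pop(name)              # removes the column and returns it as a Series
    frame.insert(position, name, col)  # position is counted after the pop
    return frame

demo = pd.DataFrame({"d": [1, 2], "a": [3, 4], "e": [5, 6]})
move_column(demo, "e", 0)

print(list(demo.columns))  # ['e', 'd', 'a']
```

Since the column is already gone when insert runs, `position` refers to the layout of the remaining columns.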

So as you can see, there are a number of ways to manipulate your column ordering in your DataFrame.
