DataFrame manipulation in Pandas involves editing and modifying existing DataFrames. Some common DataFrame manipulation operations are:
- Adding rows/columns
- Removing rows/columns
- Renaming rows/columns
Add a New Column to a Pandas DataFrame
We can add a new column to an existing Pandas DataFrame by assigning a list to a new column label. The list must have the same length as the DataFrame. For example,
import pandas as pd

# define a dictionary containing student data
data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
        'Height': [5.5, 6.0, 5.8, 5.3],
        'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# declare a new list
address = ['New York', 'London', 'Sydney', 'Toronto']

# assign the list as a column
df['Address'] = address

print(df)
Output
      Name  Height Qualification   Address
0     John     5.5           BSc  New York
1     Emma     6.0           BBA    London
2  Michael     5.8           MBA    Sydney
3   Sophia     5.3           BSc   Toronto
In this example, we assign the list address to the Address column in the DataFrame.
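A list is not the only thing that can be assigned to a column. As a minimal sketch (the Height_cm column name and the feet-to-centimetre conversion are illustrative assumptions, not part of the example above), a new column can also be computed from an existing one, and assign() can be used when the original DataFrame should stay unchanged:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma', 'Michael', 'Sophia'],
                   'Height': [5.5, 6.0, 5.8, 5.3]})

# compute a new column from an existing one (hypothetical unit conversion)
df['Height_cm'] = df['Height'] * 30.48

# assign() leaves df untouched and returns a copy with the extra column
df_with_address = df.assign(Address=['New York', 'London', 'Sydney', 'Toronto'])

print(list(df.columns))
print(list(df_with_address.columns))
```

Here, df keeps three columns, while df_with_address carries the additional Address column.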
Add a New Row to a Pandas DataFrame
Adding rows to a DataFrame is not quite as straightforward as adding columns in Pandas. We use the .loc property to add a new row to a Pandas DataFrame.
For example,
import pandas as pd

# define a dictionary containing student data
data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
        'Height': [5.5, 6.0, 5.8, 5.3],
        'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# add a new row
df.loc[len(df.index)] = ['Amy', 5.2, 'BIT']

print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc

Modified DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc
4      Amy     5.2           BIT
In this example, we added the row ['Amy', 5.2, 'BIT'] to the df DataFrame.
Here,
- len(df.index): returns the number of rows in df
- df.loc[...]: accesses the row with the index value enclosed in the square brackets
To learn more about .loc, please visit Pandas Indexing and Slicing.
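The len(df.index) approach relies on the default integer index 0, 1, 2, and so on. If the index carries other labels, df.loc[len(df.index)] may overwrite an existing row instead of adding one. One possible alternative, sketched here rather than taken from the example above, is pd.concat() with ignore_index=True, which builds a new DataFrame and renumbers the rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma'],
                   'Height': [5.5, 6.0],
                   'Qualification': ['BSc', 'BBA']})

# wrap the new row in a one-row DataFrame with the same columns
new_row = pd.DataFrame([{'Name': 'Amy', 'Height': 5.2, 'Qualification': 'BIT'}])

# concatenate the two and renumber the index from 0
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```

Because concat() returns a new object, the result has to be assigned back to df (or to another name).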
Remove Rows/Columns from a Pandas DataFrame
We can use drop() to delete rows and columns from a DataFrame.
Example: Delete Rows
import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Felipe', 'Rita'],
        'Age': [25, 30, 35, 40, 22, 29],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Bogota', 'Bangalore']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# delete row with index 4
df.drop(4, axis=0, inplace=True)

# delete row with index 5
df.drop(index=5, inplace=True)

# delete rows with index 1 and 3
df.drop([1, 3], axis=0, inplace=True)

# display the modified DataFrame after deleting rows
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age       City
0    Alice   25   New York
1      Bob   30     London
2  Charlie   35      Paris
3    David   40      Tokyo
4   Felipe   22     Bogota
5     Rita   29  Bangalore

Modified DataFrame:
      Name  Age      City
0    Alice   25  New York
2  Charlie   35     Paris
In this example, we deleted single rows by passing 4 as the labels argument and by using the index=5 argument. We also deleted multiple rows by passing the list [1, 3] as the labels argument.
Here,
- axis=0: indicates that rows are to be deleted
- inplace=True: indicates that the changes are to be made in the original DataFrame
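When inplace=True is left out, drop() returns a new DataFrame and the original stays as it was. The following sketch uses a smaller, made-up DataFrame to show this, together with the errors='ignore' argument, which skips labels that do not exist instead of raising a KeyError:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# without inplace=True, drop() returns a new DataFrame and leaves df unchanged
trimmed = df.drop(index=[0, 2])

# errors='ignore' skips labels that are not present instead of raising KeyError
same = df.drop(index=99, errors='ignore')

print(len(df), len(trimmed), len(same))
```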
Example: Delete Columns
import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Height': ['165', '178', '185', '171'],
        'Profession': ['Engineer', 'Entrepreneur', 'Unemployed', 'Actor'],
        'Marital Status': ['Single', 'Married', 'Divorced', 'Engaged']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# delete age column
df.drop('Age', axis=1, inplace=True)

# delete marital status column
df.drop(columns='Marital Status', inplace=True)

# delete height and profession columns
df.drop(['Height', 'Profession'], axis=1, inplace=True)

# display the modified DataFrame after deleting columns
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City  Height    Profession  Marital Status
0    Alice   25  New York     165      Engineer          Single
1      Bob   30    London     178  Entrepreneur         Married
2  Charlie   35     Paris     185    Unemployed        Divorced
3    David   40     Tokyo     171         Actor         Engaged

Modified DataFrame:
      Name      City
0    Alice  New York
1      Bob    London
2  Charlie     Paris
3    David     Tokyo
In this example, we deleted single columns by passing 'Age' as the labels argument and by using the columns='Marital Status' argument. We also deleted multiple columns by passing the list ['Height', 'Profession'] as the labels argument.
Here,
- axis=1: indicates that columns are to be deleted
- inplace=True: indicates that the changes are to be made in the original DataFrame
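The index and columns keyword arguments of drop() can also be combined, so rows and columns are removed in a single call without specifying axis. A brief sketch on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Paris']})

# remove the row labelled 1 and the 'Age' column in one call
result = df.drop(index=[1], columns=['Age'])

print(result)
```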
Rename Labels in a DataFrame
We can rename row and column labels in a Pandas DataFrame using the rename() method.
Example: Rename Columns
import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# rename column 'Name' to 'First_Name'
df.rename(columns={'Name': 'First_Name'}, inplace=True)

# rename columns 'Age' and 'City'
df.rename(mapper={'Age': 'Number', 'City': 'Address'}, axis=1, inplace=True)

# display the DataFrame after renaming columns
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
  First_Name  Number   Address
0      Alice      25  New York
1        Bob      30    London
2    Charlie      35     Paris
3      David      40     Tokyo
In this example, we renamed a single column using the columns={'Name': 'First_Name'} argument. We also renamed multiple columns with the mapper={'Age': 'Number', 'City': 'Address'} argument.
Here,
- axis=1: indicates that columns are to be renamed
- inplace=True: indicates that the changes are to be made in the original DataFrame
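Besides a dictionary, rename() also accepts a function, which is applied to every label on the chosen axis. The sketch below uses made-up column names to show two such functions; it returns new DataFrames because inplace=True is not passed:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Alice', 'Bob'],
                   'Home City': ['New York', 'London']})

# apply str.lower to every column label
lowered = df.rename(columns=str.lower)

# replace spaces with underscores in every column label
snake = df.rename(columns=lambda label: label.replace(' ', '_'))

print(list(lowered.columns))
print(list(snake.columns))
```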
Example: Rename Row Labels
import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# rename one index label
df.rename(index={0: 7}, inplace=True)

# rename multiple index labels
df.rename(mapper={1: 10, 2: 100}, axis=0, inplace=True)

# display the DataFrame after renaming row labels
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
        Name  Age      City
7      Alice   25  New York
10       Bob   30    London
100  Charlie   35     Paris
3      David   40     Tokyo
In this example, we renamed a single row label using the index={0: 7} argument. We also renamed multiple row labels with the mapper={1: 10, 2: 100} argument. Labels that are not listed in the mapping, such as 3, are left unchanged.
Here,
- axis=0: indicates that rows are to be renamed
- inplace=True: indicates that the changes are to be made in the original DataFrame
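By default, rename() silently ignores mapping keys that do not match any label. As a hedged sketch on a made-up DataFrame, the errors='raise' argument can be used to turn such a mismatch into a KeyError, which helps catch typos in the mapping:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})

# labels missing from the mapping (here, 2) keep their original value
renamed = df.rename(index={0: 'a', 1: 'b'})
print(list(renamed.index))

# with errors='raise', a key that matches no label raises KeyError
raised = False
try:
    df.rename(index={42: 'z'}, errors='raise')
except KeyError:
    raised = True

print(raised)
```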