Pandas DataFrame Manipulation

DataFrame manipulation in Pandas involves editing and modifying existing DataFrames. Some common DataFrame manipulation operations are:

  • Adding rows/columns
  • Removing rows/columns
  • Renaming rows/columns

Add a New Column to a Pandas DataFrame

We can add a new column to an existing Pandas DataFrame by assigning a list to a new column label. The list must contain one value per row. For example,

    import pandas as pd

    # define a dictionary containing student data
    data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
            'Height': [5.5, 6.0, 5.8, 5.3],
            'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}

    # convert the dictionary into a DataFrame
    df = pd.DataFrame(data)

    # declare a new list
    address = ['New York', 'London', 'Sydney', 'Toronto']

    # assign the list as a column
    df['Address'] = address

    print(df)

Output

          Name  Height Qualification   Address
    0     John     5.5           BSc  New York
    1     Emma     6.0           BBA    London
    2  Michael     5.8           MBA    Sydney
    3   Sophia     5.3           BSc   Toronto

In this example, we assign the list address to the Address column in the DataFrame.
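Plain assignment always places the new column last and changes df itself. As a minimal sketch of two alternatives, the snippet below uses insert() to choose the column position and assign() to get a modified copy. The Tall column and the 5.6 threshold are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
})

# insert() places the new column at a given position (here, position 1)
# and modifies df in place
df.insert(1, 'Address', ['New York', 'London', 'Sydney', 'Toronto'])

# assign() leaves df untouched and returns a new DataFrame with the
# extra column added at the end ('Tall' is a hypothetical column name)
df_with_flag = df.assign(Tall=df['Height'] > 5.6)

print(df.columns.tolist())
print(df_with_flag.columns.tolist())
```

assign() is convenient when the original DataFrame should stay unchanged, for example inside a chain of method calls.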


Add a New Row to a Pandas DataFrame

Adding rows to a DataFrame is less direct than adding columns. One way is to assign the new row to an unused index label through the .loc property.

For example,

    import pandas as pd

    # define a dictionary containing student data
    data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
            'Height': [5.5, 6.0, 5.8, 5.3],
            'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}

    # convert the dictionary into a DataFrame
    df = pd.DataFrame(data)

    print("Original DataFrame:")
    print(df)
    print()

    # add a new row
    df.loc[len(df.index)] = ['Amy', 5.2, 'BIT']

    print("Modified DataFrame:")
    print(df)

Output

    Original DataFrame:
          Name  Height Qualification
    0     John     5.5           BSc
    1     Emma     6.0           BBA
    2  Michael     5.8           MBA
    3   Sophia     5.3           BSc

    Modified DataFrame:
          Name  Height Qualification
    0     John     5.5           BSc
    1     Emma     6.0           BBA
    2  Michael     5.8           MBA
    3   Sophia     5.3           BSc
    4      Amy     5.2           BIT

In this example, we added a row ['Amy', 5.2, 'BIT'] to the df DataFrame.

Here,

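  • len(df.index): returns the number of rows in df, which is the next unused label when the index is the default 0, 1, 2, ...
  • df.loc[...]: accesses the row whose index label is given in the square brackets; assigning to a label that does not exist yet adds a new row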

To learn more about .loc, please visit Pandas Indexing and Slicing.
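The len(df.index) approach relies on the default integer index. If that label already exists (for instance after some rows were deleted), the assignment overwrites the existing row instead of adding one. A minimal sketch of an alternative, assuming several rows need to be added at once, is to build a second DataFrame and combine the two with pd.concat(). The extra rows below are illustrative data only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma'],
    'Height': [5.5, 6.0],
    'Qualification': ['BSc', 'BBA'],
})

# rows to add (illustrative data), with the same columns as df
new_rows = pd.DataFrame({
    'Name': ['Amy', 'Liam'],
    'Height': [5.2, 5.9],
    'Qualification': ['BIT', 'MSc'],
})

# concat() returns a new DataFrame; ignore_index=True renumbers
# the rows 0, 1, 2, ... instead of keeping each frame's own labels
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```

Because concat() returns a new object, df itself still has two rows after this runs.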


Remove Rows/Columns from a Pandas DataFrame

We can use drop() to delete rows and columns from a DataFrame.

Example: Delete Rows

    import pandas as pd

    # create a sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Felipe', 'Rita'],
            'Age': [25, 30, 35, 40, 22, 29],
            'City': ['New York', 'London', 'Paris', 'Tokyo', 'Bogota', 'Bangalore']}
    df = pd.DataFrame(data)

    # display the original DataFrame
    print("Original DataFrame:")
    print(df)
    print()

    # delete row with index 4
    df.drop(4, axis=0, inplace=True)

    # delete row with index 5
    df.drop(index=5, inplace=True)

    # delete rows with index 1 and 3
    df.drop([1, 3], axis=0, inplace=True)

    # display the modified DataFrame after deleting rows
    print("Modified DataFrame:")
    print(df)

Output

    Original DataFrame:
          Name  Age       City
    0    Alice   25   New York
    1      Bob   30     London
    2  Charlie   35      Paris
    3    David   40      Tokyo
    4   Felipe   22     Bogota
    5     Rita   29  Bangalore

    Modified DataFrame:
          Name  Age      City
    0    Alice   25  New York
    2  Charlie   35     Paris

In this example, we deleted single rows by passing 4 as the labels argument (with axis=0) and 5 as the index argument. We also deleted several rows at once by passing the list [1, 3] as labels.

Here,

  • axis=0: indicates that rows are to be deleted
  • inplace=True: indicates that the changes are to be made in the original DataFrame
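When inplace=True is left out, drop() returns a new DataFrame and keeps the original as it was. The rows that remain also keep their old index labels. The following sketch, using a smaller sample DataFrame, shows both points and uses reset_index(drop=True) to renumber the remaining rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# without inplace=True, drop() returns a new DataFrame
# and leaves df unchanged
trimmed = df.drop(index=[1, 3])

# the surviving rows keep their original labels (0 and 2);
# reset_index(drop=True) renumbers them starting from 0
renumbered = trimmed.reset_index(drop=True)

print(trimmed.index.tolist())
print(renumbered.index.tolist())
```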

Example: Delete Columns

    import pandas as pd

    # create a sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, 30, 35, 40],
            'City': ['New York', 'London', 'Paris', 'Tokyo'],
            'Height': ['165', '178', '185', '171'],
            'Profession': ['Engineer', 'Entrepreneur', 'Unemployed', 'Actor'],
            'Marital Status': ['Single', 'Married', 'Divorced', 'Engaged']}
    df = pd.DataFrame(data)

    # display the original DataFrame
    print("Original DataFrame:")
    print(df)
    print()

    # delete age column
    df.drop('Age', axis=1, inplace=True)

    # delete marital status column
    df.drop(columns='Marital Status', inplace=True)

    # delete height and profession columns
    df.drop(['Height', 'Profession'], axis=1, inplace=True)

    # display the modified DataFrame after deleting columns
    print("Modified DataFrame:")
    print(df)

Output

    Original DataFrame:
          Name  Age      City Height    Profession Marital Status
    0    Alice   25  New York    165      Engineer         Single
    1      Bob   30    London    178  Entrepreneur        Married
    2  Charlie   35     Paris    185    Unemployed       Divorced
    3    David   40     Tokyo    171         Actor        Engaged

    Modified DataFrame:
          Name      City
    0    Alice  New York
    1      Bob    London
    2  Charlie     Paris
    3    David     Tokyo

In this example, we deleted single columns by passing 'Age' as the labels argument (with axis=1) and 'Marital Status' as the columns argument. We also deleted several columns at once by passing the list ['Height', 'Profession'] as labels.

Here,

  • axis=1: indicates that columns are to be deleted
  • inplace=True: indicates that the changes are to be made in the original DataFrame
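By default, drop() raises a KeyError when asked to delete a label that is not present. As a small sketch, the errors='ignore' option makes drop() skip missing labels instead. The Salary column is hypothetical and intentionally absent from the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'London'],
})

# 'Salary' is a hypothetical column that does not exist in df;
# errors='ignore' skips it instead of raising a KeyError
result = df.drop(columns=['Age', 'Salary'], errors='ignore')

print(result.columns.tolist())
```

This can be handy when the same cleanup code runs on DataFrames that do not always contain every column.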

Rename Labels in a DataFrame

We can rename column labels and row (index) labels in a Pandas DataFrame using the rename() method.

Example: Rename Columns

    import pandas as pd

    # create a sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, 30, 35, 40],
            'City': ['New York', 'London', 'Paris', 'Tokyo']}
    df = pd.DataFrame(data)

    # display the original DataFrame
    print("Original DataFrame:")
    print(df)
    print()

    # rename column 'Name' to 'First_Name'
    df.rename(columns={'Name': 'First_Name'}, inplace=True)

    # rename columns 'Age' and 'City'
    df.rename(mapper={'Age': 'Number', 'City': 'Address'}, axis=1, inplace=True)

    # display the DataFrame after renaming columns
    print("Modified DataFrame:")
    print(df)

Output

    Original DataFrame:
          Name  Age      City
    0    Alice   25  New York
    1      Bob   30    London
    2  Charlie   35     Paris
    3    David   40     Tokyo

    Modified DataFrame:
      First_Name  Number   Address
    0      Alice      25  New York
    1        Bob      30    London
    2    Charlie      35     Paris
    3      David      40     Tokyo

In this example, we renamed a single column by passing {'Name': 'First_Name'} as the columns argument. We also renamed several columns at once by passing {'Age': 'Number', 'City': 'Address'} as the mapper argument with axis=1.

Here,

  • axis=1: indicates that columns are to be renamed
  • inplace=True: indicates that the changes are to be made in the original DataFrame
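Besides a dictionary, rename() also accepts a function, which is applied to every label. The sketch below lowercases all column names this way. It also illustrates a common pitfall: a dictionary key that matches no column (here the deliberately misspelled 'Nmae') is skipped without any error.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'London'],
})

# a function is applied to every column label;
# str.lower converts each one to lowercase
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())

# 'Nmae' is a deliberate misspelling: rename() skips labels it cannot
# find, so the result still has the original column names
unchanged = df.rename(columns={'Nmae': 'First_Name'})
print(unchanged.columns.tolist())
```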

Example: Rename Row Labels

    import pandas as pd

    # create a sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, 30, 35, 40],
            'City': ['New York', 'London', 'Paris', 'Tokyo']}
    df = pd.DataFrame(data)

    # display the original DataFrame
    print("Original DataFrame:")
    print(df)
    print()

    # rename a single index label
    df.rename(index={0: 7}, inplace=True)

    # rename multiple index labels
    df.rename(mapper={1: 10, 2: 100}, axis=0, inplace=True)

    # display the DataFrame after renaming index labels
    print("Modified DataFrame:")
    print(df)

Output

    Original DataFrame:
          Name  Age      City
    0    Alice   25  New York
    1      Bob   30    London
    2  Charlie   35     Paris
    3    David   40     Tokyo

    Modified DataFrame:
            Name  Age      City
    7      Alice   25  New York
    10       Bob   30    London
    100  Charlie   35     Paris
    3      David   40     Tokyo

In this example, we renamed a single row label by passing {0: 7} as the index argument. We also renamed several row labels at once by passing {1: 10, 2: 100} as the mapper argument with axis=0.

Here,

  • axis=0: indicates that rows are to be renamed
  • inplace=True: indicates that the changes are to be made in the original DataFrame
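Row labels can also be renamed with a function, and rename() can be told to complain about labels that do not exist. The following is a minimal sketch. The row_ prefix and the label 99 are hypothetical values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# a function is applied to every row label
labeled = df.rename(index=lambda label: f'row_{label}')
print(labeled.index.tolist())

# by default, labels missing from the index are skipped silently;
# errors='raise' turns that into a KeyError (label 99 does not exist)
try:
    df.rename(index={99: 1}, errors='raise')
    missing_label_detected = False
except KeyError:
    missing_label_detected = True

print(missing_label_detected)
```

Using errors='raise' can help catch typos in the mapping early, since the default behavior leaves the DataFrame unchanged without any warning.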
