How to add header row to a Pandas Dataframe?



Pandas is a widely used Python library for data handling and manipulation, common in data analysis and data pre-processing. Its central data structure is the dataframe, which stores two-dimensional, tabular data. In this article we will learn various ways to add a header row (that is, column names) to a Pandas dataframe.

NOTE: The code in this article was tested in a Jupyter notebook.

We will see how to add header rows in 5 different ways:

  • Adding header rows when creating a dataframe with a dictionary

  • Adding header rows when creating a dataframe with a list of lists

  • Adding header rows after creating the dataframe

  • Adding header rows when reading files from a CSV

  • Adding header rows using set_axis method

Let's begin by importing Pandas:

import pandas as pd 

Method 1: When creating a dataframe with a dictionary

Example

# Add header row while creating the dataframe through a dictionary
data = {
    'course': ['Math', 'English', 'History', 'Science', 'Physics'],
    'instructor': ['John Smith', 'Sarah Johnson', 'Mike Brown', 'Karen Lee', 'David Kim'],
    'batch_size': [43, 25, 19, 51, 48]
}
df1 = pd.DataFrame(data)
df1

Output

    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48

In the code above we initialize dummy data for our dataframe through a dictionary. Each key-value pair supplies a column name and that column's data, respectively. Pandas reads the dictionary and uses its keys as the header row.
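To confirm that the dictionary keys became the column names, one option is to inspect the dataframe's columns attribute. The following is a minimal sketch on a trimmed copy of the data; the names df_check and header are only illustrative.

```python
import pandas as pd

# Dictionary keys become column labels; values become the column data.
data = {
    'course': ['Math', 'English'],
    'instructor': ['John Smith', 'Sarah Johnson'],
    'batch_size': [43, 25],
}
df_check = pd.DataFrame(data)

# The header row is stored as an Index object; tolist() gives a plain list.
header = df_check.columns.tolist()
print(header)
```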

Method 2: When creating a dataframe with list of lists

Example

# Add header row while creating the dataframe through lists
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df2 = pd.DataFrame(data, columns=columns)
df2

Output

    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12

In this method, we have a list of lists where each sub-list stores the values for one row of the dataframe. We make a list of column names and pass it to the pd.DataFrame constructor through the columns argument while initializing the dataframe.
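The list of column names is expected to have one entry per value in each row. As a small sketch of what happens otherwise (the variable name rejected is only illustrative), passing two names for three-value rows makes the constructor raise a ValueError:

```python
import pandas as pd

data = [['apple', 'red', 5], ['banana', 'yellow', 12]]

# Each row has three values, so two column names are not enough.
try:
    pd.DataFrame(data, columns=['fruit', 'color'])
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```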

Method 3: After creating the dataframe

Example

# Add header row after creating the dataframe
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df3 = pd.DataFrame(data)
df3.columns = columns
df3

Output

    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12

In the code above we first initialize a dataframe without specifying a header row, so Pandas falls back to default integer column labels. Then we define the list of column names we want and assign it to the dataframe's columns attribute to set the header row of the already created dataframe.
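A quick way to see the effect of the assignment is to compare the column labels before and after it. This is a minimal sketch; df_plain, default_labels and new_labels are illustrative names.

```python
import pandas as pd

data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
df_plain = pd.DataFrame(data)

# Without explicit names, pandas labels the columns 0, 1, 2.
default_labels = df_plain.columns.tolist()

# Assigning a list of the same length replaces those labels.
df_plain.columns = ['fruit', 'color', 'quantity']
new_labels = df_plain.columns.tolist()

print(default_labels, new_labels)
```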

Method 4: When reading files from a CSV file

Example

When reading a CSV file, Pandas by default treats the first row as the column names. However, a dataset may contain no header row at all, as in the example below. Let's assume the dataset is stored as 'course_data.csv'.

# Incorrect header row
df4 = pd.read_csv('course_data.csv')
df4

Output

      Math     John Smith  43
0  English  Sarah Johnson  25
1  History     Mike Brown  19
2  Science      Karen Lee  51
3  Physics      David Kim  48

The output shows that Pandas is interpreting the first data row as the header row. To fix this, we specify the column names by passing a list of names through the 'names' argument.

Example

# Add header row while reading files from CSV
columns = ['course', 'instructor', 'batch_size']
df4 = pd.read_csv('course_data.csv', names=columns)
df4

Output

    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48

As shown in the output above, Pandas is no longer reading the first data sample as a header row!
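To try this without a file on disk, the sketch below uses io.StringIO as an in-memory stand-in for 'course_data.csv'. The string csv_text is made-up sample content, and its comma-separated layout is an assumption about the file. The sketch also shows header=None, a related read_csv option that keeps every row as data but leaves the columns with integer labels.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for a header-less 'course_data.csv'.
csv_text = "Math,John Smith,43\nEnglish,Sarah Johnson,25\n"

# Default behaviour: the first data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every row as data and uses integer column labels.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=... keeps every row as data and applies the given labels.
named = pd.read_csv(
    io.StringIO(csv_text),
    names=['course', 'instructor', 'batch_size'],
)

print(len(wrong), len(no_header), len(named))
```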

Method 5: Using set_axis method

Example

We already saw how to add header rows to an existing dataframe in Method 3. Now we will achieve the same using the pd.DataFrame.set_axis method.

# Add header row after creating the dataframe using set_axis
data = [['dog', 'brown', 4], ['cat', 'white', 4], ['chicken', 'white', 2]]
df5 = pd.DataFrame(data)
columns = ['animal', 'color', 'num_legs']
# set_axis returns a new dataframe, so assign the result back
df5 = df5.set_axis(columns, axis=1)
df5

Output

    animal  color  num_legs
0      dog  brown         4
1      cat  white         4
2  chicken  white         2

Here we first initialize a dataframe without any header row using the data above. Then we use the set_axis method to add the header row, passing axis=1 to specify that we are setting the column names. set_axis returns a new dataframe, so we assign the result back to df5. Older examples pass inplace=True instead, but that argument was deprecated in pandas 1.5 and removed in pandas 2.0.

NOTE: Setting axis=0 would set the row labels instead of the column names, and raises a ValueError unless the number of labels matches the number of rows.
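The behaviour of set_axis on both axes can be checked with a short sketch. It assumes set_axis is called without the inplace argument, so it returns a relabelled dataframe; df_raw, df_named and mismatch_rejected are illustrative names.

```python
import pandas as pd

data = [['dog', 'brown', 4], ['cat', 'white', 4], ['chicken', 'white', 2]]
df_raw = pd.DataFrame(data)

# set_axis returns a relabelled dataframe; df_raw keeps its labels 0, 1, 2.
df_named = df_raw.set_axis(['animal', 'color', 'num_legs'], axis=1)

# With axis=0 the labels must match the number of rows (3 here),
# so a list of 2 labels is rejected.
try:
    df_raw.set_axis(['a', 'b'], axis=0)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(df_named.columns.tolist(), mismatch_rejected)
```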

Conclusion

This article showed how to add headers to dataframes in Pandas. We saw 5 ways to do so, covering dataframes built from dictionaries and lists, existing dataframes, and CSV files read without a header row.

Updated on: 2023-03-23T15:13:30+05:30
