Pandas DataFrame Analysis

Pandas DataFrame objects come with a variety of built-in methods, such as head(), tail() and info(), that allow us to view and analyze DataFrames.

View Data in a Pandas DataFrame

A Pandas DataFrame can be displayed like any other Python variable using the print() function.

However, when a DataFrame has a large number of rows or columns, the print() function does not display the whole DataFrame by default. Instead, it prints only a part of the DataFrame.
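As a rough illustration of this truncation, the sketch below builds a hypothetical 100-row DataFrame and compares the shortened text with the full text produced by to_string(). The column name value and the display limit of 10 rows are assumptions chosen only for this example.

```python
import pandas as pd

# hypothetical DataFrame with 100 rows, used only for illustration
big_df = pd.DataFrame({'value': range(100)})

# temporarily limit the display to 10 rows so the truncation is easy to see
with pd.option_context('display.max_rows', 10):
    truncated_text = str(big_df)

# to_string() renders every row regardless of the display options
full_text = big_df.to_string()

print(truncated_text)
print('Lines in truncated output:', len(truncated_text.splitlines()))
print('Lines in full output:', len(full_text.splitlines()))
```

For everyday inspection, printing everything is rarely what we want, which is why the methods in the following sections are usually more convenient.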

In the case of large DataFrames, we can use the head(), tail() and info() methods to get an overview of the DataFrame.


Pandas head()

The head() method provides a quick look at a DataFrame. It returns the column headers and a specified number of rows from the beginning. For example,

    import pandas as pd

    # create a dataframe
    data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
            'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
            'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}
    df = pd.DataFrame(data)

    # display the first three rows
    print('First Three Rows:')
    print(df.head(3))
    print()

    # display the first five rows
    print('First Five Rows:')
    print(df.head())

Output

    First Three Rows:
        Name  Age      City
    0   John   25  New York
    1  Alice   30     Paris
    2    Bob   35    London

    First Five Rows:
        Name  Age      City
    0   John   25  New York
    1  Alice   30     Paris
    2    Bob   35    London
    3   Emma   28    Sydney
    4   Mike   32     Tokyo

In this example, we displayed selected rows of the df DataFrame starting from the top using head().

Notice that the first five rows are selected by default when no argument is passed to the head() method.
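The argument to head() can also be negative, in which case every row except the last n rows is returned. The following is a minimal sketch of that behavior; the shortened five-row DataFrame and the variable name all_but_last_two are assumptions made for this example.

```python
import pandas as pd

# small hypothetical DataFrame, used only for illustration
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
                   'Age': [25, 30, 35, 28, 32]})

# a negative argument returns every row except the last n rows
all_but_last_two = df.head(-2)
print(all_but_last_two)

# the original df still has all of its rows
print(len(all_but_last_two), len(df))
```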


Pandas tail()

The tail() method is similar to head() but it returns data starting from the end of the DataFrame. For example,

    import pandas as pd

    # create a dataframe
    data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
            'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
            'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}
    df = pd.DataFrame(data)

    # display the last three rows
    print('Last Three Rows:')
    print(df.tail(3))
    print()

    # display the last five rows
    print('Last Five Rows:')
    print(df.tail())

Output

    Last Three Rows:
        Name  Age     City
    7  Linda   33   Madrid
    8    Tom   29  Toronto
    9  Emily   31   Moscow

    Last Five Rows:
        Name  Age     City
    5  Sarah   27   Berlin
    6  David   40     Rome
    7  Linda   33   Madrid
    8    Tom   29  Toronto
    9  Emily   31   Moscow

In this example, we displayed selected rows of the df DataFrame starting from the bottom using tail().

Notice that the last five rows are selected by default when no argument is passed to the tail() method.
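Like head(), tail() accepts a negative argument, which returns every row except the first n rows. The sketch below shows this and also one possible way to preview both ends of a DataFrame at once using pd.concat(). The variable names all_but_first_seven and preview are assumptions made for this example.

```python
import pandas as pd

# same sample data as above, limited to two columns for brevity
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike',
             'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
    'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
})

# a negative argument returns every row except the first n rows
all_but_first_seven = df.tail(-7)
print(all_but_first_seven)

# stack the first two and last two rows into a single preview
preview = pd.concat([df.head(2), df.tail(2)])
print(preview)
```

The original index labels are kept in both results, so it stays clear which rows of df are being shown.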


Get DataFrame Information

The info() method prints a concise summary of the DataFrame, such as its class, index range, column data types, and memory usage. For example,

    import pandas as pd

    # create dataframe
    data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
            'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
            'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}
    df = pd.DataFrame(data)

    # get info about dataframe
    df.info()

Output

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 10 entries, 0 to 9
    Data columns (total 3 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Name    10 non-null     object
     1   Age     10 non-null     int64
     2   City    10 non-null     object
    dtypes: int64(1), object(2)
    memory usage: 372.0+ bytes

As you can see, the info() method provides the following information about a Pandas DataFrame:

  • Class: The class of the object, which indicates that it is a pandas DataFrame
  • RangeIndex: The index range of the DataFrame, showing the starting and ending index values
  • Data columns: The total number of columns in the DataFrame
  • Column names: The names of the columns in the DataFrame
  • Non-Null Count: The count of non-null values for each column
  • Dtype: The data types of the columns
  • Memory usage: The memory usage of the DataFrame in bytes
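The + sign that can appear after the memory figure indicates that the value is a lower-bound estimate, because the contents of object columns are not measured by default. As a hedged sketch, the code below asks for a deeper measurement with memory_usage='deep' and writes the report to a buffer so it can be handled as a string. The three-row DataFrame and the variable names buffer, report, shallow_bytes and deep_bytes are assumptions made for this example.

```python
import io

import pandas as pd

# small hypothetical DataFrame, used only for illustration
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [25, 30, 35]})

# write the info() report to a buffer instead of printing it directly
buffer = io.StringIO()
df.info(buf=buffer, memory_usage='deep')
report = buffer.getvalue()
print(report)

# total memory in bytes; deep=True also inspects the stored Python objects
shallow_bytes = df.memory_usage().sum()
deep_bytes = df.memory_usage(deep=True).sum()
print(shallow_bytes, deep_bytes)
```

The exact byte counts depend on the pandas version and platform, so they are not shown here.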

The information reported by info() helps us understand the dataset's structure, dimensions, and missing values. This insight is essential for data exploration, cleaning, manipulation, and analysis.
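When the same facts are needed as values rather than printed text, they can be read from the DataFrame directly. The following is a minimal sketch using the shape and dtypes attributes and the isna() method; the sample data with one missing Age value and the variable name missing_per_column are assumptions made for this example.

```python
import pandas as pd

# hypothetical data with one missing Age value
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [25, None, 35]})

# dimensions as a (rows, columns) tuple
print(df.shape)

# data type of each column
print(df.dtypes)

# number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```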
