Pandas MultiIndex

A MultiIndex in Pandas is a hierarchical indexing structure that allows us to represent and work with higher-dimensional data efficiently.

While a typical index refers to a single column, a MultiIndex contains multiple levels of indexes. Each column in a MultiIndex is linked to one another through a parent/relationship.

Let's take an example of a DataFrame containing the population of different countries.

 import pandas as pd # create a dictionary data = { "Continent": ["North America", "Europe", "Asia", "North America", "Asia", "Europe", "North America", "Asia", "Europe", "Asia"], "Country": ["United States", "Germany", "China", "Canada", "Japan", "France", "Mexico", "India", "United Kingdom", "Nepal"], "Population": [331002651, 83783942, 1439323776, 37742154, 126476461, 65273511, 128932753, 1380004385, 67886011, 29136808] } # create dataframe from dictionary df = pd.DataFrame(data) print(df)

Output

  Continent Country Population 0 North America United States 331002651 1 Europe Germany 83783942 2 Asia China 1439323776 3 North America Canada 37742154 4 Asia Japan 126476461 5 Europe France 65273511 6 North America Mexico 128932753 7 Asia India 1380004385 8 Europe United Kingdom 67886011 9 Asia Nepal 29136808

Notice the redundancy in the Continent column. North America and Europe are repeated three times each while Asia is repeated four times.

Additionally, we have arranged the entries in a random order and used integer values as index for the rows, thus complicating the task of locating data for a particular country. This task becomes tedious as the size of the data set grows.

In situations like this, hierarchical indexing, as shown in figure below, makes much more sense.

Hierarchical index with parent/child relationship between the Continent and Country columns
Hierarchical Index

Here, Continent is the parent column and Country is the child column.


Create MultiIndex in Pandas

In Pandas, we achieve hierarchical indexing using the concept of MultiIndex.

Let's see an example.

 import pandas as pd # create a dictionary data = { "Continent": ["North America", "Europe", "Asia", "North America", "Asia", "Europe", "North America", "Asia", "Europe", "Asia"], "Country": ["United States", "Germany", "China", "Canada", "Japan", "France", "Mexico", "India", "United Kingdom", "Nepal"], "Population": [331002651, 83783942, 1439323776, 37742154, 126476461, 65273511, 128932753, 1380004385, 67886011, 29136808] } # create dataframe from dictionary df = pd.DataFrame(data) # sort the data by continent df.sort_values('Continent', inplace=True) 
# create a multiindex df.set_index(['Continent','Country'], inplace=True)
print(df)

Output

  Population Continent Country Asia China 1439323776 Japan 126476461 India 1380004385 Nepal 29136808 Europe Germany 83783942 France 65273511 United Kingdom 67886011 North America United States 331002651 Canada 37742154 Mexico 128932753

In the above example, we first sorted the values in the dataframe df based on the Continent column. This groups the entries of the same continent together.

We then created a MultiIndex by passing a list of columns as an argument to the set_index() function.

Notice the order of the columns in the list. Continent comes first as it is the parent column and Country comes second as it is the child of Continent.


Access Rows With MultiIndex

We can access rows with MultiIndex as shown in the example below.

 import pandas as pd # create a dictionary data = { "Continent": ["North America", "Europe", "Asia", "North America", "Asia", "Europe", "North America", "Asia", "Europe", "Asia"], "Country": ["United States", "Germany", "China", "Canada", "Japan", "France", "Mexico", "India", "United Kingdom", "Nepal"], "Population": [331002651, 83783942, 1439323776, 37742154, 126476461, 65273511, 128932753, 1380004385, 67886011, 29136808] } # create dataframe from dictionary df = pd.DataFrame(data) # sort the data by continent df.sort_values('Continent', inplace=True) # create a multiindex df.set_index(['Continent','Country'], inplace=True) 
# access all entries under Asia asia = df.loc['Asia']
# access Canada canada = df.loc[('North America', 'Canada')]
print('Asia\n', asia) print('\nCanada\n', canada)

Output

 Asia Population Country China 1439323776 Japan 126476461 India 1380004385 Nepal 29136808 Canada Population 37742154 Name: (North America, Canada), dtype: int64

In the above example, we accessed all the entries under Asia by passing a single string Asia to df.loc[].

To access a particular row Canada, we passed a tuple ('North America' , 'Canada') to df.loc[].

Note: We need to provide the full hierarchical index in the form of a tuple in order to access a particular row.

Only providing the label of the child column will result in an error.

 # correct df.loc[('North America' , 'Canada')] # error df.loc['Canada']

MultiIndex from Arrays

We can also create a MultiIndex from an array of arrays using the from_arrays() method.

Let's see an example.

 import pandas as pd # create arrays continent = ['Asia', 'Asia', 'Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'North America', 'North America', 'North America'] country = ['China', 'India', 'Japan', 'Nepal', 'France', 'Germany', 'United Kingdom', 'Canada', 'Mexico', 'United States'] population = [1439323776, 1380004385, 126476461, 29136808, 65273511, 83783942, 67886011, 37742154, 128932753, 331002651] # create array of arrays index_array = [continent, country] 
# create multiindex from array multi_index = pd.MultiIndex.from_arrays(index_array, names=['Continent', 'Country'])
# create dataframe using multiindex df = pd.DataFrame({'Population' :population}, index=multi_index) print(df)

Output

  Population Continent Country Asia China 1439323776 India 1380004385 Japan 126476461 Nepal 29136808 Europe France 65273511 Germany 83783942 United Kingdom 67886011 North America Canada 37742154 Mexico 128932753 United States 331002651

In this example, we created a MultiIndex object named multi_index from two arrays: continent and country.

We then created a DataFrame using the population array and assigned multi_index as its index.

Our premium learning platform, created with over a decade of experience and thousands of feedbacks.

Learn and improve your coding skills like never before.

Try Programiz PRO
  • Interactive Courses
  • Certificates
  • AI Help
  • 2000+ Challenges