Pandas Merge

The merge operation in Pandas merges two DataFrames based on their indexes or a specified column.

The merge() in Pandas works similar to JOINs in SQL.

Let's see an example.

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID' : ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name' : ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D003'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003'], 'DeptName': ['Sales', 'HR', 'Admin'] } departments = pd.DataFrame(data2) # merge dataframes employees and departments merged_df = pd.merge(employees, departments)
 # display DataFrames print("Employees:") print(employees) print() print("Departments:") print(departments) print() print("Merged DataFrame:") print(merged_df)

Output

 Employees: EmployeeID Name DeptID 0 E001 John Doe D001 1 E002 Jane Smith D003 2 E003 Peter Brown D001 3 E004 Tom Johnson D002 4 E005 Rita Patel D003 Departments: DeptID DeptName 0 D001 Sales 1 D002 HR 2 D003 Admin Merged DataFrame: EmployeeID Name DeptID DeptName 0 E001 John Doe D001 Sales 1 E003 Peter Brown D001 Sales 2 E002 Jane Smith D003 Admin 3 E005 Rita Patel D003 Admin 4 E004 Tom Johnson D002 HR

In this example, we merged the DataFrames employees and departments using the merge() method.

Notice that the two DataFrames are merged based on the DeptID column as it's common to both the DataFrames.

merge() Syntax in Pandas

The syntax of the merge() method in Pandas is:

 pd.merge(left, right, on=None, how='inner', left_on=None, right_on=None, sort=False)

Here,

left: specifies the left DataFrame to be merged
right: specifies the right DataFrame to be merged
on (optional): specifies column(s) to join on
how (optional): specifies the type of join to perform
left_on (optional): specifies column(s) from the left DataFrame to use as key(s) for merging
right_on (optional): specifies column(s) from the right DataFrame to use as key(s) for merging
sort (optional): if True, sort the result DataFrame by the join keys

Example: Merge DataFrames Based on Keys

When there are no common columns between two DataFrames, we can merge them by specifying the columns (as keys) in the left_on and right_on arguments. For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID1': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID2': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # merge the dataframes df_merge = pd.merge(employees, departments, left_on='DeptID1', right_on = 'DeptID2', sort = True)
 print(df_merge)

Output

  EmployeeID Name DeptID1 DeptID2 DeptName 0 E001 John Doe D001 D001 Sales 1 E003 Peter Brown D001 D001 Sales 2 E004 Tom Johnson D002 D002 HR 3 E002 Jane Smith D003 D003 Admin

In the above example, we performed a merge operation on two DataFrames employees and departments using the merge() method with various arguments.

Here, we used DeptID1 and DeptID2 as the key for merging the DataFrames. Then, we sorted the resulting DataFrame using sort = True.

Types of Join Operations In merge()

So far, we've not defined how to merge the dataframes, thus it defaults to an inner join.

However, we can specify the join type in the how argument. Here are the 5 join types we can use in the merge() method:

Left Join
Right Join
Outer Join
Inner Join (Default)
Cross Join

Left Join

A left join combines two DataFrames based on a common key and returns a new DataFrame that contains all rows from the left DataFrame and the matched rows from the right DataFrame.

If values are not found in the right dataframe, it fills the space with NaN. For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # left merge the dataframes df_merge = pd.merge(employees, departments, on = 'DeptID', how = 'left', sort = True)
 print(df_merge)

Output

  EmployeeID Name DeptID DeptName 0 E001 John Doe D001 Sales 1 E003 Peter Brown D001 Sales 2 E004 Tom Johnson D002 HR 3 E002 Jane Smith D003 Admin 4 E005 Rita Patel D006 NaN

Right Join

A right join is the opposite of a left join. It returns a new DataFrame that contains all rows from the right DataFrame and the matched rows from the left DataFrame.

If values are not found in the left dataframe, it fills the space with NaN. For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # right merge the dataframes df_merge = pd.merge(employees, departments, on = 'DeptID', how = 'right', sort = True)
 print(df_merge)

Output

  EmployeeID Name DeptID DeptName 0 E001 John Doe D001 Sales 1 E003 Peter Brown D001 Sales 2 E004 Tom Johnson D002 HR 3 E002 Jane Smith D003 Admin 4 NaN NaN D004 Marketing

Inner Join

An inner join combines two DataFrames based on a common key and returns a new DataFrame that contains only rows that have matching values in both of the original DataFrames.

For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # inner merge the dataframes df_merge = pd.merge(employees, departments, on = 'DeptID', how = 'inner', sort = True)
 print(df_merge)

Output

  EmployeeID Name DeptID DeptName 0 E001 John Doe D001 Sales 1 E003 Peter Brown D001 Sales 2 E004 Tom Johnson D002 HR 3 E002 Jane Smith D003 Admin

Outer Join

An outer join combines two DataFrames based on a common key. Unlike an inner join, an outer join returns a new DataFrame that contains all rows from both original DataFrames.

If values are not found in the DataFrames, it fills the space with NaN.

For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # outer merge the dataframes df_merge = pd.merge(employees, departments, on = 'DeptID', how = 'outer', sort = True)
 print(df_merge)

Output

  EmployeeID Name DeptID DeptName 0 E001 John Doe D001 Sales 1 E003 Peter Brown D001 Sales 2 E004 Tom Johnson D002 HR 3 E002 Jane Smith D003 Admin 4 NaN NaN D004 Marketing 5 E005 Rita Patel D006 NaN

Cross Join

A cross join in Pandas creates the cartesian product of both DataFrames while preserving the order of the left DataFrame.

For example,

 import pandas as pd # create dataframes from the dictionaries data1 = { 'EmployeeID': ['E001', 'E002', 'E003', 'E004', 'E005'], 'Name': ['John Doe', 'Jane Smith', 'Peter Brown', 'Tom Johnson', 'Rita Patel'], 'DeptID': ['D001', 'D003', 'D001', 'D002', 'D006'], } employees = pd.DataFrame(data1) data2 = { 'DeptID': ['D001', 'D002', 'D003', 'D004'], 'DeptName': ['Sales', 'HR', 'Admin', 'Marketing'] } departments = pd.DataFrame(data2) # merge the dataframes df_merge = pd.merge(employees, departments, how = 'cross')
 print(df_merge)

Output

  EmployeeID Name DeptID_x DeptID_y DeptName 0 E001 John Doe D001 D001 Sales 1 E001 John Doe D001 D002 HR 2 E001 John Doe D001 D003 Admin 3 E001 John Doe D001 D004 Marketing 4 E002 Jane Smith D003 D001 Sales 5 E002 Jane Smith D003 D002 HR 6 E002 Jane Smith D003 D003 Admin 7 E002 Jane Smith D003 D004 Marketing 8 E003 Peter Brown D001 D001 Sales 9 E003 Peter Brown D001 D002 HR 10 E003 Peter Brown D001 D003 Admin 11 E003 Peter Brown D001 D004 Marketing 12 E004 Tom Johnson D002 D001 Sales 13 E004 Tom Johnson D002 D002 HR 14 E004 Tom Johnson D002 D003 Admin 15 E004 Tom Johnson D002 D004 Marketing 16 E005 Rita Patel D006 D001 Sales 17 E005 Rita Patel D006 D002 HR 18 E005 Rita Patel D006 D003 Admin 19 E005 Rita Patel D006 D004 Marketing

Join vs Merge vs Concat

There are three different methods to combine DataFrames in Pandas:

join(): joins two DataFrames based on their indexes, performs left join by default
merge(): joins two DataFrames based on any specified columns, performs inner join by default
concat(): stacks two DataFrames along the vertical or horizontal axis

Previous Tutorial:

Pandas Read Text File

Next Tutorial:

Pandas Join

Our premium learning platform, created with over a decade of experience and thousands of feedbacks.

Learn and improve your coding skills like never before.

Try Programiz PRO

Interactive Courses
Certificates
AI Help
2000+ Challenges

Popular Tutorials

Popular Examples

Reference Materials

Certification Courses

Become a certified Python
programmer.

Popular Tutorials

Reference Materials

Popular Examples

Introduction

Dataframe Operations and Manipulations

Data Import and Export

Data Cleaning

Data Analysis and Aggregation

Data Visualization

Pandas Merge

merge() Syntax in Pandas

Example: Merge DataFrames Based on Keys

Types of Join Operations In merge()

Left Join

Right Join

Inner Join

Outer Join

Cross Join

Join vs Merge vs Concat

Table of Contents

Popular Tutorials

Popular Examples

Reference Materials

Certification Courses

Become a certified Python programmer.

Popular Tutorials

Reference Materials

Popular Examples

Introduction

Dataframe Operations and Manipulations

Data Import and Export

Data Cleaning

Data Analysis and Aggregation

Data Visualization

Pandas Merge

merge() Syntax in Pandas

Example: Merge DataFrames Based on Keys

Types of Join Operations In merge()

Left Join

Right Join

Inner Join

Outer Join

Cross Join

Join vs Merge vs Concat

Table of Contents

Become a certified Python
programmer.