Pivot Tables in Pandas

Pivot tables summarize data stored in a table: they can sort, reorganize, group, count, total, or average it. A pivot table turns the unique values of one column into row labels and those of another column into column headers, producing a new, summarized table from the original, detailed data.

pandas provides pivot_table for this, available both as the top-level function pd.pivot_table and as the DataFrame method df.pivot_table.

Here's how to create a basic pivot table in Pandas:

    import pandas as pd

    # Sample data
    data = {
        'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
        'Type': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40]
    }
    df = pd.DataFrame(data)

    # Create a pivot table
    pivot_df = pd.pivot_table(df, values='Value', index='Date', columns='Type', aggfunc='sum')
    print(pivot_df)

The pivot_table arguments used here are:

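values: The column whose data is aggregated.
index: The column whose unique values become the row labels.
columns: The column whose unique values become the column headers.
aggfunc: The aggregation function applied to each group of values ('sum', 'mean', and 'count' are common choices; the default is 'mean').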

This code will produce:

    Type           A     B
    Date
    2021-01-01  10.0   NaN
    2021-01-02   NaN  20.0
    2021-01-03  30.0   NaN
    2021-01-04   NaN  40.0
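To make the roles of index, columns, and aggfunc concrete, the sketch below rebuilds the same table by hand with groupby and unstack. It illustrates what the call computes for this sample data and is not a description of how pandas implements pivot_table internally. The name manual_df is introduced here only for illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

pivot_df = pd.pivot_table(df, values='Value', index='Date', columns='Type', aggfunc='sum')

# Group by the index and columns keys, sum 'Value' within each group,
# then move the 'Type' level from the row index into the column headers
manual_df = df.groupby(['Date', 'Type'])['Value'].sum().unstack('Type')

print(manual_df)
```

Date/Type combinations that never occur in the data have no group to aggregate, which is why they appear as NaN in both versions.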

To replace the NaN entries for missing Date/Type combinations with zeros, pass the fill_value parameter:

    pivot_df = pd.pivot_table(df, values='Value', index='Date', columns='Type',
                              aggfunc='sum', fill_value=0)
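As a quick check of that behavior, the following sketch rebuilds the sample frame and confirms that no NaN entries remain once fill_value=0 is passed. The variable name filled_df is an assumption made for this example.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# fill_value replaces the missing Date/Type combinations in the result
filled_df = pd.pivot_table(df, values='Value', index='Date', columns='Type',
                           aggfunc='sum', fill_value=0)

print(filled_df)

# Count the NaN cells left in the pivot table
print(filled_df.isna().sum().sum())
```

Note that a zero here means "no rows for this combination", which is not always the same thing as an observed value of zero; whether that substitution is appropriate depends on the analysis.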

For more complex data, you might need to aggregate several columns, each with a different aggregation function. In this case, pass a dictionary to aggfunc that maps each value column to its function. Here's an example that produces a multi-level column index:

    # Sample data with an additional 'Quantity' column
    data = {
        'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
                 '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
        'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40, 50, 60, 70, 80],
        'Quantity': [1, 2, 3, 4, 5, 6, 7, 8]
    }
    df = pd.DataFrame(data)

    # Create a pivot table with multiple aggregations
    pivot_df = pd.pivot_table(df, values=['Value', 'Quantity'], index='Date', columns='Type',
                              aggfunc={'Value': 'sum', 'Quantity': 'mean'})
    print(pivot_df)

This will produce a pivot table with multi-level columns: the top level holds the aggregated columns ('Quantity' as a mean, 'Value' as a sum), and each of them is split by 'Type' on the second level:

               Quantity       Value
    Type              A    B      A      B
    Date
    2021-01-01      3.0  NaN   60.0    NaN
    2021-01-02      NaN  4.0    NaN   80.0
    2021-01-03      5.0  NaN  100.0    NaN
    2021-01-04      NaN  6.0    NaN  120.0
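Once the columns are multi-level, individual pieces can be pulled out with ordinary pandas indexing. The sketch below shows three common selections; the variable names value_sums, qty_a, and type_b are chosen here for illustration.

```python
import pandas as pd

data = {
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
             '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 80],
    'Quantity': [1, 2, 3, 4, 5, 6, 7, 8],
}
df = pd.DataFrame(data)

pivot_df = pd.pivot_table(df, values=['Value', 'Quantity'], index='Date', columns='Type',
                          aggfunc={'Value': 'sum', 'Quantity': 'mean'})

# A top-level label selects every Type under one aggregated column
value_sums = pivot_df['Value']

# A (top level, Type) tuple selects a single column as a Series
qty_a = pivot_df[('Quantity', 'A')]

# xs takes a cross-section on the 'Type' level, across both aggregated columns
type_b = pivot_df.xs('B', axis=1, level='Type')

print(value_sums)
print(qty_a)
print(type_b)
```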

Pivot tables are very flexible and powerful, and by playing with their parameters, you can shape your data in almost any way you need.
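As one example of such a parameter, margins=True asks pivot_table to append totals. The sketch below applies it to the first sample frame; it is a small illustration rather than a full tour of the options, and the name totals_df is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# margins=True adds an 'All' row and an 'All' column computed with aggfunc
totals_df = pd.pivot_table(df, values='Value', index='Date', columns='Type',
                           aggfunc='sum', fill_value=0, margins=True)

print(totals_df)
```

The label 'All' is the default; it can be changed with the margins_name parameter.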
