Pivot Tables in Pandas

Pivot tables are a useful component of data analysis that can be used to summarize, sort, reorganize, group, count, total, or average data stored in a table. They allow us to transform columns into rows and rows into columns. They can be used to create new, summarized tables out of the original, detailed data.

Pandas provides the pivot_table function for building pivot tables from a DataFrame.

Here's how to create a basic pivot table in Pandas:

    import pandas as pd

    # Sample data
    data = {
        'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
        'Type': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40]
    }
    df = pd.DataFrame(data)

    # Create a pivot table
    pivot_df = pd.pivot_table(df, values='Value', index='Date', columns='Type', aggfunc='sum')
    print(pivot_df)

The pivot_table arguments used here are:

  • values: The column to aggregate.
  • index: The column whose unique values become the row labels.
  • columns: The column whose unique values become the column labels.
  • aggfunc: The aggregation function applied to rows that share the same index/column pair ('sum', 'mean', and 'count' are common choices).

This code will produce:

    Type           A     B
    Date
    2021-01-01  10.0   NaN
    2021-01-02   NaN  20.0
    2021-01-03  30.0   NaN
    2021-01-04   NaN  40.0
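The choice of aggfunc only changes the result when several rows share the same index/column pair, which never happens in the sample data above. As a minimal sketch, assuming a hypothetical frame dup_df (not part of the original example) in which the 2021-01-01 / A pair appears twice:

```python
import pandas as pd

# Hypothetical data: the ('2021-01-01', 'A') pair appears twice
dup_df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
    'Type': ['A', 'A', 'B'],
    'Value': [10, 30, 20],
})

# Build the same layout with three different aggregation functions
results = {}
for func in ('sum', 'mean', 'count'):
    table = pd.pivot_table(dup_df, values='Value', index='Date',
                           columns='Type', aggfunc=func)
    # Keep only the cell where the duplicated rows were combined
    results[func] = table.loc['2021-01-01', 'A']

print(results)
```

For the duplicated cell, the sum is 40, the mean is 20, and the count is 2. Cells backed by a single row give the same value under 'sum' and 'mean'.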

The NaN entries mark Date/Type combinations that have no rows in the original data. To replace them with zeros, add the fill_value parameter:

    pivot_df = pd.pivot_table(df, values='Value', index='Date', columns='Type',
                              aggfunc='sum', fill_value=0)
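To see the effect of fill_value, one option is to count the missing cells with and without it. The sketch below rebuilds the sample data so that it runs on its own. The names plain and filled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# Without fill_value, empty Date/Type combinations stay NaN
plain = pd.pivot_table(df, values='Value', index='Date',
                       columns='Type', aggfunc='sum')

# With fill_value=0, the same combinations are reported as 0
filled = pd.pivot_table(df, values='Value', index='Date',
                        columns='Type', aggfunc='sum', fill_value=0)

print(plain.isna().sum().sum())   # number of missing cells without fill_value
print(filled.isna().sum().sum())  # number of missing cells with fill_value=0
```

Note that a filled zero is indistinguishable from a genuine total of zero, so fill_value suits cases where "no rows" and "zero" mean the same thing.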

For more complex data, you might need to aggregate over multiple columns and use different aggregate functions. In this case, you can pass a dictionary to aggfunc. Here's an example with a multi-level column index and different aggregation functions:

    import pandas as pd

    # Sample data with an additional 'Quantity' column
    data = {
        'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
                 '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
        'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40, 50, 60, 70, 80],
        'Quantity': [1, 2, 3, 4, 5, 6, 7, 8]
    }
    df = pd.DataFrame(data)

    # Create a pivot table with a different aggregation per column
    pivot_df = pd.pivot_table(df, values=['Value', 'Quantity'], index='Date', columns='Type',
                              aggfunc={'Value': 'sum', 'Quantity': 'mean'})
    print(pivot_df)

This will produce a pivot table with two-level columns: the top level holds the aggregated column ('Quantity' or 'Value') and the second level holds the 'Type'. The 'Value' cells are sums and the 'Quantity' cells are means:

               Quantity       Value
    Type              A    B      A      B
    Date
    2021-01-01      3.0  NaN   60.0    NaN
    2021-01-02      NaN  4.0    NaN   80.0
    2021-01-03      5.0  NaN  100.0    NaN
    2021-01-04      NaN  6.0    NaN  120.0
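Multi-level columns can be awkward to work with downstream. The following sketch shows two common ways to handle them: selecting by level, and flattening the two levels into single names. The underscore-joined names (for example Value_A) are just one possible convention, not something pivot_table produces itself.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
             '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 80],
    'Quantity': [1, 2, 3, 4, 5, 6, 7, 8],
})
pivot_df = pd.pivot_table(df, values=['Value', 'Quantity'], index='Date',
                          columns='Type',
                          aggfunc={'Value': 'sum', 'Quantity': 'mean'})

# Select everything under the top-level 'Value' label (columns A and B)
value_sums = pivot_df['Value']

# Select a single column by its (top level, second level) tuple
value_a = pivot_df[('Value', 'A')]

# Flatten the two levels into plain string names such as 'Value_A'
flat = pivot_df.copy()
flat.columns = [f'{measure}_{kind}' for measure, kind in flat.columns]

print(value_sums)
print(value_a)
print(flat.columns.tolist())
```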

Pivot tables are very flexible and powerful, and by playing with their parameters, you can shape your data in almost any way you need.

