DEV Community

Bahman Shadmehr
Bahman Shadmehr

Posted on

Analyzing Financial Data with Pandas: A Step-by-Step Guide

In the world of data analysis, Pandas stands out as a powerful tool for working with structured data. If you're interested in diving into financial data and extracting meaningful insights, this guide will walk you through the process of grouping and aggregating data using Pandas, with a practical example using historical stock data from Yahoo Finance.

Step 1: Install Necessary Libraries

Before we get started, make sure you have Pandas and yfinance installed. If not, you can install them by running:

pip install pandas yfinance 
Enter fullscreen mode Exit fullscreen mode

Step 2: Import Libraries

import pandas as pd import yfinance as yf 
Enter fullscreen mode Exit fullscreen mode

Step 3: Download Financial Data

Let's start by downloading historical stock data for a specific ticker. For this example, we'll use Apple Inc. (AAPL).

ticker = "AAPL" start_date = "2020-01-01" end_date = "2023-01-01" # Download data data = yf.download(ticker, start=start_date, end=end_date) 
Enter fullscreen mode Exit fullscreen mode

Step 4: Explore and Clean the Data

It's crucial to understand and clean your data before diving into analysis.

# Display the first few rows of the data print(data.head()) # Check for missing values print(data.isnull().sum()) # Drop rows with missing values data = data.dropna() 
Enter fullscreen mode Exit fullscreen mode

Step 5: Group Data and Perform Aggregations

Grouping by Date

# Group by date and calculate the average closing price for each day daily_average = data.groupby(data.index)['Close'].mean() # Display the result print(daily_average) 
Enter fullscreen mode Exit fullscreen mode

Grouping by Month

# Extract month from the Date index data['Month'] = data.index.month # Group by month and calculate the average closing price for each month monthly_average = data.groupby('Month')['Close'].mean() # Display the result print(monthly_average) 
Enter fullscreen mode Exit fullscreen mode

Grouping by Multiple Columns

# Extract month and year from the Date index data['Month'] = data.index.month data['Year'] = data.index.year # Group by year and month, calculate the average closing price for each group monthly_average_by_year = data.groupby(['Year', 'Month'])['Close'].mean() # Display the result print(monthly_average_by_year) 
Enter fullscreen mode Exit fullscreen mode

Step 6: Advanced Aggregations

You can perform more advanced aggregations using the agg function.

# Define a dictionary with column-wise aggregation functions agg_functions = {'Open': 'mean', 'High': 'max', 'Low': 'min', 'Close': 'mean', 'Volume': 'sum'} # Group by month and apply the aggregation functions monthly_aggregated = data.groupby('Month').agg(agg_functions) # Display the result print(monthly_aggregated) 
Enter fullscreen mode Exit fullscreen mode

Conclusion

Congratulations! You've just learned how to download financial data from Yahoo Finance, clean and explore it, and perform various grouping and aggregation operations using Pandas. These skills are fundamental for anyone looking to analyze and derive insights from financial data.

Feel free to adapt and expand upon these examples based on your specific analysis goals. Happy coding!

Top comments (2)

Collapse
 
topofocus profile image
Hartmut B.

May i ask, why pandas is still used. I thought the rust-based polars is the common tool.

Collapse
 
bshadmehr profile image
Bahman Shadmehr

I think Pandas is still a widely used tool. But at the end we just need to remember that all tools like pandas and polars are tools and it's up to us to use them in a way that helps us to achieve our specific goal.