DEV Community

Cover image for πŸ“Š Predicting Stock Price Movement Using Options Data in Python
Dmitry Romanoff
Dmitry Romanoff

Posted on

πŸ“Š Predicting Stock Price Movement Using Options Data in Python

Have you ever wanted to use real stock options data to estimate a stock's expected move and find the "max pain" point for option sellers? In this post, we'll walk through a Python script that does just that β€” using basic options CSV data and some neat tricks with pandas and numpy.


🧠 What's the Idea?

Options trading isn't just speculation β€” it also contains valuable information about market expectations. Using open interest and implied volatility, we can estimate:

  • The expected price range of a stock by expiration.
  • The Max Pain strike β€” the price where most option holders (especially retail) lose money.

🧾 What You'll Need

  • Two CSV files: calls.csv and puts.csv, each space-delimited.
  • Python 3 with pandas and numpy installed.

🧬 Sample Input Data

🟦 calls.csv (truncated)

Contract Name Last Trade Date (EDT) Strike Last Price Bid Ask Change % Change Volume Open Interest Implied Volatility LLY250815C00350000 ... 350 410.00 388.40 395.85 ... 1 1 98.14% LLY250815C00400000 ... 400 343.46 351.85 356.90 ... 112 58 114.91% 
Enter fullscreen mode Exit fullscreen mode

πŸŸ₯ puts.csv (truncated)

Contract Name Last Trade Date (EDT) Strike Last Price Bid Ask Change % Change Volume Open Interest Implied Volatility LLY250815P00350000 ... 350 1.51 0.03 1.80 ... 1 31 75.27% LLY250815P00400000 ... 400 1.02 0.32 2.00 ... 1 68 65.16% 
Enter fullscreen mode Exit fullscreen mode

πŸ§‘β€πŸ’» Python Code Walkthrough

import pandas as pd import numpy as np def clean_options_data(filepath): # Automatically infer whitespace columns and clean headers  df = pd.read_csv(filepath, sep=None, engine='python') df.columns = df.columns.str.strip() df.replace('-', np.nan, inplace=True) # Process Implied Volatility  if 'Implied Volatility' not in df.columns: print("❌ 'Implied Volatility' column not found. Check headers.") return None df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False) df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100 df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce') df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce') df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True) return df 
Enter fullscreen mode Exit fullscreen mode

This function loads and sanitizes the options data, ensuring percentages and missing values are properly handled.


πŸ“ˆ Estimate Expected Move

calls = clean_options_data("calls.csv") puts = clean_options_data("puts.csv") # Exit on error if calls is None or puts is None: print("Fix your CSV headers and retry.") exit() # ATM (At-the-Money) strike based on highest call open interest atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike'] atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean() days_to_expiry = 75 # Use actual days to expiry  expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365) print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}") print(f"πŸ“ˆ Expected Β± Move in {days_to_expiry} days: ${expected_move:.2f}") print(f"πŸ” Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}") 
Enter fullscreen mode Exit fullscreen mode

🎯 Calculate Max Pain

The Max Pain price is where the total losses (for put and call holders) are minimized β€” this is often where market makers benefit most.

def max_pain(calls_df, puts_df): strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike']))) total_pain = [] for strike in strikes: call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum() put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum() total_pain.append((strike, call_pain + put_pain)) pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain']) return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike'] max_pain_strike = max_pain(calls, puts) print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}") 
Enter fullscreen mode Exit fullscreen mode

🧾 Output Example

βœ… Parsed headers: ['Contract Name', ..., 'Implied Volatility'] 🧠 Estimated Stock Price: $400.00 πŸ“ˆ Expected Β± Move in 75 days: $49.73 πŸ” Price Range: $350.27 to $449.73 🎯 Max Pain Strike: $390.00 
Enter fullscreen mode Exit fullscreen mode

πŸ” Takeaways

  • You can estimate market expectations using just open interest and implied volatility.
  • Max Pain is a powerful idea and sometimes a magnet for stock price behavior near expiry.
  • pandas makes it incredibly easy to clean and analyze tabular data β€” even when messy.

πŸš€ What’s Next?

You could extend this to:

  • Plot open interest by strike.
  • Animate price move cones.
  • Combine with real-time price feeds using yfinance or alphavantage.

πŸ“‚ Repo Starter

Create a folder like:

/stocks_b β”œβ”€ calls.csv β”œβ”€ puts.csv └─ analyze_options.py 
Enter fullscreen mode Exit fullscreen mode

Then run:

python analyze_options.py 
Enter fullscreen mode Exit fullscreen mode

Predicting Stock Price Movement Using Options Data in Python

Full Code

import pandas as pd import numpy as np def clean_options_data(filepath): # Use Python engine to preserve multi-word headers like "Implied Volatility"  df = pd.read_csv(filepath, sep=None, engine='python') # Strip any extra whitespace in headers  df.columns = df.columns.str.strip() # Optional: print to confirm correct headers  print("βœ… Parsed headers:", df.columns.tolist()) # Replace '-' with NaN  df.replace('-', np.nan, inplace=True) # Clean Implied Volatility  if 'Implied Volatility' not in df.columns: print("❌ 'Implied Volatility' column not found. Check headers again.") return None df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False) df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100 # Convert other numeric columns  df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce') df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce') # Drop rows missing key values  df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True) return df # Load data calls = clean_options_data("calls.csv") puts = clean_options_data("puts.csv") # Stop if headers are invalid if calls is None or puts is None: print("Fix your CSV headers and retry.") exit() # Estimate ATM strike and IV atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike'] atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean() # Estimate move days_to_expiry = 75 expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365) print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}") print(f"πŸ“ˆ Expected Β± Move in {days_to_expiry} days: ${expected_move:.2f}") print(f"πŸ” Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}") # Max Pain def max_pain(calls_df, puts_df): strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike']))) total_pain = [] for strike in strikes: call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum() put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum() total_pain.append((strike, call_pain + put_pain)) pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain']) return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike'] max_pain_strike = max_pain(calls, puts) print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}") 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)