Posted on Jun 1

📊 Predicting Stock Price Movement Using Options Data in Python

Have you ever wanted to use real stock options data to estimate a stock's expected move and find the "max pain" point for option sellers? In this post, we'll walk through a Python script that does just that — using basic options CSV data and some neat tricks with pandas and numpy.

🧠 What's the Idea?

Options trading isn't just speculation — it also contains valuable information about market expectations. Using open interest and implied volatility, we can estimate:

The expected price range of a stock by expiration.
The Max Pain strike — the price where most option holders (especially retail) lose money.

🧾 What You'll Need

Two CSV files: calls.csv and puts.csv, each space-delimited.
Python 3 with pandas and numpy installed.

🧬 Sample Input Data

🟦 `calls.csv` (truncated)

Contract Name Last Trade Date (EDT) Strike Last Price Bid Ask Change % Change Volume Open Interest Implied Volatility LLY250815C00350000 ... 350 410.00 388.40 395.85 ... 1 1 98.14% LLY250815C00400000 ... 400 343.46 351.85 356.90 ... 112 58 114.91%

🟥 `puts.csv` (truncated)

Contract Name Last Trade Date (EDT) Strike Last Price Bid Ask Change % Change Volume Open Interest Implied Volatility LLY250815P00350000 ... 350 1.51 0.03 1.80 ... 1 31 75.27% LLY250815P00400000 ... 400 1.02 0.32 2.00 ... 1 68 65.16%

🧑‍💻 Python Code Walkthrough

import pandas as pd import numpy as np def clean_options_data(filepath): # Automatically infer whitespace columns and clean headers  df = pd.read_csv(filepath, sep=None, engine='python') df.columns = df.columns.str.strip() df.replace('-', np.nan, inplace=True) # Process Implied Volatility  if 'Implied Volatility' not in df.columns: print("❌ 'Implied Volatility' column not found. Check headers.") return None df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False) df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100 df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce') df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce') df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True) return df

This function loads and sanitizes the options data, ensuring percentages and missing values are properly handled.

📈 Estimate Expected Move

calls = clean_options_data("calls.csv") puts = clean_options_data("puts.csv") # Exit on error if calls is None or puts is None: print("Fix your CSV headers and retry.") exit() # ATM (At-the-Money) strike based on highest call open interest atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike'] atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean() days_to_expiry = 75 # Use actual days to expiry  expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365) print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}") print(f"📈 Expected ± Move in {days_to_expiry} days: ${expected_move:.2f}") print(f"🔍 Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}")

🎯 Calculate Max Pain

The Max Pain price is where the total losses (for put and call holders) are minimized — this is often where market makers benefit most.

def max_pain(calls_df, puts_df): strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike']))) total_pain = [] for strike in strikes: call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum() put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum() total_pain.append((strike, call_pain + put_pain)) pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain']) return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike'] max_pain_strike = max_pain(calls, puts) print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")

🧾 Output Example

✅ Parsed headers: ['Contract Name', ..., 'Implied Volatility'] 🧠 Estimated Stock Price: $400.00 📈 Expected ± Move in 75 days: $49.73 🔍 Price Range: $350.27 to $449.73 🎯 Max Pain Strike: $390.00

🔍 Takeaways

You can estimate market expectations using just open interest and implied volatility.
Max Pain is a powerful idea and sometimes a magnet for stock price behavior near expiry.
pandas makes it incredibly easy to clean and analyze tabular data — even when messy.

🚀 What’s Next?

You could extend this to:

Plot open interest by strike.
Animate price move cones.
Combine with real-time price feeds using yfinance or alphavantage.

📂 Repo Starter

Create a folder like:

/stocks_b ├─ calls.csv ├─ puts.csv └─ analyze_options.py

Then run:

python analyze_options.py

Full Code

import pandas as pd import numpy as np def clean_options_data(filepath): # Use Python engine to preserve multi-word headers like "Implied Volatility"  df = pd.read_csv(filepath, sep=None, engine='python') # Strip any extra whitespace in headers  df.columns = df.columns.str.strip() # Optional: print to confirm correct headers  print("✅ Parsed headers:", df.columns.tolist()) # Replace '-' with NaN  df.replace('-', np.nan, inplace=True) # Clean Implied Volatility  if 'Implied Volatility' not in df.columns: print("❌ 'Implied Volatility' column not found. Check headers again.") return None df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False) df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100 # Convert other numeric columns  df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce') df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce') # Drop rows missing key values  df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True) return df # Load data calls = clean_options_data("calls.csv") puts = clean_options_data("puts.csv") # Stop if headers are invalid if calls is None or puts is None: print("Fix your CSV headers and retry.") exit() # Estimate ATM strike and IV atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike'] atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean() # Estimate move days_to_expiry = 75 expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365) print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}") print(f"📈 Expected ± Move in {days_to_expiry} days: ${expected_move:.2f}") print(f"🔍 Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}") # Max Pain def max_pain(calls_df, puts_df): strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike']))) total_pain = [] for strike in strikes: call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum() put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum() total_pain.append((strike, call_pain + put_pain)) pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain']) return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike'] max_pain_strike = max_pain(calls, puts) print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")

DEV Community

📊 Predicting Stock Price Movement Using Options Data in Python

🧠 What's the Idea?

🧾 What You'll Need

🧬 Sample Input Data

🟦 `calls.csv` (truncated)

🟥 `puts.csv` (truncated)

🧑‍💻 Python Code Walkthrough

📈 Estimate Expected Move

🎯 Calculate Max Pain

🧾 Output Example

🔍 Takeaways

🚀 What’s Next?

📂 Repo Starter

Full Code

Top comments (0)

🧠 What's the Idea?

🧾 What You'll Need

🧬 Sample Input Data

🟦 calls.csv (truncated)

🟥 puts.csv (truncated)

🧑‍💻 Python Code Walkthrough

📈 Estimate Expected Move

🎯 Calculate Max Pain

🧾 Output Example

🔍 Takeaways

🚀 What’s Next?

📂 Repo Starter

Full Code

🟦 `calls.csv` (truncated)

🟥 `puts.csv` (truncated)