Posted on Jul 30

⚽ Predicting 2024/25 Premier League Win Probabilities Using Python

#programming #datascience #analytics #python

In this project, I explored how to predict the probability of Premier League teams winning games in the 2024/25 season, using their 2023/24 results as a baseline. I used Python, the API-Football API, and some light statistics to model each team's win probability.

Let’s break it down 👇

📊 The Goal

Predict how many games each team is likely to win in the 2024/25 season using:

🧮 Bernoulli Distribution (win or no win)
🎲 Binomial Probability Model
📈 Visualizations with Seaborn & Matplotlib

⚙️ Tools & Libraries

import requests
import pandas as pd
from scipy.stats import binom
import matplotlib.pyplot as plt
import seaborn as sns

📦 Step 1: Pull 2023/24 Match Data from API-Football

I used the API-Football service to get Premier League match data for the 2023/24 season:

API_KEY = 'your_api_key'
BASE_URL = 'https://v3.football.api-sports.io'
HEADERS = {'x-apisports-key': API_KEY}

# League 39 is the Premier League; season 2023 refers to 2023/24
params = {'league': 39, 'season': 2023}
response = requests.get(f'{BASE_URL}/fixtures', headers=HEADERS, params=params)
response.raise_for_status()  # stop early on an HTTP error
fixtures = response.json()['response']
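The request above assumes the call succeeds and the payload has the expected shape. Here is a minimal sketch of a defensive check. It assumes the decoded JSON carries an `errors` field next to `response`, which is my understanding of the API-Football envelope and worth confirming against the official docs. `extract_fixtures` is a hypothetical helper name, not part of any library, and the payloads are hand-written for illustration.

```python
def extract_fixtures(payload):
    """Return the fixture list from a decoded JSON payload, or raise."""
    # Assumption: a non-empty 'errors' field signals a failed request
    errors = payload.get('errors')
    if errors:
        raise RuntimeError(f'API returned errors: {errors}')

    fixtures = payload.get('response')
    if not isinstance(fixtures, list):
        raise ValueError("payload has no 'response' list")
    return fixtures


# Hand-written payload for illustration, not real API output
ok_payload = {'errors': [], 'response': [{'fixture': {'id': 1}}]}
print(len(extract_fixtures(ok_payload)))
```

In the script above, this would stand in for the last line: `fixtures = extract_fixtures(response.json())`.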

🔍 Step 2: Process the Results

Each match was inspected to determine the winning team, and I counted how many matches each team played and won.

data = []
for match in fixtures:
    if match['fixture']['status']['short'] == 'FT':
        home_team = match['teams']['home']['name']
        away_team = match['teams']['away']['name']
        home_goals = match['goals']['home']
        away_goals = match['goals']['away']

        if home_goals > away_goals:
            winner = home_team
        elif away_goals > home_goals:
            winner = away_team
        else:
            winner = None  # draw

        data.append({'home': home_team, 'away': away_team, 'winner': winner})

df = pd.DataFrame(data)
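To sanity-check the winner logic without calling the API, the comparison can be pulled into a small function and run on made-up scorelines. This is only a sketch: `match_winner` is a hypothetical helper and the team names are placeholders.

```python
def match_winner(home_team, away_team, home_goals, away_goals):
    """Return the winning team's name, or None for a draw."""
    if home_goals > away_goals:
        return home_team
    if away_goals > home_goals:
        return away_team
    return None


# Made-up scorelines with placeholder team names
sample = [
    ('Team A', 'Team B', 2, 1),  # home win
    ('Team C', 'Team A', 0, 0),  # draw
    ('Team B', 'Team C', 1, 3),  # away win
]
winners = [match_winner(*m) for m in sample]
print(winners)  # ['Team A', None, 'Team C']
```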

📊 Step 3: Calculate Win Probabilities

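I grouped matches by team, calculated each team's win rate (wins / games played), and used the binomial PMF to estimate the chance of winning a given number of games in a 38-match season. This treats every match as an independent trial with the same win probability, and counts draws as non-wins.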

teams = sorted(set(df['home']).union(df['away']))
records = []
for team in teams:
    # Count the matches the team took part in, home or away
    played = ((df['home'] == team) | (df['away'] == team)).sum()
    wins = (df['winner'] == team).sum()
    win_rate = wins / played
    records.append({'team': team, 'wins': wins, 'played': played, 'win_rate': win_rate})

df_stats = pd.DataFrame(records)
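For intuition about what the binomial PMF does with a win rate, the formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k) can be written out by hand with the standard library. The sketch below uses a hypothetical win rate of 0.5 and is not taken from the real data. `binomial_pmf` is an illustrative helper that should agree with `scipy.stats.binom.pmf` up to floating-point error.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent games with win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 38   # games in a Premier League season
p = 0.5  # hypothetical win rate, for illustration only

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                             # should be ~1
expected_wins = sum(k * prob for k, prob in enumerate(pmf))  # should be ~n * p

print(total, expected_wins)
```

The probabilities sum to 1 and the mean comes out at n · p, so a team that won half its games last season is centred on 19 wins.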

📈 Step 4: Visualize the Prediction

I used Seaborn to draw one line per team, showing the probability distribution over the number of wins next season (assuming 38 games).

season_games = 38
plot_data = []
for _, row in df_stats.iterrows():
    team = row['team']
    p = row['win_rate']
    for x in range(0, season_games + 1):
        prob = binom.pmf(x, n=season_games, p=p)
        plot_data.append({'team': team, 'wins': x, 'probability': prob})

viz_df = pd.DataFrame(plot_data)

plt.figure(figsize=(14, 8))
sns.lineplot(data=viz_df, x='wins', y='probability', hue='team')
plt.title('Predicted Win Probability Distribution (2024/25 Season)')
plt.xlabel('Number of Wins')
plt.ylabel('Probability')
plt.tight_layout()
plt.show()
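Twenty overlapping curves can be hard to read, so a couple of summary numbers per team may help alongside the plot. The sketch below is one possible summary, not part of the original project. It reports the most likely win total and the chance of reaching a target number of wins, using a hand-rolled binomial PMF, a hypothetical win rate of 0.5 and an arbitrary target of 20 wins.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent games with win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def summarize(p, n=38, target=20):
    """Return (most likely win total, probability of at least `target` wins)."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    most_likely = max(range(n + 1), key=lambda k: pmf[k])
    p_at_least = sum(pmf[target:])
    return most_likely, p_at_least


# Hypothetical team that wins half of its games
most_likely, p_at_least = summarize(p=0.5)
print(most_likely, p_at_least)
```

With the real data, `summarize(row['win_rate'])` could be applied to each row of `df_stats` to build a compact table.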

🧠 Why This Matters

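This approach doesn’t predict exact results, but it gives each team a probability distribution over season win totals, assuming every match is an independent trial with last season's win rate.
It helps analysts and fans see how much a team's win total could plausibly vary around its 2023/24 baseline.
One can improve this model by adding player-level data, home/away effects, injuries, or transfer impact, and by handling promoted teams, which have no 2023/24 Premier League results to draw on.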
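As one example of the home/away idea, a team's season can be split into 19 home games and 19 away games, each with its own win rate, and the two binomial distributions combined by convolution. This is only a sketch of that extension, not something the original project does. The function names and the rates 0.6 and 0.4 are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent games with win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def season_pmf_home_away(p_home, p_away, n_home=19, n_away=19):
    """Distribution of total wins when home and away win rates differ."""
    home = [binomial_pmf(k, n_home, p_home) for k in range(n_home + 1)]
    away = [binomial_pmf(k, n_away, p_away) for k in range(n_away + 1)]

    # Convolve: P(total = t) is the sum over h + a = t of P(home = h) * P(away = a)
    total = [0.0] * (n_home + n_away + 1)
    for h, prob_h in enumerate(home):
        for a, prob_a in enumerate(away):
            total[h + a] += prob_h * prob_a
    return total


# Hypothetical rates: stronger at home than away
pmf = season_pmf_home_away(p_home=0.6, p_away=0.4)
expected = sum(k * q for k, q in enumerate(pmf))
print(len(pmf), expected)
```

When both rates are equal, this reduces to the single binomial used in the main model. When they differ, the expected total is still the sum of the two parts, but the distribution is slightly narrower.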

💭 Final Thoughts

This was a fun exploration that blended sports and data science. Using historical data with probability theory gives deeper insights than just "gut feeling."

📂 GitHub: [https://github.com/loryneJoy/Python-Assignments.git]

🐍 Tags: #football #python #data-science #premier-league