In this project, I explored how to predict the probability of Premier League teams winning games in the 2024/25 season, using their 2023/24 results as a baseline. I used Python, the API-Football service, and some light statistics to model each team's win probability.
Let’s break it down 👇
📊 The Goal
Predict how many games each team is likely to win in the 2024/25 season using:
- 🧮 Bernoulli Distribution (win or no win)
- 🎲 Binomial Probability Model
- 📈 Visualizations with Seaborn & Matplotlib
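The two models fit together like this: each match is treated as a Bernoulli trial (win or no win), and a season's win total is the sum of 38 such trials. This sketch assumes every match is independent and that a team's win probability p stays constant all season, both of which are simplifications. The estimate of p is the 2023/24 win rate.

```latex
\begin{aligned}
X_i &\sim \mathrm{Bernoulli}(p), \quad i = 1, \dots, 38 \\
W &= \sum_{i=1}^{38} X_i \sim \mathrm{Binomial}(38, p) \\
P(W = k) &= \binom{38}{k}\, p^{k} (1 - p)^{38 - k}, \quad k = 0, 1, \dots, 38 \\
\mathbb{E}[W] &= 38p, \qquad \operatorname{Var}(W) = 38p(1 - p) \\
\hat{p} &= \frac{\text{wins in 2023/24}}{\text{matches played in 2023/24}}
\end{aligned}
```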
⚙️ Tools & Libraries
```python
import requests
import pandas as pd
from scipy.stats import binom
import matplotlib.pyplot as plt
import seaborn as sns
```
📦 Step 1: Pull 2023/24 Match Data from API-Football
I used the API-Football service to get Premier League match data for the 2023/24 season:
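```python
API_KEY = 'your_api_key'  # replace with your own API-Football key
BASE_URL = 'https://v3.football.api-sports.io'
HEADERS = {'x-apisports-key': API_KEY}

# League 39 is the Premier League; season 2023 is the 2023/24 campaign
params = {'league': 39, 'season': 2023}

response = requests.get(f'{BASE_URL}/fixtures', headers=HEADERS, params=params)
response.raise_for_status()  # stop early on HTTP errors (bad key, rate limit, etc.)
fixtures = response.json()['response']
```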
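An HTTP 200 does not always mean the request worked: API-Football can report problems such as an invalid key inside the JSON body. Here is a minimal sketch of a payload check, assuming the response carries an `errors` field (empty on success) next to `response`, as API-Football v3 payloads do. `extract_fixtures` is a hypothetical helper name.

```python
from typing import Any


def extract_fixtures(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the fixture list from an API-Football JSON payload.

    Assumes the payload has an 'errors' field (an empty list when the request
    succeeded) alongside the 'response' list of fixtures.
    """
    errors = payload.get("errors")
    if errors:  # a non-empty list or dict means the API rejected the request
        raise RuntimeError(f"API-Football returned errors: {errors}")
    fixtures = payload.get("response", [])
    if not fixtures:
        raise ValueError("No fixtures returned; check the league/season parameters and your API plan")
    return fixtures
```

Usage: replace the last line of the request code with `fixtures = extract_fixtures(response.json())`.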
🔍 Step 2: Process the Results
I kept only finished matches and inspected each one to determine the winning team, recording no winner (None) for a draw. The per-team counts of matches played and won are computed in Step 3.
```python
data = []
for match in fixtures:
    # Only use finished matches ('FT' = full time)
    if match['fixture']['status']['short'] == 'FT':
        home_team = match['teams']['home']['name']
        away_team = match['teams']['away']['name']
        home_goals = match['goals']['home']
        away_goals = match['goals']['away']

        if home_goals > away_goals:
            winner = home_team
        elif away_goals > home_goals:
            winner = away_team
        else:
            winner = None  # draw

        data.append({'home': home_team, 'away': away_team, 'winner': winner})

df = pd.DataFrame(data)
```
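A quick sanity check on Step 2's output is to tally wins, draws, and losses per team. With a complete 2023/24 season, every team's W + D + L should come to 38. This is a minimal sketch, assuming rows shaped like the `data` list above (`home`, `away`, `winner`). `tally_results` is a hypothetical helper name, and the team names are placeholders, not real results.

```python
from collections import defaultdict


def tally_results(matches: list[dict]) -> dict[str, dict[str, int]]:
    """Count wins, draws, and losses per team.

    Each match dict has 'home', 'away', and 'winner' (None for a draw).
    """
    table: dict[str, dict[str, int]] = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for m in matches:
        home, away, winner = m["home"], m["away"], m["winner"]
        if winner is None:
            table[home]["D"] += 1
            table[away]["D"] += 1
        else:
            loser = away if winner == home else home
            table[winner]["W"] += 1
            table[loser]["L"] += 1
    return dict(table)


# Toy fixtures with placeholder team names (illustrative only)
sample = [
    {"home": "Team A", "away": "Team B", "winner": "Team A"},
    {"home": "Team B", "away": "Team C", "winner": None},
    {"home": "Team C", "away": "Team A", "winner": "Team C"},
]
print(tally_results(sample))
```

On the real data, `tally_results(data)` followed by a check that each team's counts sum to 38 catches missing or unfinished fixtures before they distort the win rates.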
📊 Step 3: Calculate Win Probabilities
I grouped matches by team and calculated each team’s win rate (wins / games played). That rate serves as the per-match win probability p for the Binomial PMF, which estimates each team’s chance of winning a given number of games in a 38-match season (Step 4).
```python
# Every team that appears as a home or away side
teams = sorted(set(df['home']) | set(df['away']))

records = []
for team in teams:
    played = df[(df['home'] == team) | (df['away'] == team)]
    wins = (df['winner'] == team).sum()
    games = len(played)
    records.append({
        'team': team,
        'wins': wins,
        'played': games,
        'win_rate': wins / games,  # used as p in the Binomial model
    })

df_stats = pd.DataFrame(records)
```
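To see what `binom.pmf` computes under the hood, the Binomial PMF can be written with the standard library alone. This is a minimal sketch, assuming Step 3's `win_rate` is used directly as p. `binomial_pmf` and `win_distribution` are hypothetical helper names, and the 19-wins-from-38 record is an illustrative input rather than a real team's. The values should match `scipy.stats.binom.pmf(k, n, p)`.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k wins in n matches) when each match is won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def win_distribution(wins: int, played: int, season_games: int = 38) -> list[float]:
    """Turn last season's record into a Binomial distribution over next season's wins."""
    p = wins / played  # the win rate from Step 3, used as the per-match win probability
    return [binomial_pmf(k, season_games, p) for k in range(season_games + 1)]


# Illustrative record: 19 wins from 38 games, i.e. p = 0.5
dist = win_distribution(19, 38)
print(sum(dist))                               # total probability over 0..38 wins
print(max(range(39), key=lambda k: dist[k]))   # most likely win total
```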
📈 Step 4: Visualize the Prediction
I used Seaborn to draw a single line plot with one line per team, showing the probability of each possible win total in the next season (assuming 38 games).
```python
season_games = 38

# For each team, compute P(X = x) for every possible win total x in a 38-game season
plot_data = []
for _, row in df_stats.iterrows():
    team = row['team']
    p = row['win_rate']
    for x in range(0, season_games + 1):
        prob = binom.pmf(x, n=season_games, p=p)
        plot_data.append({'team': team, 'wins': x, 'probability': prob})

viz_df = pd.DataFrame(plot_data)

plt.figure(figsize=(14, 8))
sns.lineplot(data=viz_df, x='wins', y='probability', hue='team')
plt.title('Predicted Win Probability Distribution (2024/25 Season)')
plt.xlabel('Number of Wins')
plt.ylabel('Probability')
plt.tight_layout()
plt.show()
```
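With 20 overlapping curves, the chart can be hard to read, so a few summary numbers per team can complement it. This is a minimal sketch using only the standard library. `summarize` is a hypothetical helper, and the 20-win threshold is an arbitrary example, not something from the original analysis.

```python
from math import comb, sqrt


def summarize(p: float, n: int = 38, at_least: int = 20) -> dict[str, float]:
    """Summarize a Binomial(n, p) win distribution with a few readable numbers."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return {
        "expected_wins": n * p,                              # mean of Binomial(n, p)
        "std_dev": sqrt(n * p * (1 - p)),                    # spread around the mean
        "most_likely": max(range(n + 1), key=lambda k: pmf[k]),  # mode of the PMF
        f"p_at_least_{at_least}": sum(pmf[at_least:]),      # upper-tail probability
    }


print(summarize(0.5))
```

Applying it to each row of `df_stats` (for example, `summarize(row['win_rate'])`) gives a compact table that ranks teams by expected wins.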
🧠 Why This Matters
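- This approach doesn’t predict exact results; it gives each team a probability distribution over its possible win totals, assuming every match is an independent trial with the team’s 2023/24 win rate.
- It gives analysts and fans a quick, quantitative view of how last season’s form translates into a range of likely win totals.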
- One can improve this model by adding player-level data, home/away effects, injuries, or transfer impact.
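As an example of the home/away extension, a team could get separate win probabilities for its 19 home and 19 away matches. Total wins is then the sum of two independent Binomials, and its distribution is their convolution. This is a minimal sketch under that independence assumption. `binom_pmf` and `home_away_wins` are hypothetical helpers, and the 60%/40% rates are illustrative, not fitted values.

```python
from math import comb


def binom_pmf(n: int, p: float) -> list[float]:
    """Full Binomial(n, p) PMF as a list indexed by number of wins."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def home_away_wins(p_home: float, p_away: float, n_home: int = 19, n_away: int = 19) -> list[float]:
    """Distribution of total wins when home and away matches have different win probabilities.

    Total wins = home wins + away wins (two independent Binomials),
    so the combined PMF is the convolution of the two PMFs.
    """
    home, away = binom_pmf(n_home, p_home), binom_pmf(n_away, p_away)
    total = [0.0] * (n_home + n_away + 1)
    for i, ph in enumerate(home):
        for j, pa in enumerate(away):
            total[i + j] += ph * pa
    return total


# Illustrative rates: a team that wins 60% at home but only 40% away
dist = home_away_wins(0.6, 0.4)
print(sum(k * q for k, q in enumerate(dist)))  # expected wins = 19*0.6 + 19*0.4
```

When `p_home == p_away`, the result reduces to the single Binomial(38, p) used in the main model, so the extension only changes the prediction when the split rates differ.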
💭 Final Thoughts
This was a fun exploration that blended sports and data science. Using historical data with probability theory gives deeper insights than just "gut feeling."
📂 GitHub: https://github.com/loryneJoy/Python-Assignments.git
🐍 Tags: #football #python #data-science #premier-league