Loryne Joy Omwando


📊 The Measures of Central Tendency and Why They Matter in Data Science

Have you ever stared at a dataset and wondered: “Okay... but what does all this really mean?”
Welcome to the world of central tendency—your first step in summarizing data and making it speak.

Whether you're a growing data analyst or a seasoned data scientist, understanding the core of your data starts here.


🧠 What Are Measures of Central Tendency?

In simple terms, measures of central tendency help us find the middle point or typical value in a dataset. These measures include:

  • Mean – the average
  • Median – the middle value
  • Mode – the most frequent value

Think of them like lenses: each one shows the data in a slightly different way.
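
To see the three lenses side by side, here is a minimal sketch using Python's built-in `statistics` module. The `tickets` list is a made-up sample (support tickets closed per day), chosen so that each measure gives a different answer.

```python
import statistics

# Hypothetical sample: support tickets closed per day.
tickets = [3, 5, 5, 6, 8, 9, 20]

mean_value = statistics.mean(tickets)      # sum of values / count
median_value = statistics.median(tickets)  # middle value once sorted
mode_value = statistics.mode(tickets)      # most frequent value

print(f"Mean:   {mean_value}")
print(f"Median: {median_value}")
print(f"Mode:   {mode_value}")
```

Same data, three different "typical" values. Which one is most useful depends on the question you're asking.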


🧺 Why Are They Important in Data Science?

Raw data can be messy, overwhelming, and misleading without context.

When working with data, especially during exploratory data analysis (EDA), these measures help us:

  • Summarize large datasets with a single number
  • Detect outliers and understand their impact
  • Choose appropriate models (some ML algorithms assume the data is roughly normally distributed)
  • Communicate insights clearly to stakeholders who aren’t tech-savvy
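
As a rough illustration of the first two points, the sketch below boils a list down to a few numbers and reports the gap between mean and median. The `summarize` helper and the `delivery_minutes` data are hypothetical, and a large gap is only a hint that outliers or skew may be present, not proof.

```python
import statistics

def summarize(values):
    """Return a small summary of a numeric list (hypothetical helper)."""
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    return {
        "count": len(values),
        "mean": mean_value,
        "median": median_value,
        # A large gap between mean and median hints at skew or outliers.
        "mean_minus_median": mean_value - median_value,
    }

# Made-up delivery times in minutes, with one very late delivery.
delivery_minutes = [28, 30, 31, 33, 35, 120]
summary = summarize(delivery_minutes)
for name, value in summary.items():
    print(f"{name}: {value}")
```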

Here are some practical examples 👇


📌 The Mean – "The Classic Average"

    import numpy as np

    salaries = [40000, 45000, 50000, 52000, 60000]
    mean_salary = np.mean(salaries)
    print(f"The average salary is: ${mean_salary:.2f}")

💡 But beware! The mean is sensitive to outliers.

What happens if we introduce a wildly high salary?

    salaries.append(200000)  # Big CEO bonus!
    mean_salary = np.mean(salaries)
    print(f"New average salary: ${mean_salary:.2f}")

The average gets pulled up, even though most employees earn much less.
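
To make the pull visible, here is the same arithmetic written out in plain Python. The names `salaries_before`, `salaries_after`, and `below_mean` are introduced just for this sketch.

```python
# Same salaries as above, written out so the arithmetic is visible.
salaries_before = [40000, 45000, 50000, 52000, 60000]
salaries_after = salaries_before + [200000]

mean_before = sum(salaries_before) / len(salaries_before)
mean_after = sum(salaries_after) / len(salaries_after)

# Count how many people earn less than the new "average".
below_mean = sum(1 for s in salaries_after if s < mean_after)

print(f"Mean before outlier: ${mean_before:.2f}")
print(f"Mean after outlier:  ${mean_after:.2f}")
print(f"{below_mean} of {len(salaries_after)} salaries are below the new mean")
```

One extra salary moves the mean from $49,400 to $74,500, and five of the six employees now sit below the "average".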


📌 The Median – "The Middle Ground"

    median_salary = np.median(salaries)
    print(f"The median salary is: ${median_salary:.2f}")

The median resists outliers, making it a better choice when the data is skewed.

👉 For example, for real estate prices, income levels, or rents, the median gives a fairer picture.
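
Why does the median hold steady? It only cares about position in the sorted list, not about how large the extreme values are. A minimal hand-rolled sketch follows. `median_by_hand` is a hypothetical helper for illustration, and in practice `np.median` does this for you.

```python
def median_by_hand(values):
    """Median via sorting (illustrative only)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

without_outlier = [40000, 45000, 50000, 52000, 60000]
with_outlier = without_outlier + [200000]

print(median_by_hand(without_outlier))
print(median_by_hand(with_outlier))
```

Adding the 200,000 salary shifts the median from 50,000 to 51,000, a small nudge compared with what happens to the mean.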


📌 The Mode – "The Most Popular Kid"

    from statistics import mode

    grades = [85, 90, 88, 85, 92, 85, 90]
    most_common_grade = mode(grades)
    print(f"The most common grade is: {most_common_grade}")

The mode is especially useful for categorical data, like:

  • Most purchased product
  • Favorite programming language
  • Most common diagnosis in a hospital dataset
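
Here is a small sketch of the mode on categorical data, using `collections.Counter` and `statistics.multimode` (available in Python 3.8+). The survey answers and product names are invented for the example.

```python
from collections import Counter
from statistics import multimode

# Hypothetical survey answers.
languages = ["Python", "SQL", "Python", "R", "SQL", "Python", "Julia"]

counts = Counter(languages)
top_language, top_count = counts.most_common(1)[0]
print(f"Most common: {top_language} ({top_count} votes)")

# multimode returns every value tied for first place.
products = ["tea", "coffee", "tea", "coffee", "juice"]
print(multimode(products))
```

`Counter` is handy when you also want the counts, while `multimode` is worth reaching for when ties are possible, since a single "most popular" answer can hide them.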

📉 When to Use Which?

| Measure | Best For | Avoid When |
|---------|----------|------------|
| Mean | Symmetric distributions | Data has outliers |
| Median | Skewed data or outliers | Uniform distributions |
| Mode | Categorical data | Continuous variables with few or no repeats |
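
The table can be turned into a rough rule of thumb in code. This is only a sketch: `suggest_measure` is a hypothetical helper, and the 20% threshold is an arbitrary assumption for illustration, not a standard cutoff. It expects a non-empty list.

```python
import statistics

def suggest_measure(values, skew_threshold=0.2):
    """Suggest 'mode', 'median', or 'mean' using a rough, assumed heuristic."""
    # Non-numeric (categorical) data: only the mode is defined.
    if not all(isinstance(v, (int, float)) for v in values):
        return "mode"
    spread = statistics.pstdev(values)
    if spread == 0:
        return "mean"  # all values identical, every measure agrees
    gap = abs(statistics.mean(values) - statistics.median(values))
    # Assumption: a mean/median gap above 20% of the spread signals skew.
    return "median" if gap / spread > skew_threshold else "mean"

print(suggest_measure(["red", "blue", "red"]))
print(suggest_measure([1, 2, 3, 4, 5]))
print(suggest_measure([1_000_000, 1_200_000, 1_300_000, 10_000_000]))
```

Treat the output as a prompt to go and look at the data, not as a verdict. A histogram will usually tell you more than any single threshold.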

🔍 Real-Life Use Case: House Prices

Imagine you’re analyzing house prices in Nairobi:

    house_prices = [1_000_000, 1_200_000, 1_300_000, 10_000_000]  # 👀 big outlier!

    print("Mean:", np.mean(house_prices))
    print("Median:", np.median(house_prices))

Which one would you trust more to describe a "typical" house price?
Definitely the median—because that luxury mansion isn't your average listing.
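
One way to check that intuition is to measure how far each number moves when the mansion is removed. The sketch below reuses the same prices with the standard-library `statistics` module. The names `without_mansion`, `mean_shift`, and `median_shift` are introduced here for illustration.

```python
import statistics

house_prices = [1_000_000, 1_200_000, 1_300_000, 10_000_000]
without_mansion = house_prices[:-1]  # drop the outlier

mean_shift = statistics.mean(house_prices) - statistics.mean(without_mansion)
median_shift = statistics.median(house_prices) - statistics.median(without_mansion)

print(f"Mean moves by:   {mean_shift:,.0f}")
print(f"Median moves by: {median_shift:,.0f}")
```

A single listing moves the mean by more than two million, while the median moves by only 50,000.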


🧠 Final Thoughts

Mastering central tendency is more than just memorizing formulas.

It’s about knowing which tool to use, when to use it, and why. Data Science isn't just about models and code—it's about context and communication.

So the next time you're handed a CSV file full of numbers, don’t panic:

Start with the basics.
Start with central tendency.


✅ TL;DR

  • Mean = average (useful, but sensitive to outliers)
  • Median = middle value (great for skewed data)
  • Mode = most frequent value (perfect for categories)
  • Use them in EDA, data summaries, and to build intuition

Thanks for reading! 🙌
If you found this helpful, let’s connect or discuss below:
What’s your go-to measure when you explore new data?

