A focused 2-day roadmap to master Seaborn for data analytics using the tips dataset.
This guide includes setup, explanations, code examples, pitfalls, checkpoints, and a mini-project.
By the end, you’ll have job-ready visualization skills.
Source Code: Click here
Written By: Nivesh Bansal Linkedin GitHub Instagram
Seaborn Roadmap for Data Analytics (Using tips)
Goal: Master Seaborn’s essentials in 2 days (or one power-day).
This roadmap covers every necessary topic with explanations, code snippets, and practice tasks—focused purely on practical analytics.
Key Outcomes:
- Dataset: sns.load_dataset("tips")
- Focus: EDA • Storytelling • Clean visuals
- Result: Job-ready plotting skills
Tip: Don’t memorize. For each topic:
- Run the example
- Tweak 2–3 parameters
- Write one insight in plain English
Requirements:
- Python ≥ 3.9
- Libraries: pandas, numpy, matplotlib, seaborn
- IDE: Jupyter/Colab or any Python IDE
0) Quick Setup
# Install the libraries (run in a terminal)
pip install seaborn matplotlib pandas numpy

# Import, set a theme, and load the dataset
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_theme(style="whitegrid", context="notebook")
tips = sns.load_dataset("tips")
tips.head()

Dataset Columns: total_bill, tip, sex, smoker, day, time, size
📅 Day 1 — Foundations & Core EDA
Goal: Understand Seaborn’s API, explore distributions, compare categories, and scan pairwise relationships quickly.
1) Seaborn Basics: Figure-level vs Axes-level
- Axes-level → e.g., sns.scatterplot (draws on a Matplotlib Axes, returns the Axes)
- Figure-level → e.g., sns.catplot, sns.pairplot (manages its own figure/layout)
- Common params: data=, x=, y=, hue=, style=, size=
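To make the split above concrete, here is a minimal sketch of what each kind of function returns. The `demo` DataFrame is a made-up stand-in with tips-style column names so the snippet runs offline; swap in `tips` from the setup step when you try it yourself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (made-up values, same column names).
demo = pd.DataFrame({
    "total_bill": [10.5, 21.0, 15.2, 30.1, 18.4, 25.0],
    "tip": [1.5, 3.0, 2.2, 5.0, 2.5, 4.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

# Axes-level: draws on the Axes you pass in and returns that same Axes.
fig, ax = plt.subplots(figsize=(5, 4))
returned = sns.scatterplot(data=demo, x="total_bill", y="tip", ax=ax)
same_axes = returned is ax

# Figure-level: creates its own figure and returns a FacetGrid.
g = sns.relplot(data=demo, x="total_bill", y="tip", col="time", kind="scatter")
n_panels = g.axes.size  # one panel per distinct value of "time"
g.set_axis_labels("Total Bill ($)", "Tip ($)")

plt.close("all")
```

In practice this means you customize an axes-level plot through the returned Axes (`ax.set_title(...)`), while a figure-level plot is customized through the grid object (`g.set_axis_labels(...)`, `g.set_titles(...)`).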
2) Univariate Distributions
Use these to understand shape, center, spread, and outliers:
- histplot: histogram (+ KDE option)
- kdeplot: kernel density estimate
- ecdfplot: empirical CDF (great for medians & quantiles)
- countplot: frequency for categorical variables
# Histogram & KDE
sns.histplot(tips, x="total_bill", bins=20, kde=True)
plt.title("Distribution of Total Bill")
plt.show()

# ECDF
sns.ecdfplot(tips, x="tip")
plt.title("ECDF of Tip")
plt.show()

# Count
sns.countplot(data=tips, x="day")
plt.title("Count by Day")
plt.show()

When to use: sanity checks, skewness, choosing transforms, spotting outliers.
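Why is the ECDF "great for medians & quantiles"? The curve is just the sorted values plotted against their cumulative proportion, so a quantile is read off by finding where the curve first reaches that proportion. The sketch below rebuilds this by hand with NumPy on made-up tip amounts (not the real dataset); `ecdf_quantile` is a helper name invented for illustration.

```python
import numpy as np

# Made-up tip amounts standing in for tips["tip"].
tip = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 10.0])

# ECDF: sort the values; the i-th smallest has cumulative proportion i / n.
x = np.sort(tip)
proportion = np.arange(1, len(x) + 1) / len(x)

def ecdf_quantile(q):
    """Smallest observed value whose cumulative proportion is at least q."""
    return x[np.searchsorted(proportion, q, side="left")]

median_from_ecdf = ecdf_quantile(0.5)  # where the curve first reaches 0.5
p90_from_ecdf = ecdf_quantile(0.9)     # where the curve first reaches 0.9
```

On the plot, this is the same as drawing a horizontal line at 0.5 (or 0.9) and reading the x value where it meets the curve. Note that `np.median` interpolates between the two middle values for an even-sized sample, so it can differ slightly from this read-off.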
3) Categorical ↔ Numerical
Compare distributions across groups:
- boxplot: median, IQR, whiskers, outliers
- violinplot: full distribution via KDE
- boxenplot: for large samples
- stripplot/swarmplot: raw points
- barplot/pointplot: aggregated means/CI
# Box vs Violin
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(data=tips, x="day", y="total_bill", ax=ax[0])
sns.violinplot(data=tips, x="day", y="tip", ax=ax[1])
ax[0].set_title("Total Bill by Day")
ax[1].set_title("Tip by Day")
plt.tight_layout()
plt.show()

# Strip plot
sns.stripplot(data=tips, x="smoker", y="tip", jitter=True)
plt.title("Raw Tips by Smoker")
plt.show()

# Mean with 95% CI (seaborn >= 0.12 uses errorbar=; older releases used ci=95)
sns.barplot(data=tips, x="sex", y="tip", estimator="mean", errorbar=("ci", 95))
plt.title("Avg Tip by Sex")
plt.show()

Combine violinplot + stripplot to show the distribution and the raw data together.
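One way to do that combination is to draw both plots on the same Axes: the violin first, with its inner marks switched off, then the points on top. This is a sketch on randomly generated data with a tips-like `day` column (an assumption so it runs offline), not output from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in: 30 made-up tips for each of four days.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 30),
    "tip": rng.gamma(shape=4.0, scale=0.75, size=120),
})

fig, ax = plt.subplots(figsize=(6, 4))
# Violin as a light background shape; inner=None hides the default inner marks.
sns.violinplot(data=demo, x="day", y="tip", inner=None, color="lightgray", ax=ax)
# Raw points on top, small and semi-transparent so the shape stays readable.
sns.stripplot(data=demo, x="day", y="tip", color="black", size=3, alpha=0.6,
              jitter=True, ax=ax)
ax.set_title("Tip by Day: density + raw points")
ax.set_ylabel("Tip ($)")
plt.close(fig)
```

Passing the same `ax` to both calls is what layers them; without it, the second call would still draw on the current Axes in a notebook, but being explicit avoids surprises once you have several subplots.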
4) Numeric ↔ Numeric Relationships
Start with scatterplots, optionally add regression.
# Basic scatter
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.title("Total Bill vs Tip")
plt.show()

# Add hue/style/size
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex", style="smoker", size="size")
plt.title("Bill vs Tip by Sex/Smoker/Size")
plt.show()

# Trend line
sns.regplot(data=tips, x="total_bill", y="tip", scatter_kws={"alpha": 0.6})
plt.title("Trend: Tip ~ Total Bill")
plt.show()
5) Fast Pairwise Scans
sns.pairplot(tips, hue="sex", diag_kind="hist")
plt.suptitle("Pairwise Relationships (tips)", y=1.02)
plt.show()

✅ Checkpoint (Day 1 done): You can read distributions, compare groups, and see pairwise trends. Write 3 insights from the dataset.
📅 Day 2 — Multivariate, Facets, Correlations & Pro Styling
Goal: Add faceting, correlations, palettes, and create presentation-ready visuals.
6) Faceting & Small Multiples
Split data into subplots by category.
# Facet by smoker
sns.catplot(data=tips, x="day", y="tip", hue="sex", col="smoker", kind="bar")
plt.suptitle("Tips by Day (faceted by Smoker)", y=1.02)
plt.show()

# Scatter with facets
sns.relplot(data=tips, x="total_bill", y="tip", hue="sex", col="time", kind="scatter")
plt.show()

Facets make comparisons obvious without clutter.
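When a facet variable has more than two or three levels, a single row of panels gets cramped. A sketch of one option, assuming made-up data with tips-style `day` and `time` columns: wrap the columns with `col_wrap` and tidy the titles through the returned grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in: 20 made-up tips per day, alternating Lunch/Dinner.
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 20),
    "time": np.tile(["Lunch", "Dinner"], 40),
    "tip": rng.gamma(shape=4.0, scale=0.75, size=80),
})

# One panel per day, wrapped into a 2 x 2 grid instead of a single row.
g = sns.catplot(data=demo, x="time", y="tip", col="day", col_wrap=2,
                kind="box", height=3, aspect=1.2)
g.set_titles("{col_name}")            # panel title shows just the day
g.set_axis_labels("Time", "Tip ($)")
g.figure.suptitle("Tip by Time, one panel per Day", y=1.02)

n_panels = g.axes.size
panel_titles = sorted(ax.get_title() for ax in g.axes.flat)
plt.close("all")
```

Because all panels share the y-axis by default, heights are directly comparable across days, which is the main reason to facet rather than make four separate charts.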
7) Correlations & Heatmaps
corr = tips[["total_bill", "tip", "size"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", square=True)
plt.title("Correlation (tips)")
plt.show()

# Clustered heatmap
sns.clustermap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()

Read values: near ±1 → strong linear relation; near 0 → weak or no linear relation.
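The word "linear" matters when reading a heatmap cell. `DataFrame.corr()` computes Pearson correlation by default, which only measures straight-line association. The small NumPy sketch below uses made-up values (not the tips data) to show a perfectly predictable but curved relationship scoring about zero.

```python
import numpy as np

# Symmetric x values around zero (illustrative, not from the dataset).
x = np.linspace(-3, 3, 61)

y_linear = 2 * x + 1   # straight line: perfectly linear
y_curved = x ** 2      # U-shape: fully determined by x, but not linear

r_linear = np.corrcoef(x, y_linear)[0, 1]  # approximately 1
r_curved = np.corrcoef(x, y_curved)[0, 1]  # approximately 0
```

So a near-zero cell in the heatmap is a prompt to look at the scatterplot, not proof that two variables are unrelated.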
8) Time/Ordered Trends
avg = tips.groupby("size", as_index=False)["tip"].mean()
sns.lineplot(data=avg, x="size", y="tip")
plt.title("Average Tip by Party Size")
plt.show()
9) Styling, Palettes & Layout
sns.set_theme(style="whitegrid", context="talk", palette="deep")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex")
ax.set_title("Tips vs Total Bill")
ax.set_xlabel("Total Bill ($)")
ax.set_ylabel("Tip ($)")
sns.despine()
plt.tight_layout()
plt.show()
10) Legends, Annotations & Saving
ax = sns.regplot(data=tips, x="total_bill", y="tip")
# Use a dark arrow: a white one disappears on a white or transparent background
ax.annotate("Higher tips with higher bills", xy=(40, 7), xytext=(25, 8.5),
            arrowprops=dict(arrowstyle="->", color="black"))
# regplot adds no legend by default; remove one only if it exists
legend = ax.get_legend()
if legend is not None:
    legend.remove()
plt.tight_layout()
plt.savefig("tips_scatter.png", dpi=300, bbox_inches="tight", transparent=True)
plt.show()
11) Cheat-Sheet: Axes vs Figure Level
Axes-level: scatterplot, lineplot, histplot, kdeplot, boxplot, violinplot, heatmap, regplot…
Use when you manage subplots manually.
Figure-level: relplot, catplot, jointplot, pairplot, lmplot…
Use for quick grids/facets and auto layouts.
12) Common Pitfalls
- Overplotting → use alpha, hexbin, or kdeplot
- Don’t rely on defaults → always set titles/labels
- For groups → prefer violin/box + strip over bar means
- Keep consistent color semantics
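Two of these pitfalls are easy to show side by side. The sketch below assumes a larger synthetic dataset (2,000 made-up rows with tips-style columns) so that overplotting actually occurs. It fixes one color per category in a dictionary and compares a transparent scatter with Matplotlib's `hexbin`. The specific colors are an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in, large enough that points pile on top of each other.
rng = np.random.default_rng(1)
n = 2000
bill = rng.gamma(shape=4.0, scale=5.0, size=n)
demo = pd.DataFrame({
    "total_bill": bill,
    "tip": 0.15 * bill + rng.normal(0, 0.8, size=n),
    "sex": rng.choice(["Female", "Male"], size=n),
})

# One fixed mapping, reused in every chart, keeps color semantics consistent.
sex_palette = {"Female": "tab:orange", "Male": "tab:blue"}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: transparency lets dense regions show up as darker areas.
sns.scatterplot(data=demo, x="total_bill", y="tip", hue="sex",
                palette=sex_palette, alpha=0.3, s=15, ax=axes[0])
axes[0].set_title("Scatter with alpha")

# Option 2: hexbin replaces individual points with counts per hexagon.
hb = axes[1].hexbin(demo["total_bill"], demo["tip"], gridsize=25,
                    cmap="Blues", mincnt=1)
fig.colorbar(hb, ax=axes[1], label="Points per bin")
axes[1].set_title("Hexbin density")
axes[1].set_xlabel("total_bill")
axes[1].set_ylabel("tip")

fig.tight_layout()
plt.close(fig)
```

Passing the same `sex_palette` to every later plot means "Female" is always the same color, so a reader never has to re-learn the legend between charts.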
✅ Checkpoint (Day 2 done): You can facet, compare multivariate trends, style for clarity, and export.
Mini-Project (Deliverable)
Question: What factors drive higher tips?
Steps:
- Univariate: distribution of total_bill, tip
- Groups: tip by day, sex, smoker, time
- Relationship: total_bill ↔ tip (add hue & regression)
- Correlation heatmap for numeric vars
- Facet by smoker/time
- Report: 5 insights + 2 charts for LinkedIn/portfolio
# Add tip percentage as a derived column
tips = sns.load_dataset("tips").assign(tip_pct=lambda d: d["tip"] / d["total_bill"] * 100)

# 1) Distribution
sns.histplot(data=tips, x="tip_pct", bins=20, kde=True)
plt.title("Tip % Distribution")
plt.show()

# 2) Groups
sns.boxplot(data=tips, x="day", y="tip_pct", hue="smoker")
plt.title("Tip % by Day & Smoker")
plt.show()

# 3) Relationship with hue
sns.scatterplot(data=tips, x="total_bill", y="tip_pct", hue="time", style="sex")
plt.title("Tip % vs Total Bill by Time/Sex")
plt.show()

# 4) Correlation
num = tips[["total_bill", "tip", "size", "tip_pct"]]
sns.heatmap(num.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation (with Tip %)")
plt.show()
Practice Checklist
- Plot hist+KDE for total_bill; describe skewness
- Compare tip across day using box+strip
- Scatter total_bill vs tip with hue=sex, style=smoker
- Create pairplot with hue=time
- Build correlation heatmap; write 2 interpretations
- Facet bar chart by smoker and time
- Export one figure at 300 DPI with transparent background
Quick Reference
Most-used APIs:
- scatterplot, lineplot, histplot, kdeplot, ecdfplot
- boxplot, violinplot, stripplot, barplot, pointplot
- pairplot, jointplot, relplot, catplot
- heatmap, clustermap, regplot
Styling:
- sns.set_theme(style, palette, context)
- sns.despine(), plt.tight_layout()
- Palettes: deep, muted, pastel, bright, dark, colorblind
Written for: Nivesh Bansal — Data Analytics Journey, Day 10.
You can copy any code block and practice directly.
Happy plotting!