Exploring the Titanic Dataset: A Data Analysis Project Using SQL and Power BI

Exploring the Titanic Dataset: A Data Analysis Project Using SQL and Power BI

As a data analyst with a keen interest in historical datasets and their modern implications, I recently undertook a project analyzing the Titanic passenger dataset. This project involved database creation, SQL querying, and data visualization in Power BI to uncover patterns in survival and socio-economic factors aboard the ill-fated ship. The goal was to not only practice technical skills but also derive meaningful insights that could inform contemporary safety practices in the maritime industry.

I'll walk you through the step-by-step process, highlight the importance of each stage, share key objectives, insights, and provide forward-looking recommendations for how companies could enhance operations over the next 2-10 years.


Project Overview

The Titanic disaster of 1912 remains one of history's most poignant tragedies, with over 1,500 lives lost when the "unsinkable" ship struck an iceberg. The dataset used in this project is a cleaned version of the Titanic passenger manifest (train_clean.csv), containing 891 records with variables such as Age, Sex, Pclass (passenger class), Fare, SibSp (siblings/spouses aboard), Parch (parents/children aboard), Survived (0 or 1), Embarked (port of embarkation), Title (extracted from names, e.g., Mr., Mrs.), and Family_Size (SibSp + Parch, excluding self).

The project followed a structured approach: creating a SQL database, importing the data, writing queries to answer specific questions, and visualizing results in Power BI. This mirrors real-world data analysis workflows, where raw data is transformed into insights for decision-making.


Objectives

The primary objectives were:

  1. Technical Proficiency: Demonstrate skills in SQL for data manipulation and Power BI for visualization.
  2. Insight Generation: Uncover demographic and socio-economic factors influencing survival rates.
  3. Storytelling: Craft a narrative around the data to highlight human elements, such as family dynamics and class disparities.
  4. Practical Application: Draw lessons for modern industries, emphasizing how data-driven decisions can prevent future tragedies.
  5. Portfolio Enhancement: Create a professional showcase for LinkedIn, illustrating my ability to handle historical data with contemporary tools.

These objectives ensured the project was not just an exercise but a bridge between past events and future improvements.


Step-by-Step Process

Step 1: Data Preparation and Database Setup

I began by creating a SQL Server database named "Titanic." This step is crucial as it establishes a structured environment for querying, ensuring data integrity and efficient retrieval. Importance: Without a proper database, queries could lead to errors or inefficiencies, especially with larger datasets. Next, I imported the train_clean.csv file into a table named [dbo].[train_clean] using SQL Server Management Studio's import wizard. This process involved mapping columns to appropriate data types (e.g., Age as float, Survived as int) and handling any null values (e.g., in Age or Cabin).

Importance: Data import ensures cleanliness and accessibility. In this case, the dataset was pre-cleaned (e.g., Titles extracted from Names, Family_Size calculated), but verifying for duplicates or inconsistencies is key to reliable analysis. This step took about 15 minutes and set the foundation for accurate querying.

Step 2: SQL Query Development

With the data loaded, I wrote SQL queries to address 10 specific tasks. Each query was designed to extract targeted insights, using techniques like aggregation (AVG, COUNT), conditional logic (CASE), grouping (GROUP BY), and filtering (WHERE). Below, I detail each query, its execution, and why it's important.


Calculate the Average Age of Passengers Who Survived and Those Who Did Not

Article content

Survivors averaged ~28.34 years; non-survivors ~30.63 years. This reveals age-related vulnerabilities, highlighting how younger passengers (potentially more agile) had a slight edge in survival. It's essential for understanding demographic risks in emergencies.


Find the Top 5 Most Common Titles Among Passengers

Article content

Mr. (517), Miss (182), Mrs. (125), Master (40), Dr. (7). Titles reflect social structure and gender distribution, aiding in segmentation analysis. This is vital for targeted insights, like how titles correlate with class or survival.


Calculate the Average Fare Paid by Passengers in Each Pclass

Article content

1st Class (~84.15), 2nd (~20.66), 3rd (~13.68). Illustrates economic disparities, showing how fare (proxy for wealth) ties to amenities and survival odds. Critical for pricing strategy analysis in business contexts.


Retrieve Passengers with the Highest and Lowest Fare

Article content

Highest (~512.33, e.g., Cardeza family); Lowest (0, e.g., crew or complimentary tickets). Identifies outliers, revealing luxury vs. economy dynamics. Useful for anomaly detection in financial data.


Find the Survival Rate for Male and Female Passengers

Article content

Females (74.20% survival, 233/314); Males (18.89%, 109/577). Exposes gender biases in evacuation ("women and children first"), underscoring equity in crisis protocols.


Calculate the Average Family Size for Survivors and Non-Survivors

Article content

Survivors (~1.97); Non-survivors (~1.88). Shows family size's subtle impact on mobility during disasters, informing group-based safety planning.


Retrieve Names of Passengers with Missing Age

Article content

177 passengers (e.g., Moran, Mr. James). Handles missing data, essential for imputation strategies and ensuring complete analysis.


Find the Most Common Port of Embarkation and Passenger Count

Article content

Southampton (644). Reveals geographic trends, useful for logistics and market analysis.


Calculate Average Age by Pclass and Gender

Article content

1st Class Females (~34.61); 3rd Class Males (~26.51). Highlights intersections of class, gender, and age, key for nuanced segmentation.


Retrieve Passengers Traveling Alone Who Did Not Survive

Article content

~163 such passengers (adjusted for Family_Size=0 meaning alone). Identifies isolated individuals' higher risks, emphasizing support systems.


Each query was tested for accuracy, with execution times under 1 second due to the small dataset.


Step 3: Data Visualization in Power BI

I connected the SQL database to Power BI and created dashboards for each query's results. Images like (survival by gender) and (family size comparison) were exported.

Importance: Visualizations make complex data accessible, enabling storytelling. Tableau's interactivity allowed for filters, enhancing exploratory analysis.

Article content
Article content

Key Insights

  • Gender and Class Disparities: Women in 1st class had near-100% survival, vs. 3rd class males at ~13%, showing "privilege" in crises.
  • Age and Family Dynamics: Younger passengers and smaller families fared better, likely due to mobility.
  • Economic Factors: Higher fares correlated with better survival, underscoring resource access.

These insights humanize the data, turning numbers into narratives about inequality.


Recommendations and Future Insights

Drawing from this analysis, modern cruise companies (e.g., a hypothetical "OceanSafe Cruises") can apply lessons to improve safety:

  • Short-Term (2-5 Years): Implement AI-driven evacuation simulations prioritizing vulnerable groups (e.g., elderly, solo travelers). Enhance data collection on passenger demographics for personalized safety briefings. Invest in equitable lifeboat access, addressing class-based disparities.
  • Long-Term (5-10 Years): Integrate IoT sensors for real-time risk assessment (e.g., iceberg detection via satellite). Use predictive analytics on passenger data to forecast needs, reducing missing info issues. Promote inclusivity training to eliminate gender biases. By 2035, aim for 100% survival in simulations through VR training and automated systems.

These steps could reduce risks by 50-70%, based on historical parallels, turning data into lifesaving strategies.


Conclusion

This Titanic project honed my skills in SQL, Power BI, and data storytelling while revealing timeless lessons. It's a testament to how analyzing the past can shape a safer future. I'm eager to apply these techniques in professional roles and connect with me on LinkedIn to discuss collaborations!

Tools Used: SQL Server, Power BI. Dataset Source: Public Titanic repository (cleaned version).

Mahrous Hassan

Data Analyst | Power BI & SQL Expert | Turning Business Data into Actionable Insights

4w

Good

Rajnish Pratap

Freelance Data Analyst | Transform Row Data Into Actionable Insights | Power BI Dashboards | SQL | Python | Data Visualization & Reporting | Excel Dashboard | Data Cleaning

4w

Well done! A neat visualization that delivers strong insights.”

Joshua Salami Peter

B.Sc. Statistics, First Class Honours || Data Scientist || Machine Learning || Deep Learning || Computer Vision || Artificial Intelligence || Researcher

4w

You're doing well Godwill Okwuchukwu

Samim Imtiaz

Area Manager at Samsung Electronics India Pvt Ltd

4w

Well done

Amin Mohamed

Junior Computer Science Student | Aspiring Cybersecurity Specialist | Data Analysis Enthusiast

1mo

Godwill Okwuchukwu, your analysis on the Titanic dataset is amazing! The key insights and recommendations you've shared are incredibly inspiring.

To view or add a comment, sign in

More articles by Godwill Okwuchukwu

Others also viewed

Explore content categories