The provided dataset was anonymous so I provided a fictional name "Bike Haven Collective" - a company that sells bikes, related acessories and clothing.
This project delivers a modern data warehouse which focuses on building clean, organized data pipeline and covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics.
This Project focuses into two different sections:
- Data Engineering
- Data Analytics and Reporting
In this section of the project I have performed following tasks:
(I have performed the entire task using PL/PostgreSQL)
- Implemented Medallion Architecture to develop data pipeline for more high quality data flow.
- Developed ETL Pipeline (Extract, Transform, Load)
- Ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) data sources.
- Performed:
- Data Cleansing tasks (Removing Duplicates, Handling Unwanted Spaces, missing and invalid data, Data Type Casting and Filtering)
- Data Standardization
- Data Normalization
- Data Enrichment
- Data Integration for Qualitative Data
- Performed Data Modeling by creating FACTS & DIMENSIONS Table for high quality data analysis in GOLD Layer.
One of the important thing I was exposed during this project is the Medallion Architecture.
Medallion Architecture consist three layers which helped me design and build modular and robust data warehouse.
-
- In this layer, I have ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) CSV files into PostgreSQL.
-
- In this layer, I have performed data cleansing (Handling Null values, empty spaces), standardization, normalization, enrichment and derived columns tasks.
-
- In this layer, I have created Data Model: Star Schema, in which I have created Fact and Dimension Tables for advance data analytics.
Note: Final Updated Data Lineage
-
STAR SCHEMA
Star Schema is a multi-dimensional data model for organizing data in a way that makes data analytical tasks easier and helps non technical people easy to understand and get insights from the data.-
- dim_customers
- dim_products
-
- fact_sales
-
For more details check Data Catlog of Gold Layer
- I have analyzed the sales data for different analysis and created an interactive Power BI dashboard: