Skip to content

This project delivers a modern data warehouse which focuses on building clean, organized data pipeline which covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics

Notifications You must be signed in to change notification settings

KamanHang/sqldatawarehousedataengineeringproject

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

65 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

❗Things to Consider❗

The provided dataset was anonymous so I provided a fictional name "Bike Haven Collective" - a company that sells bikes, related acessories and clothing.

SQL Data Warehouse and Data Analytics Project

This project delivers a modern data warehouse which focuses on building clean, organized data pipeline and covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics.

Project Division

This Project focuses into two different sections:

  • Data Engineering
  • Data Analytics and Reporting

Data Engineering 👷🏻‍♂️

In this section of the project I have performed following tasks:

(I have performed the entire task using PL/PostgreSQL)

  • Implemented Medallion Architecture to develop data pipeline for more high quality data flow.
  • Developed ETL Pipeline (Extract, Transform, Load)
  • Ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) data sources.
  • Performed:
    • Data Cleansing tasks (Removing Duplicates, Handling Unwanted Spaces, missing and invalid data, Data Type Casting and Filtering)
    • Data Standardization
    • Data Normalization
    • Data Enrichment
    • Data Integration for Qualitative Data
  • Performed Data Modeling by creating FACTS & DIMENSIONS Table for high quality data analysis in GOLD Layer.

⛩️ Data Architecture

DataArchitecturedrawio

One of the important thing I was exposed during this project is the Medallion Architecture.
Medallion Architecture consist three layers which helped me design and build modular and robust data warehouse.

  • Bronze Layer:

    • In this layer, I have ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) CSV files into PostgreSQL.
  • Silver Layer:

    • In this layer, I have performed data cleansing (Handling Null values, empty spaces), standardization, normalization, enrichment and derived columns tasks.
  • Gold Layer:

    • In this layer, I have created Data Model: Star Schema, in which I have created Fact and Dimension Tables for advance data analytics.

Data LINEAGE (Data Flow)

Note: Final Updated Data Lineage Data Lineage

Data Modeling (Star Schema)

  • STAR SCHEMA

    Star Schema is a multi-dimensional data model for organizing data in a way that makes data analytical tasks easier and helps non technical people easy to understand and get insights from the data.

    • Dimension Table

      • dim_customers
      • dim_products
    • Facts Table

      • fact_sales

For more details check Data Catlog of Gold Layer

StarSchema

Data Analytics and Reporting 📊

  • I have analyzed the sales data for different analysis and created an interactive Power BI dashboard:

PowerBI

About

This project delivers a modern data warehouse which focuses on building clean, organized data pipeline which covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published