This project demonstrates the end-to-end design of a Data Warehouse using SQL Server (T-SQL).
It integrates CRM (Customer/Sales) and ERP (Enterprise Resource Planning) data into a unified repository through a structured ETL pipeline.
The solution highlights:
- ETL Process (Extract, Transform, Load)
- Layered Architecture (Bronze → Silver → Gold)
- Dimensional Modeling (Star Schema with Facts and Dimensions)
- Data Quality Checks for reliability
The project is designed for general audiences, data enthusiasts, and recruiters who want to see a practical demonstration of data engineering workflows.
The warehouse follows a Medallion Architecture:
-
Bronze Layer (Raw Data)
- Ingests raw CSV data into staging tables.
- Data is preserved exactly as received.
-
Silver Layer (Cleaned & Standardized Data)
- Applies data cleaning, normalization, and deduplication.
- Prepares conformed tables for analytics.
-
Gold Layer (Dimensional Model)
- Implements a Star Schema with Fact and Dimension Views.
- Optimized for reporting and BI tools.
Contains raw CSV files from CRM and ERP systems.
-
source_crm/cust_info.csv→ Customer informationprd_info.csv→ Product informationsales_details.csv→ Sales details
-
source_erp/cust_az12.csv→ Customer IDs and birthdatesloc_a101.csv→ Customer locationspx_cat_g1v2.csv→ Product categories
Contains all SQL scripts for building and running the warehouse.
creating_database.sql→ Creates theDataWareHousedatabase and schemas (bronze,silver,gold).
ddl_bronze.sql→ Creates raw staging tables matching CSV schema.proc_load_bronze.sql→ Stored procedure to load CSV files viaBULK INSERT.execute_bronze.sql→ Runs the Bronze ingestion procedure.
ddl_silver.sql→ Defines cleaned/conformed Silver tables.proc_load_silver.sql→ Cleans and transforms data from Bronze into Silver.execute_silver.sql→ Executes Silver loading procedure.
ddl_gold.sql→ Creates analytical star-schema views:gold.dim_customersgold.dim_productsgold.fact_sales
Data validation SQL scripts.
quality_checks_silver.sql→ Ensures Silver data has no duplicates, invalid values, or missing keys.quality_checks_gold.sql→ Validates uniqueness of surrogate keys and referential integrity in the Gold layer.
-
Extract → Load (Bronze Layer)
- Raw CSVs are loaded into
bronze.*tables usingBULK INSERT. - Preserves source formatting and schema.
- Raw CSVs are loaded into
-
Transform (Silver Layer)
- Data is cleansed, normalized, and standardized:
- Trims whitespace
- Normalizes gender codes
- Handles nulls and invalid dates
- Maps product line codes to descriptive names
- Ensures consistent, business-ready data.
- Data is cleansed, normalized, and standardized:
-
Load & Model (Gold Layer)
- Builds a Star Schema with:
- Dimensions: Customers & Products
- Fact: Sales transactions
- Surrogate keys generated via
ROW_NUMBER(). - Optimized for BI reporting and analytics.
- Builds a Star Schema with:
bronze.crm_cust_infobronze.crm_prd_infobronze.crm_sales_detailsbronze.erp_cust_az12bronze.erp_loc_a101bronze.erp_px_cat_g1v2
silver.crm_cust_infosilver.crm_prd_infosilver.crm_sales_detailssilver.erp_cust_az12silver.erp_loc_a101silver.erp_px_cat_g1v2
-
Dimensions:
gold.dim_customers(customer_key, customer_id, first_name, last_name, country, gender, birthdate, marital_status)gold.dim_products(product_key, product_id, product_name, category, subcategory, maintenance, cost, product_line)
-
Fact:
gold.fact_sales(order_number, customer_key, product_key, order_date, sales_amount, quantity, price)
This warehouse supports various business intelligence use cases:
- Sales Reporting: Total sales by product, category, or customer demographic.
- Customer 360 View: Unified CRM & ERP view (location, age, gender).
- Product Insights: Sales trends by product line, category, or maintenance type.
- Data Quality Auditing: Regular validation with
tests/scripts.
- SQL Server (T-SQL)
- BULK INSERT for data ingestion
- Stored Procedures for ETL orchestration
- Window Functions (
ROW_NUMBER) for surrogate key generation - Pure SQL ETL (No external tools like SSIS or ADF)
- Silver Layer Checks: Ensures clean, valid, non-duplicated data.
- Gold Layer Checks: Validates star schema integrity and referential consistency.
- Ensures the pipeline is robust, scalable, and production-ready.
This project showcases:
✔️ End-to-end SQL-based ETL pipeline
✔️ Medallion architecture (Bronze, Silver, Gold)
✔️ Star schema design for analytics
✔️ Practical data engineering best practices
It provides a self-contained, reproducible demonstration of data warehousing in SQL Server—ideal for recruiters, data engineers, and analysts.