Eric Katumo for LuxDevHQ

Posted on Mar 26

Amazon Redshift Data Warehousing (Sample Project)

#database #dataengineering #data

Overview

This project focuses on analyzing two key business processes using Amazon Redshift:

Identifying the most common frequency of purchases per season.
Finding products with high review ratings.

Step 1: Setting Up Amazon Redshift

Create a Redshift Cluster
- Navigate to AWS Redshift Console.
- Click Create Cluster.
- Choose a single-node cluster.
- Select an appropriate node type (e.g., dc2.large).
- Set the database name, username, and password.
- Click Create cluster.
Create an IAM Role
- Go to the IAM Console.
- Create a new role with AmazonS3ReadOnlyAccess.
- Attach the role to your Redshift cluster.
Create an S3 Bucket and Upload Data
- Navigate to Amazon S3.
- Create a new bucket (e.g., zzetu).
- Upload shopping_data.csv to this bucket.

Step 2: Loading Data into Redshift

Use the Amazon Redshift Query Editor to run SQL commands.
Create a table to store raw data:

CREATE TABLE shopping ( CustomerID INTEGER, Age INTEGER, Gender VARCHAR(50), Category VARCHAR(50), Location VARCHAR(50), Season VARCHAR(50), ReviewRating REAL, SubscriptionStatus VARCHAR(50), PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50), PreviousPurchases INTEGER, PreferredPaymentMethod VARCHAR(50), FrequencyOfPurchases VARCHAR(50) );

Copy data from S3 into Redshift:

COPY shopping FROM 's3://zzetu/shopping_data.csv' IAM_ROLE 'arn:aws:iam::your-account-id:role/your-redshift-role' FORMAT AS CSV IGNOREHEADER 1;

Step 3: Schema Design

We are using a star schema with 1 fact table and 3 dimension tables:

1. Dimension Tables

CREATE TABLE dimCustomer ( CustomerID INTEGER PRIMARY KEY, Age INTEGER, Gender VARCHAR(50), Location VARCHAR(50), SubscriptionStatus VARCHAR(50), PreferredPaymentMethod VARCHAR(50) ); CREATE TABLE dimProduct ( Category VARCHAR(50) PRIMARY KEY, DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50) ); CREATE TABLE dimTransaction ( TransactionID INTEGER IDENTITY(1,1) PRIMARY KEY, CustomerID INTEGER, PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), Season VARCHAR(50), FrequencyOfPurchases VARCHAR(50), FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) );

2. Fact Table

CREATE TABLE factPurchases ( CustomerID INTEGER, Age INTEGER, ReviewRating REAL, PreviousPurchases INTEGER, FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) );

Step 4: Data Insertion

-- Insert data into dimCustomer INSERT INTO dimCustomer (CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod) SELECT DISTINCT CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod FROM shopping; -- Insert data into dimProduct INSERT INTO dimProduct (Category, DiscountApplied, PromoCodeUsed) SELECT DISTINCT Category, DiscountApplied, PromoCodeUsed FROM shopping; -- Insert data into dimTransaction INSERT INTO dimTransaction (CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases) SELECT CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases FROM shopping; -- Insert data into factPurchases INSERT INTO factPurchases (CustomerID, Age, ReviewRating, PreviousPurchases) SELECT CustomerID, Age, ReviewRating, PreviousPurchases FROM shopping;

Step 5: Business Queries

1. Most Used Frequency of Purchase per Season

SELECT t.Season, t.FrequencyOfPurchases, COUNT(*) AS PurchaseCount FROM factPurchases f JOIN dimTransaction t ON f.CustomerID = t.CustomerID GROUP BY t.Season, t.FrequencyOfPurchases ORDER BY t.Season, PurchaseCount DESC;

2. Products with High Review Ratings

SELECT Category, AVG(f.ReviewRating) AS AvgRating FROM fact_Purchases f JOIN shopping s ON f.CustomerID = s.CustomerID GROUP BY Category ORDER BY AvgRating DESC;

DEV Community

Amazon Redshift Data Warehousing (Sample Project)

Overview

Step 1: Setting Up Amazon Redshift

Step 2: Loading Data into Redshift

Step 3: Schema Design

1. Dimension Tables

2. Fact Table

Step 4: Data Insertion

Step 5: Business Queries

1. Most Used Frequency of Purchase per Season

2. Products with High Review Ratings

Setup Guide

Repository Link

Top comments (0)