Overview
This project focuses on analyzing two key business processes using Amazon Redshift:
- Identifying the most common frequency of purchases per season.
- Finding products with high review ratings.
Step 1: Setting Up Amazon Redshift
-
Create a Redshift Cluster
- Navigate to AWS Redshift Console.
- Click Create Cluster.
- Choose a single-node cluster.
- Select an appropriate node type (e.g.,
dc2.large
). - Set the database name, username, and password.
- Click Create cluster.
-
Create an IAM Role
- Go to the IAM Console.
- Create a new role with AmazonS3ReadOnlyAccess.
- Attach the role to your Redshift cluster.
-
Create an S3 Bucket and Upload Data
- Navigate to Amazon S3.
- Create a new bucket (e.g.,
zzetu
). - Upload
shopping_data.csv
to this bucket.
Step 2: Loading Data into Redshift
- Use the Amazon Redshift Query Editor to run SQL commands.
- Create a table to store raw data:
CREATE TABLE shopping ( CustomerID INTEGER, Age INTEGER, Gender VARCHAR(50), Category VARCHAR(50), Location VARCHAR(50), Season VARCHAR(50), ReviewRating REAL, SubscriptionStatus VARCHAR(50), PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50), PreviousPurchases INTEGER, PreferredPaymentMethod VARCHAR(50), FrequencyOfPurchases VARCHAR(50) );
- Copy data from S3 into Redshift:
COPY shopping FROM 's3://zzetu/shopping_data.csv' IAM_ROLE 'arn:aws:iam::your-account-id:role/your-redshift-role' FORMAT AS CSV IGNOREHEADER 1;
Step 3: Schema Design
We are using a star schema with 1 fact table and 3 dimension tables:
1. Dimension Tables
CREATE TABLE dimCustomer ( CustomerID INTEGER PRIMARY KEY, Age INTEGER, Gender VARCHAR(50), Location VARCHAR(50), SubscriptionStatus VARCHAR(50), PreferredPaymentMethod VARCHAR(50) ); CREATE TABLE dimProduct ( Category VARCHAR(50) PRIMARY KEY, DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50) ); CREATE TABLE dimTransaction ( TransactionID INTEGER IDENTITY(1,1) PRIMARY KEY, CustomerID INTEGER, PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), Season VARCHAR(50), FrequencyOfPurchases VARCHAR(50), FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) );
2. Fact Table
CREATE TABLE factPurchases ( CustomerID INTEGER, Age INTEGER, ReviewRating REAL, PreviousPurchases INTEGER, FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) );
Step 4: Data Insertion
-- Insert data into dimCustomer INSERT INTO dimCustomer (CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod) SELECT DISTINCT CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod FROM shopping; -- Insert data into dimProduct INSERT INTO dimProduct (Category, DiscountApplied, PromoCodeUsed) SELECT DISTINCT Category, DiscountApplied, PromoCodeUsed FROM shopping; -- Insert data into dimTransaction INSERT INTO dimTransaction (CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases) SELECT CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases FROM shopping; -- Insert data into factPurchases INSERT INTO factPurchases (CustomerID, Age, ReviewRating, PreviousPurchases) SELECT CustomerID, Age, ReviewRating, PreviousPurchases FROM shopping;
Step 5: Business Queries
1. Most Used Frequency of Purchase per Season
SELECT t.Season, t.FrequencyOfPurchases, COUNT(*) AS PurchaseCount FROM factPurchases f JOIN dimTransaction t ON f.CustomerID = t.CustomerID GROUP BY t.Season, t.FrequencyOfPurchases ORDER BY t.Season, PurchaseCount DESC;
2. Products with High Review Ratings
SELECT Category, AVG(f.ReviewRating) AS AvgRating FROM fact_Purchases f JOIN shopping s ON f.CustomerID = s.CustomerID GROUP BY Category ORDER BY AvgRating DESC;
Setup Guide
- Create Redshift Cluster and configure IAM roles.
- Load Data from S3 into Redshift tables.
- Run the SQL scripts for creating tables and inserting data.
- Execute business queries to generate insights.
Top comments (0)