DEV Community

Cover image for Analyzing Retail Sales Data in SQL
Jonathan Powell
Jonathan Powell

Posted on

Analyzing Retail Sales Data in SQL

Intro

This year one of my goals was to improve my data analysis skills. As a software engineer, most of time is spent in application level code. Though I work with relational databases, I really don't work with complex queries beyond a few joins. But, data analysis is a common part of our job. If you've ever been on-call, you'v'e probably had to query your database to understand the state of your sysem, who is impacted, when did an issue start?

To improve my data analysis skills I got a subscription to DataCamp, a popular online course platform focused on data analytics, data engineering, and AI. I've completed a number of courses, and now I'm testing out my skills by doing what's called an Exploratory Data Analysis (EDA).

Exploratory Data Analysis

Table Of Contents

Overview

I grabbed this Superstore Sales Dataset from Kaggle as a CSV file.

It has 13 columns that include sales data for a generic superstore:

  • Ship Mode: Mode of shipping used for shipment delivery
  • Segment: (Categorical) Customer segment product was shipped to
  • Country: Country in which the shipment was delivered
  • City: City in which shipment was delivered
  • State: State in which the shipment was delivered
  • Postal Code: Postal code the shipment was delivered to
  • Region: Country region
  • Category: The category product belongs to
  • Sub-Category: Sub-category of the product
  • Sales: Sale made in USD
  • Quantity: Product quantity
  • Discount: Discount given on the product
  • Profit: Profit/loss made on the sale

Given this sales data, I asked a few questions to learn more.
I used SQL to parse the CSV and load it into a table I created and I asked a few questions about the data.

Findings

Here were some of my findings from querying the dataset.

Overall profit of the dataset was $286,397.02 dollars. California, New York, and Washington State contributed most to the overall profits.

When organized by region the West had the most profit, followed by the East, South, and finally Central regions.

Though capital cities tended to be the highest profit cities for states, some states had non-capital cities as their highest profit cities. For example, Lebanon, Tennessee was the highest profit city in the state.

Technology was the most profitable product category among the three categories (Tech, Furniture, Office Supplies) but the Office Supplies categoryhad the most sales (~6200 sales).

Though total technology sales were lower, Tech products had the highest profit per unit. Corporate Copiers had the highest profit per unit with ~ $271 dollars of profit per copier. By comparison, the highest profit per unit for Office Supplies was ~$14, for Appliances.

Across most states (48), products in the Home Office segment had the most sales. Binders and Paper sold the most units of Home Office items with 1,111 binders sold and 1,021 units of Paper sold.

Exploratory Queries

How many sales were made in the dataset?

SELECT COUNT(*) FROM sales; -- 9994 
Enter fullscreen mode Exit fullscreen mode

How many total items were sold?

SELECT SUM(quantity) FROM sales; -- 37873 
Enter fullscreen mode Exit fullscreen mode

What was the total profit overall? And, what were the top 10 highest profit states?

WITH aggregate_profits AS ( SELECT CASE WHEN country IS NOT NULL THEN country ELSE 'Overall' END as country, state, SUM(profit) as total_profit FROM sales GROUP BY GROUPING SETS((country, state), ()) ORDER BY country, state) SELECT country, state, total_profit FROM aggregate_profits ORDER BY total_profit DESC LIMIT 10; 
Enter fullscreen mode Exit fullscreen mode
 country | state | total_profit ---------------+------------+-------------- Overall | | 286397.0217 United States | California | 76381.3871 United States | New York | 74038.5486 United States | Washington | 33402.6517 United States | Michigan | 24463.1876 United States | Virginia | 18597.9504 United States | Indiana | 18382.9363 United States | Georgia | 16250.0433 United States | Kentucky | 11199.6966 United States | Minnesota | 10823.1874 
Enter fullscreen mode Exit fullscreen mode

Which city had the most profit in each state?

WITH ranked_profits_by_state_city AS ( select state, city, sum(profit) as profit, RANK() OVER (PARTITION BY state ORDER BY sum(profit) DESC) as rank FROM sales GROUP BY state, city ORDER BY state, rank) SELECT state, city, profit FROM ranked_profits_by_state_city WHERE rank = 1; 
Enter fullscreen mode Exit fullscreen mode
 state | city | profit ----------------------+-----------------+------------ Alabama | Mobile | 2175.8292 Arizona | Glendale | 182.8598 Arkansas | Fayetteville | 1691.9419 California | Los Angeles | 30440.7579 Colorado | Thornton | 140.8398 Connecticut | Fairfield | 1221.6226 Delaware | Newark | 8086.1715 District of Columbia | Washington | 1059.5893 
Enter fullscreen mode Exit fullscreen mode

What were total sales and profit by catagory, and subcategory?

SELECT category, sub_category, COUNT(*) as total_sales, SUM(profit) as profit FROM sales GROUP BY ROLLUP(category, sub_category) ORDER BY category ASC NULLS FIRST, sub_category ASC NULLS FIRST; 
Enter fullscreen mode Exit fullscreen mode
 category | sub_category | total_sales | profit -----------------+--------------+-------------+------------- | | 9994 | 286397.0217 Furniture | | 2121 | 18451.2728 Furniture | Bookcases | 228 | -3472.5560 Furniture | Chairs | 617 | 26590.1663 Furniture | Furnishings | 957 | 13059.1436 Furniture | Tables | 319 | -17725.4811 Office Supplies | | 6026 | 122490.8008 Office Supplies | Appliances | 466 | 18138.0054 Office Supplies | Art | 796 | 6527.7870 Office Supplies | Binders | 1523 | 30221.7633 Office Supplies | Envelopes | 254 | 6964.1767 Office Supplies | Fasteners | 217 | 949.5182 Office Supplies | Labels | 364 | 5546.2540 Office Supplies | Paper | 1370 | 34053.5693 Office Supplies | Storage | 846 | 21278.8264 Office Supplies | Supplies | 190 | -1189.0995 Technology | | 1847 | 145454.9481 Technology | Accessories | 775 | 41936.6357 Technology | Copiers | 68 | 55617.8249 Technology | Machines | 115 | 3384.7569 Technology | Phones | 889 | 44515.7306 
Enter fullscreen mode Exit fullscreen mode

What were sales and profit by region and category?

SELECT region, category, COUNT(*) as total_sales, SUM(profit) as total_profit, FROM sales GROUP BY ROLLUP(region, category) ORDER BY region NULLS FIRST, category NULLS FIRST; 
Enter fullscreen mode Exit fullscreen mode
 region | category | total_sales | total_profit ---------+-----------------+-------------+-------------- | | 9994 | 286397.0217 Central | | 2323 | 39706.3625 Central | Furniture | 481 | -2871.0494 Central | Office Supplies | 1422 | 8879.9799 Central | Technology | 420 | 33697.4320 East | | 2848 | 91522.7800 East | Furniture | 601 | 3046.1658 East | Office Supplies | 1712 | 41014.5791 East | Technology | 535 | 47462.0351 South | | 1620 | 46749.4303 South | Furniture | 332 | 6771.2061 South | Office Supplies | 995 | 19986.3928 South | Technology | 293 | 19991.8314 West | | 3203 | 108418.4489 West | Furniture | 707 | 11504.9503 West | Office Supplies | 1897 | 52609.8490 West | Technology | 599 | 44303.6496 
Enter fullscreen mode Exit fullscreen mode

What were sales, and profits by segment, cateory, and subcategory

 SELECT segment, category, sub_category, ROUND(SUM(profit), 2) AS total_profit, SUM(quantity) AS units_sold, ROUND(SUM(profit) / SUM(quantity) :: NUMERIC, 2) AS profit_per_unit FROM sales GROUP BY segment, category, sub_category ORDER BY profit_per_unit DESC; 
Enter fullscreen mode Exit fullscreen mode
 segment | category | sub_category | total_profit | units_sold | profit_per_unit -------------+-----------------+--------------+--------------+------------+----------------- Consumer | Technology | Copiers | 24083.71 | 117 | 205.84 Consumer | Technology | Phones | 23837.11 | 1685 | 14.15 Consumer | Technology | Accessories | 20735.92 | 1578 | 13.14 Consumer | Furniture | Chairs | 13235.33 | 1234 | 10.73 Consumer | Technology | Machines | 2141.06 | 217 | 9.87 Consumer | Office Supplies | Appliances | 6981.93 | 908 | 7.69 Consumer | Office Supplies | Envelopes | 3264.41 | 442 | 7.39 Consumer | Office Supplies | Binders | 17995.60 | 3015 | 5.97 Consumer | Office Supplies | Paper | 15534.64 | 2602 | 5.97 Consumer | Office Supplies | Storage | 7104.20 | 1619 | 4.39 Consumer | Furniture | Furnishings | 7919.42 | 1834 | 4.32 Consumer | Office Supplies | Labels | 3075.99 | 715 | 4.30 
Enter fullscreen mode Exit fullscreen mode

What are the most popular product segments by quantity sold?

SELECT state, segment, RANK() OVER (PARTITION BY state ORDER BY COUNT(quantity)) as popularity FROM sales GROUP BY state, segment; 
Enter fullscreen mode Exit fullscreen mode
 state | segment | popularity ----------------------+-------------+------------ Alabama | Home Office | 1 Alabama | Consumer | 2 Alabama | Corporate | 3 Arizona | Home Office | 1 Arizona | Corporate | 2 Arizona | Consumer | 3 
Enter fullscreen mode Exit fullscreen mode

Preparing the Data

  1. Using the psql cli, I create the database
psql -- Launch Postgres CLI CREATE DATABASE sales_db; \c sales_db -- Connect to the newly created database 
Enter fullscreen mode Exit fullscreen mode
  1. Create the database table for the data in the csv
CREATE TABLE sales ( id bigserial, ship_mode TEXT, segment TEXT, country TEXT, city TEXT, state TEXT, postal_code TEXT, region TEXT, category TEXT, sub_category TEXT, sales BIGINT, quantity BIGINT, discount FLOAT, profit NUMERIC); 
Enter fullscreen mode Exit fullscreen mode
  1. Load the csv file intot the table

Note: I removed the header row in the csv before copying it.

COPY sales(ship_mode, segment, country, city, state, postal_code, region, category, sub_category, sales, quantity, discount, profit) FROM '/path/to/SampleSuperstore.csv' (DELIMITER ',', FORMAT csv, HEADER false); 
Enter fullscreen mode Exit fullscreen mode

This command failed because the sales column in the csv file actually a numeric type, not a bigint. I changed the column to a numeric type to allow the data to be imported.

ALTER TABLE sales ALTER COLUMN sales TYPE NUMERIC; 
Enter fullscreen mode Exit fullscreen mode

With the correct types I could now run the COPY command and populate the
database table.

Top comments (0)