Commit ac16bca

Added PartitionBy-OrderBy
1 parent e5c9dc3 commit ac16bca

1 file changed

+230
-0
lines changed

07-PB-OB/07-Queries.sql

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
'
In Part 3, we learned what PARTITION BY is. It lets us compute a function independently for each group of rows while still keeping every individual row in the result. In Part 3, we only used PARTITION BY with the aggregate functions we already knew: AVG(), COUNT(), MAX(), MIN() and SUM(). None of them required ORDER BY, because the order of rows does not affect their result.

However, in Parts 4, 5 and 6, we got to know new elements where the order does matter: ranking functions, window frames and analytical functions.

In this part, we will learn how to use PARTITION BY with these new elements. Each time, we will also need an ORDER BY clause, hence the name of the part: PARTITION BY ORDER BY. Remember to keep the order: inside OVER(), PARTITION BY comes before ORDER BY; writing them the other way round is a syntax error.
'

-- Consider the period between August 10 and August 14, 2016. For each row of sales, show store_id, day, the number of customers and the rank of that day within its store, based on the number of customers (ascending).
SELECT store_id, day, customers,
    RANK() OVER(PARTITION BY store_id ORDER BY customers)
FROM sales
WHERE day BETWEEN '2016-08-10' AND '2016-08-14';
>>>
 store_id |    day     | customers | rank |
----------+------------+-----------+------+
        1 | 2016-08-10 |       524 |    1 |
        1 | 2016-08-13 |       669 |    2 |
        1 | 2016-08-14 |       721 |    3 |
        1 | 2016-08-12 |      1024 |    4 |
        1 | 2016-08-11 |      1416 |    5 |
        2 | 2016-08-13 |      1586 |    1 |
        2 | 2016-08-11 |      1880 |    2 |
        2 | 2016-08-12 |      1900 |    3 |
        2 | 2016-08-14 |      1984 |    4 |

-- Take the sales between August 1 and August 10, 2016. For each row, show the store_id, the day, the revenue on that day and the quartile number (the rows of each store are divided into four groups) based on the revenue of the given store in descending order.
SELECT store_id, day, revenue,
    NTILE(4) OVER(PARTITION BY store_id ORDER BY revenue DESC)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-10';
>>>
 store_id |    day     | revenue  | ntile |
----------+------------+----------+-------+
        1 | 2016-08-06 |  6909.54 |     1 |
        1 | 2016-08-01 |  6708.16 |     1 |
        1 | 2016-08-04 |  6604.80 |     1 |
        1 | 2016-08-05 |  6409.46 |     2 |
        1 | 2016-08-07 |  5596.67 |     2 |
        1 | 2016-08-08 |  4254.43 |     2 |

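-- A minimal sketch (not part of the original exercises) of how NTILE(4) sizes its groups when the row count is not divisible by 4. It assumes PostgreSQL's generate_series() to produce 10 hypothetical rows; the names tile, rows_in_tile and tiles are illustrative. The extra rows go to the lowest-numbered groups, which is consistent with store 1 above having three days in quartile 1 and three in quartile 2.
SELECT tile, COUNT(*) AS rows_in_tile
FROM (
    SELECT NTILE(4) OVER(ORDER BY n) AS tile
    FROM generate_series(1, 10) AS g(n)
) AS tiles
GROUP BY tile
ORDER BY tile;
>>>
 tile | rows_in_tile |
------+--------------+
    1 |            3 |
    2 |            3 |
    3 |            2 |
    4 |            2 |
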
-- The CTE in the query below creates a separate ranking of stores in each country based on their rating. The outer query then returns only the rows with rank 1, so we see the best store in each country.
WITH ranking AS (
    SELECT country, city,
        RANK() OVER(PARTITION BY country ORDER BY rating DESC) AS rank
    FROM store
)
SELECT country, city FROM ranking WHERE rank = 1;
>>>
 country |   city    |
---------+-----------+
 France  | Paris     |
 Germany | Frankfurt |
 Spain   | Madrid    |

-- For each store, show a row with three columns: store_id, the revenue on that store's best day in terms of revenue, and the day when that revenue was achieved.
WITH ranking AS (
    SELECT store_id, revenue, day,
        -- Name the column explicitly so the outer query does not rely on the default column name.
        RANK() OVER(PARTITION BY store_id ORDER BY revenue DESC) AS rank
    FROM sales
)
SELECT store_id, revenue, day FROM ranking WHERE rank = 1;
>>>
 store_id | revenue  |    day     |
----------+----------+------------+
        1 |  6909.54 | 2016-08-06 |
        2 | 24547.27 | 2016-08-08 |
        3 | 15845.45 | 2016-08-02 |
        4 | 19693.13 | 2016-08-09 |
        5 | 15665.50 | 2016-08-05 |
        6 | 10493.54 | 2016-08-14 |

-- Let's analyze sales data between August 1 and August 3, 2016. For each row, show store_id, day, transactions and the ranking of the store on that day in terms of the number of transactions as compared to other stores. The store with the greatest number should get rank = 1. Give every row its own rank, even when two rows share the same value.
SELECT store_id, day, transactions,
    ROW_NUMBER() OVER(PARTITION BY day ORDER BY transactions DESC)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-03';
>>>
 store_id |    day     | transactions | row_number |
----------+------------+--------------+------------+
       10 | 2016-08-01 |          195 |          1 |
        7 | 2016-08-01 |          146 |          2 |
        9 | 2016-08-01 |          136 |          3 |
        8 | 2016-08-01 |          127 |          4 |
        4 | 2016-08-01 |          123 |          5 |

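-- A minimal sketch (hypothetical rows, not taken from the sales table) of how the three ranking functions treat a tie within one partition. ROW_NUMBER() numbers tied rows in an unspecified order unless the ORDER BY is made unique, so store_id is added here as an assumed tie-breaker. The aliases rnk, dense_rnk and row_num are illustrative.
SELECT day, store_id, transactions,
    RANK() OVER(PARTITION BY day ORDER BY transactions DESC) AS rnk,
    DENSE_RANK() OVER(PARTITION BY day ORDER BY transactions DESC) AS dense_rnk,
    ROW_NUMBER() OVER(PARTITION BY day ORDER BY transactions DESC, store_id) AS row_num
FROM (VALUES
    ('2016-08-01', 1, 100),
    ('2016-08-01', 2, 100),
    ('2016-08-01', 3, 90)
) AS demo(day, store_id, transactions)
ORDER BY day, row_num;
>>>
    day     | store_id | transactions | rnk | dense_rnk | row_num |
------------+----------+--------------+-----+-----------+---------+
 2016-08-01 |        1 |          100 |   1 |         1 |       1 |
 2016-08-01 |        2 |          100 |   1 |         1 |       2 |
 2016-08-01 |        3 |           90 |   3 |         2 |       3 |
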
-- For each day of the sales statistics, show the day, the store_id of the best store in terms of revenue on that day, and that revenue.
WITH ranking AS (
    SELECT day, store_id, revenue,
        -- Name the column explicitly so the outer query does not rely on the default column name.
        RANK() OVER(PARTITION BY day ORDER BY revenue DESC) AS rank
    FROM sales
)
SELECT day, store_id, revenue FROM ranking WHERE rank = 1;
>>>
    day     | store_id | revenue  |
------------+----------+----------+
 2016-08-01 |       10 | 16536.36 |
 2016-08-02 |        2 | 17056.00 |
 2016-08-03 |        4 | 19661.13 |
 2016-08-04 |        2 | 12473.08 |
 2016-08-05 |        5 | 15665.50 |
 2016-08-06 |        4 | 13722.67 |

-- Divide the sales results for each store into four groups based on the number of transactions. For each store, show the rows in the group with the lowest numbers of transactions: day, store_id, transactions.
WITH ranking AS (
    SELECT store_id, day, transactions,
        -- NTILE(4) yields a group number from 1 to 4, not a rank, so the alias says quartile.
        NTILE(4) OVER(PARTITION BY store_id ORDER BY transactions) AS quartile
    FROM sales
)
SELECT day, store_id, transactions FROM ranking WHERE quartile = 1;
>>>
    day     | store_id | transactions |
------------+----------+--------------+
 2016-08-14 |        1 |           30 |
 2016-08-09 |        1 |           30 |
 2016-08-03 |        1 |           30 |
 2016-08-13 |        1 |           33 |
 2016-08-01 |        2 |           71 |
 2016-08-12 |        2 |           76 |
 2016-08-10 |        2 |           85 |

'
Now, let us see how we can use window frames along with PARTITION BY ... ORDER BY.
'
-- Show sales statistics between August 1 and August 7, 2016. For each row, show store_id, day, revenue and the best revenue in the respective store up to that date.
SELECT store_id, day, revenue,
    MAX(revenue) OVER(PARTITION BY store_id ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-07';
>>>
 store_id |    day     | revenue  |   max    |
----------+------------+----------+----------+
        1 | 2016-08-01 |  6708.16 |  6708.16 |
        1 | 2016-08-02 |  3556.00 |  6708.16 |
        1 | 2016-08-03 |  2806.82 |  6708.16 |
        1 | 2016-08-04 |  6604.80 |  6708.16 |
        1 | 2016-08-05 |  6409.46 |  6708.16 |
        1 | 2016-08-06 |  6909.54 |  6909.54 |

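-- A hedged variation on the running-maximum query above: the same PARTITION BY ... ORDER BY with a narrower frame. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW restricts each calculation to the current row and the two preceding rows of the same store, giving a moving average over up to three days (fewer at the start of each partition). The alias avg_revenue_3_days is illustrative, and the sketch assumes one sales row per store per day.
SELECT store_id, day, revenue,
    AVG(revenue) OVER(PARTITION BY store_id ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_revenue_3_days
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-07';
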
'
Now, let us look at analytical functions with PARTITION BY ... ORDER BY. In the example below, we show the country, city and opening_day of each store, along with the city where the next store was opened in the same country. For the last store opened in a country there is no next store, so LEAD() returns its third argument, the default value NaN.
'
SELECT country, city, opening_day,
    LEAD(city, 1, 'NaN') OVER(PARTITION BY country ORDER BY opening_day)
FROM store;
>>>
 country |   city    | opening_day |   lead    |
---------+-----------+-------------+-----------+
 France  | Nice      | 2014-03-15  | Lyon      |
 France  | Lyon      | 2014-09-24  | Paris     |
 France  | Paris     | 2014-12-05  | Bordeaux  |
 France  | Bordeaux  | 2015-07-29  | NaN       |
 Germany | Berlin    | 2014-12-15  | Frankfurt |
 Germany | Frankfurt | 2015-03-14  | Hamburg   |

-- For each store, show the sales in the period between August 5, 2016 and August 10, 2016: store_id, day, number of transactions, number of transactions on the previous day and the difference between these two values.
SELECT store_id, day, transactions,
    LAG(transactions) OVER(PARTITION BY store_id ORDER BY day),
    transactions - LAG(transactions) OVER(PARTITION BY store_id ORDER BY day)
FROM sales
WHERE day BETWEEN '2016-08-05' AND '2016-08-10';
>>>
 store_id |    day     | transactions | lag | ?column? |
----------+------------+--------------+-----+----------+
        1 | 2016-08-05 |           66 |null |     null |
        1 | 2016-08-06 |          123 |  66 |       57 |
        1 | 2016-08-07 |           61 | 123 |      -62 |
        1 | 2016-08-08 |           63 |  61 |        2 |
        1 | 2016-08-09 |           30 |  63 |      -33 |
        1 | 2016-08-10 |           48 |  30 |       18 |
        2 | 2016-08-05 |          147 |null |     null |
        2 | 2016-08-06 |          137 | 147 |      -10 |
        2 | 2016-08-07 |           93 | 137 |      -44 |
        2 | 2016-08-08 |          267 |  93 |      174 |

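-- A possible rewrite of the LAG() query above, sketched under the assumption that the database supports the WINDOW clause (PostgreSQL does). The window is defined once as w and reused, and both computed columns get illustrative aliases (previous_transactions, difference) instead of the default names lag and ?column?.
SELECT store_id, day, transactions,
    LAG(transactions) OVER w AS previous_transactions,
    transactions - LAG(transactions) OVER w AS difference
FROM sales
WHERE day BETWEEN '2016-08-05' AND '2016-08-10'
WINDOW w AS (PARTITION BY store_id ORDER BY day);
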
-- Show sales figures in the period between August 1 and August 3, 2016: for each store, show the store_id, the day, the revenue and the date with the best revenue in that period as best_revenue_day.
SELECT store_id, day, revenue,
    FIRST_VALUE(day) OVER(PARTITION BY store_id ORDER BY revenue DESC) AS best_revenue_day
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-03';
>>>
 store_id |    day     | revenue  | best_revenue_day |
----------+------------+----------+------------------+
        1 | 2016-08-01 |  6708.16 | 2016-08-01       |
        1 | 2016-08-02 |  3556.00 | 2016-08-01       |
        1 | 2016-08-03 |  2806.82 | 2016-08-01       |
        2 | 2016-08-02 | 17056.00 | 2016-08-02       |
        2 | 2016-08-03 |  7209.78 | 2016-08-02       |
        2 | 2016-08-01 |  4828.00 | 2016-08-02       |
        3 | 2016-08-02 | 15845.45 | 2016-08-02       |

-- For each row, show the following columns: store_id, day, customers and the number of customers in the store that ranked 5th by number of customers on that day.
SELECT store_id, day, customers,
    NTH_VALUE(customers, 5) OVER(PARTITION BY day ORDER BY customers DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM sales;
>>>
 store_id |    day     | customers | nth_value |
----------+------------+-----------+-----------+
        4 | 2016-08-01 |      2218 |      1896 |
       10 | 2016-08-01 |      2140 |      1896 |
        8 | 2016-08-01 |      1912 |      1896 |
        9 | 2016-08-01 |      1897 |      1896 |
        7 | 2016-08-01 |      1896 |      1896 |
        2 | 2016-08-01 |      1704 |      1896 |
        1 | 2016-08-01 |      1465 |      1896 |
        3 | 2016-08-01 |      1379 |      1896 |
        5 | 2016-08-01 |       773 |      1896 |
        6 | 2016-08-01 |       348 |      1896 |

222+
223+
224+
225+
226+
227+
228+
229+
230+

0 commit comments

Comments
 (0)