Suppose we want to compare each data record with its neighbors based on some column value. For example:
- Compare sales from one month to the previous month (month-over-month or MoM change) or to the same month a year ago (year-over-year or YoY change).
- Compare financial results for a given period to the same period in the previous year (like-for-like or LFL analysis).
- Observe the daily difference in stock prices to understand market trends.
- Calculate the difference in traffic between days of the week to plan capacity changes.
The solution is to use the lag()
function over an SQL window ordered by target columns.
Example
Let's compare the company's expenses
for each month to the previous month in absolute terms:
select year, month, expense, expense - lag(expense) over w as diff from expenses window w as (order by year, month) order by year, month;
┌──────┬───────┬─────────┬──────┐ │ year │ month │ expense │ diff │ ├──────┼───────┼─────────┼──────┤ │ 2020 │ 1 │ 82 │ │ │ 2020 │ 2 │ 75 │ -7 │ │ 2020 │ 3 │ 104 │ 29 │ │ 2020 │ 4 │ 94 │ -10 │ │ 2020 │ 5 │ 99 │ 5 │ │ 2020 │ 6 │ 105 │ 6 │ │ 2020 │ 7 │ 95 │ -10 │ │ 2020 │ 8 │ 110 │ 15 │ │ 2020 │ 9 │ 104 │ -6 │ │ 2020 │ 10 │ 100 │ -4 │ │ 2020 │ 11 │ 98 │ -2 │ │ 2020 │ 12 │ 106 │ 8 │ └──────┴───────┴─────────┴──────┘
The lag(value, offset)
function returns the value
of the record that is offset
rows behind the current one. By default, the offset is 1 and can be omitted.
Now let's calculate the relative change from month to month:
select year, month, expense, round( (expense - lag(expense) over w)*100.0 / lag(expense) over w ) as "diff %" from expenses window w as (order by year, month) order by year, month;
┌──────┬───────┬─────────┬────────┐ │ year │ month │ expense │ diff % │ ├──────┼───────┼─────────┼────────┤ │ 2020 │ 1 │ 82 │ │ │ 2020 │ 2 │ 75 │ -9 │ │ 2020 │ 3 │ 104 │ 39 │ │ 2020 │ 4 │ 94 │ -10 │ │ 2020 │ 5 │ 99 │ 5 │ │ 2020 │ 6 │ 105 │ 6 │ │ 2020 │ 7 │ 95 │ -10 │ │ 2020 │ 8 │ 110 │ 16 │ │ 2020 │ 9 │ 104 │ -5 │ │ 2020 │ 10 │ 100 │ -4 │ │ 2020 │ 11 │ 98 │ -2 │ │ 2020 │ 12 │ 106 │ 8 │ └──────┴───────┴─────────┴────────┘
Alternatives
Suppose we want to compare quarterly sales
with the previous year. This is where the offset
parameter comes in handy:
with data as ( select year, quarter, lag(amount, 4) over w as prev, amount as current, round(amount*100.0 / lag(amount, 4) over w) as "increase %" from sales window w as (order by year, quarter) ) select quarter, prev as y2019, current as y2020, "increase %" from data where year = 2020 order by quarter;
┌─────────┬────────┬────────┬────────────┐ │ quarter │ y2019 │ y2020 │ increase % │ ├─────────┼────────┼────────┼────────────┤ │ 1 │ 155040 │ 242040 │ 156 │ │ 2 │ 162600 │ 338040 │ 208 │ │ 3 │ 204120 │ 287520 │ 141 │ │ 4 │ 200700 │ 377340 │ 188 │ └─────────┴────────┴────────┴────────────┘
Looking back 4 quarters with lag(amount, 4)
gives us the same quarter but from the previous year.
There is also a lead()
function. It works just like lag()
, except that it looks forward instead of backward.
Compatibility
All major vendors support the lag()
and lead()
window functions. Some of them, such as MS SQL and Oracle, do not support the window
clause. In these cases, we can inline the window definition:
select year, month, expense, expense - lag(expense) over ( order by year, month ) as diff from expenses order by year, month;
We can also rewrite the query without window functions:
select cur.year, cur.month, cur.expense, cur.expense - prev.expense as diff from expenses cur left join expenses prev on cur.year = prev.year and cur.month - 1 = prev.month order by cur.year, cur.month;
Want to learn more about window functions? Read my book — SQL Window Functions Explained
Follow @ohmypy on Twitter to keep up with new posts
Top comments (0)