The Magic of Window Functions in Postgres

The Magic of Window Functions in Postgres BRUCE MOMJIAN This presentation explains the many window function facilities and how they can be used to produce useful SQL query results. https://momjian.us/presentations Creative Commons Attribution License Last updated: April, 2020 1 / 91

PostgreSQL the database… • Open Source Object Relational DBMS since 1996 • Distributed under the PostgreSQL License • Similar technical heritage as Oracle, SQL Server & DB2 • However, a strong adherence to standards (ANSI-SQL 2008) • Highly extensible and adaptable design • Languages, indexing, data types, etc. • E.g. PostGIS, JSONB, SQL/MED • Extensive use throughout the world for applications and organizations of all types • Bundled into Red Hat Enterprise Linux, Ubuntu, CentOS, and most cloud providers 2 / 91

PostgreSQL the community… • Independent community led by a Core Team of ﬁve • Large, active and vibrant community • www.postgresql.org • Downloads, Mailing lists, Documentation • Sponsors sampler: • Red Hat, VMWare, NTT, Fujitsu, Amazon, and EnterpriseDB • http://www.postgresql.org/community/ 3 / 91

EnterpriseDB the company… • Leading worldwide provider of Postgres software and services • More than 4,000 enterprises, governments, and other organizations worldwide • EDB Postgres Platform with: • PostgreSQL and EDB Postgres Advanced Server including additional enterprise functionaility • Tool Suites for Management, Integration, and Migration, including High Availability and Disaster Recovery • Professional Services, 24/7 global support, and Remote DBA • Training and Certiﬁcation • Citizenship • Contributor of key features: Materialized Views, JSON, & more • Sixteen community members on staff 4 / 91

EnterpriseDB the company… 5 / 91

EDB Recognized 5 Years In a Row in Gartner’s Magic Quadrant 6 / 91

Outline 1. Introduction to window functions 2. Window function syntax 3. Window syntax with generic aggregates 4. Window-speciﬁc functions 5. Window function examples 6. Considerations 7 / 91

1. Introduction to Window Functions https://www.ﬂickr.com/photos/conalg/ 8 / 91

Postgres Data Analytics Features • Aggregates • Optimizer • Server-side languages, e.g., PL/R • Window functions • Bitmap heap scans • Tablespaces • Data partitioning • Materialized views • Common table expressions (CTE) • BRIN indexes • GROUPING SETS, ROLLUP, CUBE • Just-in-time compilation (JIT) • Parallelism • Sharding (in progress) 9 / 91

What Is a Window Function? A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result. https://www.postgresql.org/docs/current/tutorial-window. html 10 / 91

Keep Your Eye on the Red (Text) https://www.ﬂickr.com/photos/alltheaces/ 11 / 91

Count to Ten SELECT * FROM generate_series(1, 10) AS f(x); x ---- 1 2 3 4 5 6 7 8 9 10 All the queries used in this presentation are available at https:// momjian.us/main/writings/pgsql/window.sql. 12 / 91

Simplest Window Function SELECT x, SUM(x) OVER () FROM generate_series(1, 10) AS f(x); x | sum ----+----- 1 | 55 2 | 55 3 | 55 4 | 55 5 | 55 6 | 55 7 | 55 8 | 55 9 | 55 10 | 55 13 / 91

Two OVER Clauses SELECT x, COUNT(x) OVER (), SUM(x) OVER () FROM generate_series(1, 10) AS f(x); x | count | sum ----+-------+----- 1 | 10 | 55 2 | 10 | 55 3 | 10 | 55 4 | 10 | 55 5 | 10 | 55 6 | 10 | 55 7 | 10 | 55 8 | 10 | 55 9 | 10 | 55 10 | 10 | 55 14 / 91

WINDOW Clause SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (); x | count | sum ----+-------+----- 1 | 10 | 55 2 | 10 | 55 3 | 10 | 55 4 | 10 | 55 5 | 10 | 55 6 | 10 | 55 7 | 10 | 55 8 | 10 | 55 9 | 10 | 55 10 | 10 | 55 15 / 91

Let’s See the Defaults SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 10 | 55 2 | 10 | 55 3 | 10 | 55 4 | 10 | 55 5 | 10 | 55 6 | 10 | 55 7 | 10 | 55 8 | 10 | 55 9 | 10 | 55 10 | 10 | 55 16 / 91

2. Window Function Syntax https://www.ﬂickr.com/photos/bgreenlee/ 17 / 91

Window Syntax WINDOW ( [PARTITION BY …] [ORDER BY …] [ { RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end } ] ) where frame_start and frame_end can be: • UNBOUNDED PRECEDING • value PRECEDING • CURRENT ROW • value FOLLOWING • UNBOUNDED FOLLOWING Bracketed clauses are optional, braces are selected. https://www.postgresql.org/docs/current/sql-expressions. html#SYNTAX-WINDOW-FUNCTIONS 18 / 91

What Are the Defaults? (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) • No PARTITION BY (the set is a single partition) • No ORDER BY (all rows are peers of CURRENT ROW) • RANGE, not ROWS (CURRENT ROW includes all peers) Since PARTITION BY and ORDER BY are not defaults but RANGE is the default, CURRENT ROW defaults to representing all rows. 19 / 91

CURRENT ROW CURRENT ROW can mean the: • Literal current row • First or last row with the same ORDER BY value (ﬁrst/last peer) • First or last row of the partition 20 / 91

CURRENT ROW CURRENT ROW can mean the: • Literal current row (ROWS mode) • First or last row with the same ORDER BY value (ﬁrst/last peer) (RANGE mode with ORDER BY) • First or last row of the partition (RANGE mode without ORDER BY) 21 / 91

Visual Window Terms x −− 1 1 2 2 3 5 4 3 4 5 partition (which is the entire set here) window frame in ROWS UNBOUNDED PRECEDING window frame with ORDER BY x and defaults literal current row (CURRENT ROW in ROWS mode) peers defined by ORDER BY x (CURRENT ROW in RANGE mode) 22 / 91

SQL for Window Frames x −− 1 1 2 2 3 5 4 3 4 5 ROWS BETWEEN UNBOUNDED PRECEDING ROWS UNBOUNDED PRECEDING ORDER BY x UNBOUNDED PRECEDING ROWS CURRENT ROW AND CURRENT ROW ORDER BY x RANGE CURRENT ROW AND UNBOUNDED FOLLOWING (end frame default) 23 / 91

3. Window Syntax with Generic Aggregates https://www.ﬂickr.com/photos/azparrot/ 24 / 91

Back to the Last Query SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 10 | 55 2 | 10 | 55 3 | 10 | 55 4 | 10 | 55 5 | 10 | 55 6 | 10 | 55 7 | 10 | 55 8 | 10 | 55 9 | 10 | 55 10 | 10 | 55 25 / 91

ROWS Instead of RANGE SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 2 | 3 3 | 3 | 6 4 | 4 | 10 5 | 5 | 15 6 | 6 | 21 7 | 7 | 28 8 | 8 | 36 9 | 9 | 45 10 | 10 | 55 26 / 91

Default End Frame (CURRENT ROW) SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS UNBOUNDED PRECEDING); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 2 | 3 3 | 3 | 6 4 | 4 | 10 5 | 5 | 15 6 | 6 | 21 7 | 7 | 28 8 | 8 | 36 9 | 9 | 45 10 | 10 | 55 27 / 91

Only CURRENT ROW SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN CURRENT ROW AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 1 | 2 3 | 1 | 3 4 | 1 | 4 5 | 1 | 5 6 | 1 | 6 7 | 1 | 7 8 | 1 | 8 9 | 1 | 9 10 | 1 | 10 28 / 91

Use Defaults SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 1 | 2 3 | 1 | 3 4 | 1 | 4 5 | 1 | 5 6 | 1 | 6 7 | 1 | 7 8 | 1 | 8 9 | 1 | 9 10 | 1 | 10 29 / 91

UNBOUNDED FOLLOWING SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING); x | count | sum ----+-------+----- 1 | 10 | 55 2 | 9 | 54 3 | 8 | 52 4 | 7 | 49 5 | 6 | 45 6 | 5 | 40 7 | 4 | 34 8 | 3 | 27 9 | 2 | 19 10 | 1 | 10 30 / 91

PRECEDING SELECT x, COUNT(*) OVER w, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW); x | count | count | sum ----+-------+-------+----- 1 | 1 | 1 | 1 2 | 2 | 2 | 3 3 | 2 | 2 | 5 4 | 2 | 2 | 7 5 | 2 | 2 | 9 6 | 2 | 2 | 11 7 | 2 | 2 | 13 8 | 2 | 2 | 15 9 | 2 | 2 | 17 10 | 2 | 2 | 19 PRECEDING ignores nonexistent rows; they are not NULLs. 31 / 91

Use FOLLOWING SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING); x | count | sum ----+-------+----- 1 | 2 | 3 2 | 2 | 5 3 | 2 | 7 4 | 2 | 9 5 | 2 | 11 6 | 2 | 13 7 | 2 | 15 8 | 2 | 17 9 | 2 | 19 10 | 1 | 10 32 / 91

3 PRECEDING SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ROWS BETWEEN 3 PRECEDING AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 2 | 3 3 | 3 | 6 4 | 4 | 10 5 | 4 | 14 6 | 4 | 18 7 | 4 | 22 8 | 4 | 26 9 | 4 | 30 10 | 4 | 34 33 / 91

ORDER BY SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ORDER BY x); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 2 | 3 3 | 3 | 6 4 | 4 | 10 5 | 5 | 15 6 | 6 | 21 7 | 7 | 28 8 | 8 | 36 9 | 9 | 45 10 | 10 | 55 CURRENT ROW peers are rows with equal values for ORDER BY columns, or all partition rows if ORDER BY is not speciﬁed. 34 / 91

Default Frame Speciﬁed SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 2 | 3 3 | 3 | 6 4 | 4 | 10 5 | 5 | 15 6 | 6 | 21 7 | 7 | 28 8 | 8 | 36 9 | 9 | 45 10 | 10 | 55 35 / 91

Only CURRENT ROW SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_series(1, 10) AS f(x) WINDOW w AS (ORDER BY x RANGE CURRENT ROW); x | count | sum ----+-------+----- 1 | 1 | 1 2 | 1 | 2 3 | 1 | 3 4 | 1 | 4 5 | 1 | 5 6 | 1 | 6 7 | 1 | 7 8 | 1 | 8 9 | 1 | 9 10 | 1 | 10 36 / 91

Create Table with Duplicates CREATE TABLE generate_1_to_5_x2 AS SELECT ceil(x/2.0) AS x FROM generate_series(1, 10) AS f(x); SELECT * FROM generate_1_to_5_x2; x --- 1 1 2 2 3 3 4 4 5 5 37 / 91

Empty Window Speciﬁcation SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (); x | count | sum ---+-------+----- 1 | 10 | 30 1 | 10 | 30 2 | 10 | 30 2 | 10 | 30 3 | 10 | 30 3 | 10 | 30 4 | 10 | 30 4 | 10 | 30 5 | 10 | 30 5 | 10 | 30 38 / 91

RANGE With Duplicates SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | count | sum ---+-------+----- 1 | 2 | 2 1 | 2 | 2 2 | 4 | 6 2 | 4 | 6 3 | 6 | 12 3 | 6 | 12 4 | 8 | 20 4 | 8 | 20 5 | 10 | 30 5 | 10 | 30 39 / 91

Show Defaults SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ---+-------+----- 1 | 2 | 2 1 | 2 | 2 2 | 4 | 6 2 | 4 | 6 3 | 6 | 12 3 | 6 | 12 4 | 8 | 20 4 | 8 | 20 5 | 10 | 30 5 | 10 | 30 40 / 91

ROWS SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | count | sum ---+-------+----- 1 | 1 | 1 1 | 2 | 2 2 | 3 | 4 2 | 4 | 6 3 | 5 | 9 3 | 6 | 12 4 | 7 | 16 4 | 8 | 20 5 | 9 | 25 5 | 10 | 30 41 / 91

RANGE on CURRENT ROW SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x RANGE CURRENT ROW); x | count | sum ---+-------+----- 1 | 2 | 2 1 | 2 | 2 2 | 2 | 4 2 | 2 | 4 3 | 2 | 6 3 | 2 | 6 4 | 2 | 8 4 | 2 | 8 5 | 2 | 10 5 | 2 | 10 42 / 91

ROWS on CURRENT ROW SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x ROWS CURRENT ROW); x | count | sum ---+-------+----- 1 | 1 | 1 1 | 1 | 1 2 | 1 | 2 2 | 1 | 2 3 | 1 | 3 3 | 1 | 3 4 | 1 | 4 4 | 1 | 4 5 | 1 | 5 5 | 1 | 5 43 / 91

PARTITION BY SELECT x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x); x | count | sum ---+-------+----- 1 | 2 | 2 1 | 2 | 2 2 | 2 | 4 2 | 2 | 4 3 | 2 | 6 3 | 2 | 6 4 | 2 | 8 4 | 2 | 8 5 | 2 | 10 5 | 2 | 10 Same as RANGE CURRENT ROW because the partition matches the window frame. 44 / 91

Create Two Partitions SELECT int4(x >= 3), x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3); int4 | x | count | sum ------+---+-------+----- 0 | 1 | 4 | 6 0 | 1 | 4 | 6 0 | 2 | 4 | 6 0 | 2 | 4 | 6 1 | 3 | 6 | 24 1 | 3 | 6 | 24 1 | 4 | 6 | 24 1 | 4 | 6 | 24 1 | 5 | 6 | 24 1 | 5 | 6 | 24 45 / 91

ORDER BY SELECT int4(x >= 3), x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3 ORDER BY x); int4 | x | count | sum ------+---+-------+----- 0 | 1 | 2 | 2 0 | 1 | 2 | 2 0 | 2 | 4 | 6 0 | 2 | 4 | 6 1 | 3 | 2 | 6 1 | 3 | 2 | 6 1 | 4 | 4 | 14 1 | 4 | 4 | 14 1 | 5 | 6 | 24 1 | 5 | 6 | 24 46 / 91

Show Defaults SELECT int4(x >= 3), x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3 ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); int4 | x | count | sum ------+---+-------+----- 0 | 1 | 2 | 2 0 | 1 | 2 | 2 0 | 2 | 4 | 6 0 | 2 | 4 | 6 1 | 3 | 2 | 6 1 | 3 | 2 | 6 1 | 4 | 4 | 14 1 | 4 | 4 | 14 1 | 5 | 6 | 24 1 | 5 | 6 | 24 47 / 91

ROWS SELECT int4(x >= 3), x, COUNT(x) OVER w, SUM(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3 ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); int4 | x | count | sum ------+---+-------+----- 0 | 1 | 1 | 1 0 | 1 | 2 | 2 0 | 2 | 3 | 4 0 | 2 | 4 | 6 1 | 3 | 1 | 3 1 | 3 | 2 | 6 1 | 4 | 3 | 10 1 | 4 | 4 | 14 1 | 5 | 5 | 19 1 | 5 | 6 | 24 48 / 91

4. Window-Speciﬁc Functions https://www.ﬂickr.com/photos/michaeljohnbutton/ 49 / 91

ROW_NUMBER SELECT x, ROW_NUMBER() OVER w FROM generate_1_to_5_x2 WINDOW w AS (); x | row_number ---+------------ 1 | 1 1 | 2 2 | 3 2 | 4 3 | 5 3 | 6 4 | 7 4 | 8 5 | 9 5 | 10 ROW_NUMBER takes no arguments and operates on partitions, not window frames. https://www.postgresql.org/docs/current/ functions-window.html 50 / 91

LAG SELECT x, LAG(x, 1) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | lag ---+-------- 1 | (null) 1 | 1 2 | 1 2 | 2 3 | 2 3 | 3 4 | 3 4 | 4 5 | 4 5 | 5 51 / 91

LAG(2) SELECT x, LAG(x, 2) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | lag ---+-------- 1 | (null) 1 | (null) 2 | 1 2 | 1 3 | 2 3 | 2 4 | 3 4 | 3 5 | 4 5 | 4 52 / 91

LAG and LEAD SELECT x, LAG(x, 2) OVER w, LEAD(x, 2) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | lag | lead ---+--------+-------- 1 | (null) | 2 1 | (null) | 2 2 | 1 | 3 2 | 1 | 3 3 | 2 | 4 3 | 2 | 4 4 | 3 | 5 4 | 3 | 5 5 | 4 | (null) 5 | 4 | (null) These operate on partitions. Defaults can be speciﬁed for nonexistent rows. 53 / 91

FIRST_VALUE and LAST_VALUE SELECT x, FIRST_VALUE(x) OVER w, LAST_VALUE(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | first_value | last_value ---+-------------+------------ 1 | 1 | 1 1 | 1 | 1 2 | 1 | 2 2 | 1 | 2 3 | 1 | 3 3 | 1 | 3 4 | 1 | 4 4 | 1 | 4 5 | 1 | 5 5 | 1 | 5 These operate on window frames. 54 / 91

UNBOUNDED Window Frame SELECT x, FIRST_VALUE(x) OVER w, LAST_VALUE(x) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); x | first_value | last_value ---+-------------+------------ 1 | 1 | 5 1 | 1 | 5 2 | 1 | 5 2 | 1 | 5 3 | 1 | 5 3 | 1 | 5 4 | 1 | 5 4 | 1 | 5 5 | 1 | 5 5 | 1 | 5 55 / 91

NTH_VALUE SELECT x, NTH_VALUE(x, 3) OVER w, NTH_VALUE(x, 7) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | nth_value | nth_value ---+-----------+----------- 1 | (null) | (null) 1 | (null) | (null) 2 | 2 | (null) 2 | 2 | (null) 3 | 2 | (null) 3 | 2 | (null) 4 | 2 | 4 4 | 2 | 4 5 | 2 | 4 5 | 2 | 4 This operates on window frames. 56 / 91

Show Defaults SELECT x, NTH_VALUE(x, 3) OVER w, NTH_VALUE(x, 7) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | nth_value | nth_value ---+-----------+----------- 1 | (null) | (null) 1 | (null) | (null) 2 | 2 | (null) 2 | 2 | (null) 3 | 2 | (null) 3 | 2 | (null) 4 | 2 | 4 4 | 2 | 4 5 | 2 | 4 5 | 2 | 4 57 / 91

UNBOUNDED Window Frame SELECT x, NTH_VALUE(x, 3) OVER w, NTH_VALUE(x, 7) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); x | nth_value | nth_value ---+-----------+----------- 1 | 2 | 4 1 | 2 | 4 2 | 2 | 4 2 | 2 | 4 3 | 2 | 4 3 | 2 | 4 4 | 2 | 4 4 | 2 | 4 5 | 2 | 4 5 | 2 | 4 58 / 91

RANK and DENSE_RANK SELECT x, RANK() OVER w, DENSE_RANK() OVER w FROM generate_1_to_5_x2 WINDOW w AS (); x | rank | dense_rank ---+------+------------ 1 | 1 | 1 1 | 1 | 1 2 | 1 | 1 2 | 1 | 1 3 | 1 | 1 3 | 1 | 1 4 | 1 | 1 4 | 1 | 1 5 | 1 | 1 5 | 1 | 1 These operate on CURRENT ROW peers in the partition. 59 / 91

Show Defaults SELECT x, RANK() OVER w, DENSE_RANK() OVER w FROM generate_1_to_5_x2 WINDOW w AS (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | rank | dense_rank ---+------+------------ 1 | 1 | 1 1 | 1 | 1 2 | 1 | 1 2 | 1 | 1 3 | 1 | 1 3 | 1 | 1 4 | 1 | 1 4 | 1 | 1 5 | 1 | 1 5 | 1 | 1 60 / 91

ROWS SELECT x, RANK() OVER w, DENSE_RANK() OVER w FROM generate_1_to_5_x2 WINDOW w AS (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); x | rank | dense_rank ---+------+------------ 1 | 1 | 1 1 | 1 | 1 2 | 1 | 1 2 | 1 | 1 3 | 1 | 1 3 | 1 | 1 4 | 1 | 1 4 | 1 | 1 5 | 1 | 1 5 | 1 | 1 61 / 91

Operates on Peers, so Needs ORDER BY SELECT x, RANK() OVER w, DENSE_RANK() OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | rank | dense_rank ---+------+------------ 1 | 1 | 1 1 | 1 | 1 2 | 3 | 2 2 | 3 | 2 3 | 5 | 3 3 | 5 | 3 4 | 7 | 4 4 | 7 | 4 5 | 9 | 5 5 | 9 | 5 62 / 91

PERCENT_RANK, CUME_DIST, NTILE SELECT x, (PERCENT_RANK() OVER w)::numeric(10, 2), (CUME_DIST() OVER w)::numeric(10, 2), NTILE(3) OVER w FROM generate_1_to_5_x2 WINDOW w AS (ORDER BY x); x | percent_rank | cume_dist | ntile ---+--------------+-----------+------- 1 | 0.00 | 0.20 | 1 1 | 0.00 | 0.20 | 1 2 | 0.22 | 0.40 | 1 2 | 0.22 | 0.40 | 1 3 | 0.44 | 0.60 | 2 3 | 0.44 | 0.60 | 2 4 | 0.67 | 0.80 | 2 4 | 0.67 | 0.80 | 3 5 | 0.89 | 1.00 | 3 5 | 0.89 | 1.00 | 3 PERCENT_RANK is ratio of rows less than current row, excluding current row. CUME_DIST is ratio of rows <= current row. 63 / 91

PARTITION BY SELECT int4(x >= 3), x, RANK() OVER w, DENSE_RANK() OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3 ORDER BY x) ORDER BY 1,2; int4 | x | rank | dense_rank ------+---+------+------------ 0 | 1 | 1 | 1 0 | 1 | 1 | 1 0 | 2 | 3 | 2 0 | 2 | 3 | 2 1 | 3 | 1 | 1 1 | 3 | 1 | 1 1 | 4 | 3 | 2 1 | 4 | 3 | 2 1 | 5 | 5 | 3 1 | 5 | 5 | 3 64 / 91

PARTITION BY and Other Rank Functions SELECT int4(x >= 3), x, (PERCENT_RANK() OVER w)::numeric(10,2), (CUME_DIST() OVER w)::numeric(10,2), NTILE(3) OVER w FROM generate_1_to_5_x2 WINDOW w AS (PARTITION BY x >= 3 ORDER BY x) ORDER BY 1,2; int4 | x | percent_rank | cume_dist | ntile ------+---+--------------+-----------+------- 0 | 1 | 0.00 | 0.50 | 1 0 | 1 | 0.00 | 0.50 | 1 0 | 2 | 0.67 | 1.00 | 2 0 | 2 | 0.67 | 1.00 | 3 1 | 3 | 0.00 | 0.33 | 1 1 | 3 | 0.00 | 0.33 | 1 1 | 4 | 0.40 | 0.67 | 2 1 | 4 | 0.40 | 0.67 | 2 1 | 5 | 0.80 | 1.00 | 3 1 | 5 | 0.80 | 1.00 | 3 65 / 91

5. Window Function Examples https://www.ﬂickr.com/photos/ﬁshywang/ 66 / 91

Create emp Table and Populate CREATE TABLE emp ( id SERIAL, name TEXT NOT NULL, department TEXT, salary NUMERIC(10, 2) ); INSERT INTO emp (name, department, salary) VALUES (’Andy’, ’Shipping’, 5400), (’Betty’, ’Marketing’, 6300), (’Tracy’, ’Shipping’, 4800), (’Mike’, ’Marketing’, 7100), (’Sandy’, ’Sales’, 5400), (’James’, ’Shipping’, 6600), (’Carol’, ’Sales’, 4600); https://www.postgresql.org/docs/current/tutorial-window. html 67 / 91

Generic Aggregates SELECT COUNT(*), SUM(salary), round(AVG(salary), 2) AS avg FROM emp; count | sum | avg -------+----------+--------- 7 | 40200.00 | 5742.86 69 / 91

GROUP BY SELECT department, COUNT(*), SUM(salary), round(AVG(salary), 2) AS avg FROM emp GROUP BY department ORDER BY department; department | count | sum | avg ------------+-------+----------+--------- Marketing | 2 | 13400.00 | 6700.00 Sales | 2 | 10000.00 | 5000.00 Shipping | 3 | 16800.00 | 5600.00 70 / 91

ROLLUP SELECT department, COUNT(*), SUM(salary), round(AVG(salary), 2) AS avg FROM emp GROUP BY ROLLUP(department) ORDER BY department; department | count | sum | avg ------------+-------+----------+--------- Marketing | 2 | 13400.00 | 6700.00 Sales | 2 | 10000.00 | 5000.00 Shipping | 3 | 16800.00 | 5600.00 (null) | 7 | 40200.00 | 5742.86 71 / 91

OVER SELECT name, salary, SUM(salary) OVER () FROM emp ORDER BY salary DESC; name | salary | sum -------+---------+---------- Mike | 7100.00 | 40200.00 James | 6600.00 | 40200.00 Betty | 6300.00 | 40200.00 Andy | 5400.00 | 40200.00 Sandy | 5400.00 | 40200.00 Tracy | 4800.00 | 40200.00 Carol | 4600.00 | 40200.00 73 / 91

Percentages SELECT name, salary, round(salary / SUM(salary) OVER () * 100, 2) AS pct FROM emp ORDER BY salary DESC; name | salary | pct -------+---------+------- Mike | 7100.00 | 17.66 James | 6600.00 | 16.42 Betty | 6300.00 | 15.67 Andy | 5400.00 | 13.43 Sandy | 5400.00 | 13.43 Tracy | 4800.00 | 11.94 Carol | 4600.00 | 11.44 74 / 91

Cumulative Totals Using ORDER BY SELECT name, salary, SUM(salary) OVER (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM emp ORDER BY salary DESC; name | salary | sum -------+---------+---------- Mike | 7100.00 | 7100.00 James | 6600.00 | 13700.00 Betty | 6300.00 | 20000.00 Andy | 5400.00 | 25400.00 Sandy | 5400.00 | 30800.00 Tracy | 4800.00 | 35600.00 Carol | 4600.00 | 40200.00 Cumulative totals are often useful for time-series rows. 75 / 91

Window AVG SELECT name, salary, round(AVG(salary) OVER (), 2) AS avg FROM emp ORDER BY salary DESC; name | salary | avg -------+---------+--------- Mike | 7100.00 | 5742.86 James | 6600.00 | 5742.86 Betty | 6300.00 | 5742.86 Andy | 5400.00 | 5742.86 Sandy | 5400.00 | 5742.86 Tracy | 4800.00 | 5742.86 Carol | 4600.00 | 5742.86 76 / 91

Difference Compared to Average SELECT name, salary, round(AVG(salary) OVER (), 2) AS avg, round(salary - AVG(salary) OVER (), 2) AS diff_avg FROM emp ORDER BY salary DESC; name | salary | avg | diff_avg -------+---------+---------+---------- Mike | 7100.00 | 5742.86 | 1357.14 James | 6600.00 | 5742.86 | 857.14 Betty | 6300.00 | 5742.86 | 557.14 Andy | 5400.00 | 5742.86 | -342.86 Sandy | 5400.00 | 5742.86 | -342.86 Tracy | 4800.00 | 5742.86 | -942.86 Carol | 4600.00 | 5742.86 | -1142.86 77 / 91

Compared to the Next Value SELECT name, salary, salary - LEAD(salary, 1) OVER (ORDER BY salary DESC) AS diff_next FROM emp ORDER BY salary DESC; name | salary | diff_next -------+---------+----------- Mike | 7100.00 | 500.00 James | 6600.00 | 300.00 Betty | 6300.00 | 900.00 Sandy | 5400.00 | 0.00 Andy | 5400.00 | 600.00 Tracy | 4800.00 | 200.00 Carol | 4600.00 | (null) 78 / 91

Compared to Lowest-Paid Employee SELECT name, salary, salary - LAST_VALUE(salary) OVER w AS more, round((salary - LAST_VALUE(salary) OVER w) / LAST_VALUE(salary) OVER w * 100) AS pct_more FROM emp WINDOW w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ORDER BY salary DESC; name | salary | more | pct_more -------+---------+---------+---------- Mike | 7100.00 | 2500.00 | 54 James | 6600.00 | 2000.00 | 43 Betty | 6300.00 | 1700.00 | 37 Andy | 5400.00 | 800.00 | 17 Sandy | 5400.00 | 800.00 | 17 Tracy | 4800.00 | 200.00 | 4 Carol | 4600.00 | 0.00 | 0 79 / 91

RANK and DENSE_RANK SELECT name, salary, RANK() OVER s, DENSE_RANK() OVER s FROM emp WINDOW s AS (ORDER BY salary DESC) ORDER BY salary DESC; name | salary | rank | dense_rank -------+---------+------+------------ Mike | 7100.00 | 1 | 1 James | 6600.00 | 2 | 2 Betty | 6300.00 | 3 | 3 Andy | 5400.00 | 4 | 4 Sandy | 5400.00 | 4 | 4 Tracy | 4800.00 | 6 | 5 Carol | 4600.00 | 7 | 6 80 / 91

Departmental Average SELECT name, department, salary, round(AVG(salary) OVER (PARTITION BY department), 2) AS avg, round(salary - AVG(salary) OVER (PARTITION BY department), 2) AS diff_avg FROM emp ORDER BY department, salary DESC; name | department | salary | avg | diff_avg -------+------------+---------+---------+---------- Mike | Marketing | 7100.00 | 6700.00 | 400.00 Betty | Marketing | 6300.00 | 6700.00 | -400.00 Sandy | Sales | 5400.00 | 5000.00 | 400.00 Carol | Sales | 4600.00 | 5000.00 | -400.00 James | Shipping | 6600.00 | 5600.00 | 1000.00 Andy | Shipping | 5400.00 | 5600.00 | -200.00 Tracy | Shipping | 4800.00 | 5600.00 | -800.00 81 / 91

WINDOW Clause SELECT name, department, salary, round(AVG(salary) OVER d, 2) AS avg, round(salary - AVG(salary) OVER d, 2) AS diff_avg FROM emp WINDOW d AS (PARTITION BY department) ORDER BY department, salary DESC; name | department | salary | avg | diff_avg -------+------------+---------+---------+---------- Mike | Marketing | 7100.00 | 6700.00 | 400.00 Betty | Marketing | 6300.00 | 6700.00 | -400.00 Sandy | Sales | 5400.00 | 5000.00 | 400.00 Carol | Sales | 4600.00 | 5000.00 | -400.00 James | Shipping | 6600.00 | 5600.00 | 1000.00 Andy | Shipping | 5400.00 | 5600.00 | -200.00 Tracy | Shipping | 4800.00 | 5600.00 | -800.00 82 / 91

Departmental and Global Ranks SELECT name, department, salary, RANK() OVER s AS dept_rank, RANK() OVER (ORDER BY salary DESC) AS global_rank FROM emp WINDOW s AS (PARTITION BY department ORDER BY salary DESC) ORDER BY department, salary DESC; name | department | salary | dept_rank | global_rank -------+------------+---------+-----------+------------- Mike | Marketing | 7100.00 | 1 | 1 Betty | Marketing | 6300.00 | 2 | 3 Sandy | Sales | 5400.00 | 1 | 4 Carol | Sales | 4600.00 | 2 | 7 James | Shipping | 6600.00 | 1 | 2 Andy | Shipping | 5400.00 | 2 | 4 Tracy | Shipping | 4800.00 | 3 | 6 84 / 91

6. Considerations https://www.ﬂickr.com/photos/10413717@N08/ 85 / 91

Tips • Do you want to split the set? (PARTITION BY creates multiple partitions) • Do you want an order in the partition? (use ORDER BY) • How do you want to handle rows with the same ORDER BY values? • RANGE vs ROW • RANK vs DENSE_RANK • Do you need to define a window frame? • Window functions can define their own partitions, ordering, and window frames. • Multiple window names can be defined in the WINDOW clause. • Pay attention to whether window functions operate on frames or partitions. 86 / 91

Window Function Summary Scope Type Function Description frame computation generic aggs. e.g., SUM, AVG row access FIRST_VALUE ﬁrst frame value LAST_VALUE last frame value NTH_VALUE nth frame value partition row access LAG row before current LEAD row after current ROW_NUMBER current row number ranking CUME_DIST cumulative distribution DENSE_RANK rank without gaps NTILE rank in n partitions PERCENT_RANK percent rank RANK rank with gaps Window functions never process rows outside their partitions. However, without PARTITION BY the partition is the entire set. 87 / 91

Postgres 11 Improvements: RANGE AND GROUPS • Allow RANGE window frames to specify peer groups whose values are plus or minus the speciﬁed PRECEDING/FOLLOWING offset • Add GROUPS window frames which specify the number of peer groups PRECEDING/FOLLOWING the current peer group: WINDOW ( [PARTITION BY …] [ORDER BY …] [ { RANGE | ROW | GROUPS } { frame_start | BETWEEN frame_start AND frame_end } ] ) 88 / 91

Postgres 11 Improvements: Frame Exclusion • New frame_exclusion clause: WINDOW ( [PARTITION BY …] [ORDER BY …] [ { RANGE | ROW | GROUPS } { frame_start | BETWEEN frame_start AND frame_end } frame_exclusion ] ) where frame_exclusion can be: • EXCLUDE CURRENT ROW • EXCLUDE GROUP (exclude peer group) • EXCLUDE TIES (exclude other peers) • EXCLUDE NO OTHERS 89 / 91

Additional Resources… • Postgres Downloads: • www.enterprisedb.com/downloads • Product and Services information: • info@enterprisedb.com 90 / 91

Conclusion https://momjian.us/presentations https://www.ﬂickr.com/photos/10318765@N03/ 91 / 91

The Magic of Window Functions in Postgres

More Related Content

What's hot

Similar to The Magic of Window Functions in Postgres

More from EDB

Recently uploaded

The Magic of Window Functions in Postgres