Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST()
function to calculate the cumulative distribution of a value within a group of values.
Introduction to SQLite CUME_DIST() Function
The CUME_DIST()
is a window function that returns the cumulative distribution of a value relative to the values in the group.
Here’s the syntax of the CUME_DIST()
function:
CUME_DIST() OVER ( [PARTITION BY partition_expression] [ORDER BY order_list] )
Code language: SQL (Structured Query Language) (sql)
In this syntax:
- The
PARTITION BY
clause specifies how the rows are grouped into partitions to which theCUME_DIST()
function applies. If you skip thePARTITION BY
clause, the function treats the whole result set as a single partition. - The
ORDER BY
clause specifies the order of rows in each partition to which theCUME_DIST()
function applies. If you omit theORDER BY
clause, the function returns 1 for all rows.
Suppose N
is the value of the current row of the column specified in the ORDER BY
clause and the order of rows is from low to high, the cumulative distribution of a value is calculated using the following formula:
The number of rows with values <= N / The number of rows in the window or partition
Code language: SQL (Structured Query Language) (sql)
The return value of the CUME_DIST()
function is greater than 0 and less than or equal to 1:
0 < CUME_DIST() <= 1
Code language: SQL (Structured Query Language) (sql)
The rows with the same values receive the same result.
SQLite CUME_DIST() function example
First, create a new table named products
for the demonstration:
CREATE TABLE products( id INTEGER PRIMARY KEY, name TEXT NOT NULL, color TEXT NOT NULL, price REAL NOT NULL );
Code language: SQL (Structured Query Language) (sql)
Second, insert some rows into the products
table:
INSERT INTO products (name, color, price) VALUES ('A', 'red', 100), ('B', 'red', 200), ('C', 'red', 200), ('D', 'black', 300), ('E', 'black', 400), ('F', 'white', 500);
Code language: SQL (Structured Query Language) (sql)
Third, query data from the products
table:
SELECT * FROM products;
Code language: SQL (Structured Query Language) (sql)
Output:
id | name | color | price ---+------+-------+------ 1 | A | red | 100.0 2 | B | red | 200.0 3 | C | red | 200.0 4 | D | black | 300.0 5 | E | black | 400.0 6 | F | white | 500.0 (6 rows)
Fourth, calculate the cumulative distribution of the price in the products
table:
SELECT name, CUME_DIST() OVER ( ORDER BY price ) PriceCumulativeDistribution FROM products;
Code language: SQL (Structured Query Language) (sql)
Here is the output:
name | PriceCumulativeDistribution -----+---------------------------- A | 0.16666666666666666 B | 0.5 C | 0.5 D | 0.6666666666666666 E | 0.8333333333333334 F | 1.0 (6 rows)
Because we skipped the PARTITION BY
clause, the function treated the whole result set as a single partition. Therefore, the number of rows to be evaluated is 6.
The following example uses the CUME_DIST()
function to calculate the cumulative distribution of prices partitioned by colors:
SELECT name, color, price, CUME_DIST() OVER ( PARTITION BY color ORDER BY price ) PriceCumulativeDistribution FROM products ORDER BY color;
Code language: SQL (Structured Query Language) (sql)
Output:
name | color | price | PriceCumulativeDistribution -----+-------+-------+---------------------------- D | black | 300.0 | 0.5 E | black | 400.0 | 1.0 A | red | 100.0 | 0.3333333333333333 B | red | 200.0 | 1.0 C | red | 200.0 | 1.0 F | white | 500.0 | 1.0 (6 rows)
Summary
- Use the
CUME_DIST()
function to calculate the cumulative distribution of value in a set of values.