Oracle CUME_DIST

Summary: in this tutorial, you will learn how to calculate a cumulative distribution of a value in a set of values by using the Oracle CUME_DIST() function.

Introduction to the Oracle CUME_DIST() function

Sometimes, you want to pull the top or bottom x% of values from a data set, for example, the top 5% of salesmen by volume. To do this, you can use the CUME_DIST() function.

The CUME_DIST() function is an analytic function that calculates the cumulative distribution of a value in a set of values.

Here's the basic syntax of the CUME_DIST() function:

CUME_DIST() OVER ( [ partition_clause ] order_by_clause )

The ORDER BY clause specifies the order of rows in each partition or result set. Because the CUME_DIST() function is order-sensitive, the order_by_clause is mandatory.

The order_by_clause has the following syntax:

ORDER BY
    expression1 [ASC | DESC] [NULLS {FIRST | LAST}]
    [, expression2 [ASC | DESC] [NULLS {FIRST | LAST}], ... ]

The PARTITION BY clause divides the result set into multiple partitions. The partition_clause has the following form:

PARTITION BY expression1 [, expression2, ...]

The PARTITION BY clause is optional. If you omit it, the CUME_DIST() function treats the whole result set as a single partition.

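The CUME_DIST() function uses the following formula:

CUME_DIST = (number of rows that sort at or before the current row, including ties) / (total number of rows in the partition)

With an ascending ORDER BY, the numerator is the number of rows whose value is less than or equal to the current value. With a descending ORDER BY, it is the number of rows whose value is greater than or equal to the current value.

The result is greater than 0 and less than or equal to 1 (100%). Tied values receive the same cumulative distribution.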
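To see how sort direction and ties affect the result, the following sketch runs CUME_DIST() over five inline values. The sample_values data set built from dual and the val, cume_dist_asc, and cume_dist_desc names are illustrative assumptions, not part of this tutorial's sample data.

```sql
-- Hypothetical inline data: five values, with 20 appearing twice
WITH sample_values AS (
    SELECT 10 AS val FROM dual UNION ALL
    SELECT 20 FROM dual UNION ALL
    SELECT 20 FROM dual UNION ALL
    SELECT 30 FROM dual UNION ALL
    SELECT 40 FROM dual
)
SELECT
    val,
    -- share of rows with a value <= the current value
    CUME_DIST() OVER (ORDER BY val)      AS cume_dist_asc,
    -- share of rows with a value >= the current value
    CUME_DIST() OVER (ORDER BY val DESC) AS cume_dist_desc
FROM
    sample_values
ORDER BY
    val;
```

In ascending order, the values 10, 20, 20, 30, and 40 receive 0.2, 0.6, 0.6, 0.8, and 1. In descending order, the same values receive 1, 0.8, 0.8, 0.4, and 0.2. The two rows with the value 20 share a result in both cases, and no row ever receives 0.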

Basic Oracle CUME_DIST() function example

First, create a new table called scores with two columns id and score:

CREATE TABLE scores (
    id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    score NUMBER NOT NULL
);

Second, insert some rows into the scores table:

INSERT INTO scores(score) VALUES(80);
INSERT INTO scores(score) VALUES(90);
INSERT INTO scores(score) VALUES(90);
INSERT INTO scores(score) VALUES(100);

Third, retrieve data from the scores table:

SELECT * FROM scores;

Output:

        ID      SCORE
---------- ----------
         1         80
         2         90
         3         90
         4        100

Finally, calculate the cumulative distribution of a score within a set of scores:

SELECT
    id,
    score,
    CUME_DIST() OVER (
        ORDER BY score
    ) AS cume_dist
FROM
    scores;

Output:

        ID      SCORE  CUME_DIST
---------- ---------- ----------
         1         80        .25
         2         90        .75
         3         90        .75
         4        100          1

How it works:

In this example, the result set has 4 rows. The ORDER BY clause in the CUME_DIST function sorts the scores from low to high.

Row  Score  Calculation                       CUME_DIST
---  -----  --------------------------------  ---------
1    80     1/4 (only 1 row ≤ 80)             0.25
2    90     3/4 (3 rows ≤ 90)                 0.75
3    90     3/4 (same as above, tied value)   0.75
4    100    4/4 (all rows ≤ 100)              1.00
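One way to check this arithmetic is to compute the same ratio yourself with COUNT(*) used as an analytic function. The following is a sketch against the scores table created earlier; the manual_cume_dist alias is a name chosen here for illustration.

```sql
SELECT
    id,
    score,
    CUME_DIST() OVER (ORDER BY score) AS cume_dist,
    -- rows at or before the current row in sort order, ties included
    COUNT(*) OVER (
        ORDER BY score
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
    -- divided by the total number of rows in the result set
    / COUNT(*) OVER () AS manual_cume_dist
FROM
    scores;
```

The RANGE frame includes every row that ties with the current row, which is why both rows with a score of 90 are counted as 3 out of 4. The two computed columns should match on every row.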

Using the Oracle CUME_DIST() function over a result set

The following statement uses the CUME_DIST() function to calculate the sales percentile for each salesman in 2017:

SELECT
    salesman_id,
    sales,
    ROUND(CUME_DIST() OVER (ORDER BY sales DESC) * 100, 2) || '%' cume_dist
FROM
    salesman_performance
WHERE
    year = 2017;

Here is the result:

[Image: Oracle CUME_DIST function example]

As shown in the output, 33.33% of salesmen have sales amounts greater than 1.99 million.

Using the Oracle CUME_DIST() function over partitions

The following statement uses the CUME_DIST() function to calculate the sales percentile for each salesman in 2016 and 2017:

SELECT
    salesman_id,
    year,
    sales,
    ROUND(CUME_DIST() OVER (
        PARTITION BY year
        ORDER BY sales DESC
    ) * 100, 2) || '%' cume_dist
FROM
    salesman_performance
WHERE
    year IN (2016, 2017);

Here is the output:

[Image: Oracle CUME_DIST function over partitions example]

In this example:

  • The PARTITION BY clause divides the result set into two partitions by year, 2016 and 2017.
  • The ORDER BY clause sorts the rows in each partition by sales amount in descending order. The CUME_DIST() function is then applied to each partition independently.

To get the top 30% of salesmen by sales revenue in 2016 and 2017, use the following query:

WITH cte_sales AS (
    SELECT
        salesman_id,
        year,
        sales,
        ROUND(CUME_DIST() OVER (
            PARTITION BY year
            ORDER BY sales DESC
        ), 2) cume_dist
    FROM
        salesman_performance
    WHERE
        year IN (2016, 2017)
)
SELECT
    *
FROM
    cte_sales
WHERE
    cume_dist <= 0.30;

Output:

[Image: Oracle CUME_DIST top-N example]
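Note that the query above rounds the cumulative distribution before filtering, so a value such as 0.304 would round to 0.30 and pass the filter. If you need a strict 30% cutoff, one possible variation (a sketch, not the only approach) is to filter on the unrounded value and round only for display:

```sql
WITH cte_sales AS (
    SELECT
        salesman_id,
        year,
        sales,
        -- keep the unrounded value so the filter below is exact
        CUME_DIST() OVER (
            PARTITION BY year
            ORDER BY sales DESC
        ) AS cume_dist
    FROM
        salesman_performance
    WHERE
        year IN (2016, 2017)
)
SELECT
    salesman_id,
    year,
    sales,
    ROUND(cume_dist, 2) AS cume_dist  -- rounded for display only
FROM
    cte_sales
WHERE
    cume_dist <= 0.30;                -- compares the unrounded CTE column
```

Whether the two queries return different rows depends on the data. They differ only when a row's cumulative distribution falls just above 0.30.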

Summary

  • Use the Oracle CUME_DIST() function to calculate the cumulative distribution of a value in a set of values.
