Commit 1ad8285

enabled syntax highlighting for sql
1 parent d14c809 commit 1ad8285

1 file changed: +73 -51 lines changed

README.md

Lines changed: 73 additions & 51 deletions
@@ -55,23 +55,25 @@ Usage
5555
"Hello World"
5656
-------------
5757

58-
--- Make a dummy table
59-
CREATE TABLE helloworld (
60-
id integer,
61-
set hll
62-
);
58+
```sql
59+
--- Make a dummy table
60+
CREATE TABLE helloworld (
61+
id integer,
62+
set hll
63+
);
6364

64-
--- Insert an empty HLL
65-
INSERT INTO helloworld(id, set) VALUES (1, hll_empty());
65+
--- Insert an empty HLL
66+
INSERT INTO helloworld(id, set) VALUES (1, hll_empty());
6667

67-
--- Add a hashed integer to the HLL
68-
UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1;
68+
--- Add a hashed integer to the HLL
69+
UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1;
6970

70-
--- Or add a hashed string to the HLL
71-
UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1;
71+
--- Or add a hashed string to the HLL
72+
UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1;
7273

73-
--- Get the cardinality of the HLL
74-
SELECT hll_cardinality(set) FROM helloworld WHERE id = 1;
74+
--- Get the cardinality of the HLL
75+
SELECT hll_cardinality(set) FROM helloworld WHERE id = 1;
76+
```
7577

7678
Now with the silly stuff out of the way, here's a more realistic use case.
7779

@@ -80,56 +82,70 @@ Data Warehouse Use Case
8082

8183
Let's assume I've got a fact table that records users' visits to my site, what they did, and where they came from. It's got hundreds of millions of rows. Table scans take minutes (or at least lots and lots of seconds.)
8284

83-
CREATE TABLE facts (
84-
date date,
85-
user_id integer,
86-
activity_type smallint,
87-
referrer varchar(255)
88-
);
85+
```sql
86+
CREATE TABLE facts (
87+
date date,
88+
user_id integer,
89+
activity_type smallint,
90+
referrer varchar(255)
91+
);
92+
```
8993

9094
I'd really like a quick (milliseconds) idea of how many unique users are visiting per day for my dashboard. No problem, let's set up an aggregate table:
9195

92-
-- Create the destination table
93-
CREATE TABLE daily_uniques (
94-
date date UNIQUE,
95-
users hll
96-
);
96+
```sql
97+
-- Create the destination table
98+
CREATE TABLE daily_uniques (
99+
date date UNIQUE,
100+
users hll
101+
);
97102

98-
-- Fill it with the aggregated unique statistics
99-
INSERT INTO daily_uniques(date, users)
100-
SELECT date, hll_add_agg(hll_hash_integer(user_id))
101-
FROM facts
102-
GROUP BY 1;
103+
-- Fill it with the aggregated unique statistics
104+
INSERT INTO daily_uniques(date, users)
105+
SELECT date, hll_add_agg(hll_hash_integer(user_id))
106+
FROM facts
107+
GROUP BY 1;
108+
```
103109

104110
We're first hashing the `user_id`, then aggregating those hashed values into one `hll` per day. Now we can ask for the cardinality of the `hll` for each day:
105111

106-
SELECT date, hll_cardinality(users) FROM daily_uniques;
112+
```sql
113+
SELECT date, hll_cardinality(users) FROM daily_uniques;
114+
```
107115

108116
You're probably thinking, "But I could have done this with `COUNT DISTINCT`!" And you're right, you could have. But then you only ever answer a single question: "How many unique users did I see each day?"
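
For comparison, the exact-count query alluded to above might look like the sketch below. It assumes the `facts` table defined earlier and uses only standard SQL; the `exact_uniques` alias is illustrative. It has to scan `facts` every time, and its per-day results cannot be added up into weekly or monthly uniques, because a user who visits on several days would be counted once per day.

```sql
-- Exact daily uniques straight from the fact table (no hll involved).
SELECT date, COUNT(DISTINCT user_id) AS exact_uniques
FROM facts
GROUP BY 1
ORDER BY 1;
```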
109117

110118
What if you wanted to see this week's uniques?
111119

112-
SELECT hll_cardinality(hll_union_agg(users)) FROM daily_uniques WHERE date >= '2012-01-02'::date AND date <= '2012-01-08'::date;
120+
```sql
121+
SELECT hll_cardinality(hll_union_agg(users)) FROM daily_uniques WHERE date >= '2012-01-02'::date AND date <= '2012-01-08'::date;
122+
```
113123

114124
Or the monthly uniques for this year?
115125

116-
SELECT EXTRACT(MONTH FROM date) AS month, hll_cardinality(hll_union_agg(users))
117-
FROM daily_uniques
118-
WHERE date >= '2012-01-01' AND
119-
date < '2013-01-01'
120-
GROUP BY 1;
126+
```sql
127+
SELECT EXTRACT(MONTH FROM date) AS month, hll_cardinality(hll_union_agg(users))
128+
FROM daily_uniques
129+
WHERE date >= '2012-01-01' AND
130+
date < '2013-01-01'
131+
GROUP BY 1;
132+
```
121133

122134
Or how about a seven-day sliding window of uniques (each day plus the 6 days before it)?
123135

124-
SELECT date, #hll_union_agg(users) OVER seven_days
125-
FROM daily_uniques
126-
WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);
136+
```sql
137+
SELECT date, #hll_union_agg(users) OVER seven_days
138+
FROM daily_uniques
139+
WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);
140+
```
127141

128142
Or the number of uniques you saw yesterday that you didn't see today?
129143

130-
SELECT date, (#hll_union_agg(users) OVER two_days) - #users AS lost_uniques
131-
FROM daily_uniques
132-
WINDOW two_days AS (ORDER BY date ASC ROWS 1 PRECEDING);
144+
```sql
145+
SELECT date, (#hll_union_agg(users) OVER two_days) - #users AS lost_uniques
146+
FROM daily_uniques
147+
WINDOW two_days AS (ORDER BY date ASC ROWS 1 PRECEDING);
148+
```
133149

134150
These are just a few examples of the types of queries that would return in milliseconds in an `hll` world from a single aggregate, but would require either completely separate pre-built aggregates or self-joins or `generate_series` trickery in a `COUNT DISTINCT` world.
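
The window queries above use the `#` prefix operator, which the extension provides as shorthand for `hll_cardinality`. As a sketch, the seven-day query could equally be written with the function spelled out (the `weekly_uniques` alias is only illustrative):

```sql
-- Same sliding window as above, with hll_cardinality() in place of the # operator.
SELECT date,
       hll_cardinality(hll_union_agg(users) OVER seven_days) AS weekly_uniques
FROM daily_uniques
WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);
```

Both forms should return the same estimate; the function form may simply be easier to read for anyone unfamiliar with the operator.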
135151

@@ -278,23 +294,29 @@ Aggregate functions
278294

279295
If you want to create a `hll` from a table or result set, use `hll_add_agg`. The naming here isn't particularly creative: it's an **agg**regate function that **add**s the values to an empty `hll`.
280296

281-
SELECT date, hll_add_agg(hll_hash_integer(user_id))
282-
FROM facts
283-
GROUP BY 1;
297+
```sql
298+
SELECT date, hll_add_agg(hll_hash_integer(user_id))
299+
FROM facts
300+
GROUP BY 1;
301+
```
284302

285303
The above example will give you a `hll` for each date that contains each day's users.
286304

287305
If you want to summarize a list of `hll`s that you already have stored into a single `hll`, use `hll_union_agg`. Again: it's an **agg**regate function that **union**s the values into an empty `hll`.
288306

289-
SELECT EXTRACT(MONTH FROM date), hll_cardinality(hll_union_agg(users))
290-
FROM daily_uniques
291-
GROUP BY 1;
307+
```sql
308+
SELECT EXTRACT(MONTH FROM date), hll_cardinality(hll_union_agg(users))
309+
FROM daily_uniques
310+
GROUP BY 1;
311+
```
292312

293313
Sliding windows are another prime example of the power of `hll`s. Doing sliding window unique counting typically involves some `generate_series` trickery, but it's quite simple with the `hll`s you've already computed for your roll-ups.
294314

295-
SELECT date, #hll_union_agg(users) OVER seven_days
296-
FROM daily_uniques
297-
WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);
315+
```sql
316+
SELECT date, #hll_union_agg(users) OVER seven_days
317+
FROM daily_uniques
318+
WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);
319+
```
298320

299321
Explanation of Parameters and Tuning
300322
------------------------------------
