๐ง Understanding join_use_nulls in ClickHouse
ClickHouse is famous for being blazing fast โ but sometimes its SQL semantics can surprise you, especially around JOINs.
Hereโs a simple example that shows how the join_use_nulls setting can completely change your results.
๐งฉ The Setup
Letโs create two tiny tables:
CREATE TABLE test.id_val(`id` UInt32, `val` UInt32) ENGINE = TinyLog; INSERT INTO test.id_val VALUES (1,11),(2,12),(3,13); CREATE TABLE test.id_val2(`id` UInt32, `val` UInt32) ENGINE = TinyLog; INSERT INTO test.id_val2 VALUES (1,21),(1,22),(3,23); Weโve got:
id_val: three rows with IDs 1, 2, and 3
id_val2: three rows, but ID 2 is missing and ID 1 appears twice
๐งฎ The Default JOIN
Letโs run a LEFT JOIN:
SELECT * FROM test.id_val LEFT JOIN test.id_val2 USING (id); Output:
โโidโโฌโvalโโฌโval_1โโ โ 1 โ 11 โ 21 โ โ 1 โ 11 โ 22 โ โ 2 โ 12 โ 0 โ โ 3 โ 13 โ 23 โ โโโโโโดโโโโโโดโโโโโโโโ Notice something interesting?
When thereโs no matching row for id=2, ClickHouse fills it with a default value โ in this case, 0.
For numeric columns, the default is 0.
For strings, it would be an empty string ''.
๐งฉ Enter join_use_nulls
If you want standard SQL behavior, where missing values become NULL, use:
SELECT * FROM test.id_val LEFT JOIN test.id_val2 USING (id) SETTINGS join_use_nulls = 1; Now the output looks like this:
โโidโโฌโvalโโฌโval_1โโ โ 1 โ 11 โ 21 โ โ 1 โ 11 โ 22 โ โ 2 โ 12 โ NULL โ โ 3 โ 13 โ 23 โ โโโโโโดโโโโโโดโโโโโโโโ Thatโs closer to what most SQL engines (like PostgreSQL or MySQL) would do.
โ๏ธ Why This Matters
This setting affects how ClickHouse represents non-matching rows in joins โ which can have downstream effects on:
Aggregations
SELECT count(val_1) FROM ...
will count 0s differently than NULLs.Exports & BI tools
Some tools treat 0 as a valid number, but NULL as missing data.Data semantics
A missing match isnโt always โzeroโ โ it might mean โunknownโ.
๐ Performance Note
The default behavior (without join_use_nulls) is slightly faster and more memory-efficient because ClickHouse doesnโt have to track NULL bitmaps.
If youโre doing high-volume analytical joins and donโt care about SQL-standard nulls, keep it off.
If you need correctness and consistency with other systems, turn it on.
๐ Check Your Settings
You can check the current value of this setting in your ClickHouse session:
SELECT name, value FROM system.settings WHERE name LIKE '%join_use_nulls%'; Or set it globally in your configuration file or session:
SET join_use_nulls = 1; ๐ Reference
For more detail on ClickHouseโs Join engine and its settings (including join_use_nulls), see: ClickHouse docs โ Join table engine
๐จ๏ธ Discussion
Have you ever been surprised by ClickHouse JOIN behavior?
Do you leave join_use_nulls off for performance, or always turn it on for clarity?
Top comments (0)