Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies.
Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the SQL Standard and specifically to PostgreSQL's dialect (see the “PostgreSQL Compatibility” page for exceptions). The rest may simply come down to different preferences, or we may even agree on what should be done but just haven’t gotten around to it yet.
Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples.
Aggregating Empty Groups
On empty groups, the aggregate functions sum, list, and string_agg all return NULL instead of 0, [] and '', respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know. This behavior is inherited by the list aggregate list_sum, but not by the DuckDB original list_dot_product which returns 0 on empty lists.
0 vs. 1-Based Indexing
To comply with standard SQL, one-based indexing is used almost everywhere, e.g., array and string indexing and slicing, and window functions (row_number, rank, dense_rank). However, similarly to PostgreSQL, JSON features use a zero-based indexing.
Types
UINT8 vs. INT8
UINT8 and INT8 are aliases to integer types of different widths:
UINT8corresponds toUTINYINTbecause it's an 8-bit unsigned integerINT8corresponds toBIGINTbecause it's an 8-byte signed integer
Explanation: the n in the numeric type INTn and UINTn denote the width of the number in either bytes or bits. INT1, INT2, INT4 correspond to the number of bytes, while INT16, INT32 and INT64 correpsond to the number of bits. The same applies to UINT values. However, the value n = 8 is a valid choice for both the number of bits and bytes. For unsigned values, UINT8 corresponds to UTINYINT (8 bits). For signed values, INT8 corresponds to BIGINT (8 bytes).
Expressions
Results That May Surprise You
| Expression | Result | Note |
|---|---|---|
-2^2 | 4.0 | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., -(2^2) or the pow function, e.g., -pow(2, 2), to avoid mistakes. |
't' = true | true | Compatible with PostgreSQL. |
1 = '1' | true | Compatible with PostgreSQL. |
1 = ' 1' | true | Compatible with PostgreSQL. |
1 = '01' | true | Compatible with PostgreSQL. |
1 = ' 01 ' | true | Compatible with PostgreSQL. |
1 = true | true | Not compatible with PostgreSQL. |
1 = '1.1' | true | Not compatible with PostgreSQL. |
1 IN (0, NULL) | NULL | Makes sense if you think of the NULLs in the input and output as UNKNOWN. |
1 in [0, NULL] | false | |
concat('abc', NULL) | abc | Compatible with PostgreSQL. list_concat behaves similarly. |
'abc' || NULL | NULL |
NaN Values
'NaN'::FLOAT = 'NaN'::FLOAT and 'NaN'::FLOAT > 3 violate IEEE-754 but mean floating point data types have a total order, like all other data types (beware the consequences for greatest / least).
age Function
age(x) is current_date - x instead of current_timestamp - x. Another quirk inherited from PostgreSQL.
Extract Functions
list_extract / map_extract return NULL on non-existing keys. struct_extract throws an error because keys of structs are like columns.
Clauses
Automatic Column Deduplication in SELECT
Column names are deduplicated with the first occurrence shadowing the others:
CREATE TABLE tbl AS SELECT 1 AS a; SELECT a FROM (SELECT *, 2 AS a FROM tbl); | a |
|---|
| 1 |
Case Insensitivity for SELECTing Columns
Due to case-insensitivity, it's not possible to use SELECT a FROM 'file.parquet' when a column called A appears before the desired column a in file.parquet.
USING SAMPLE
The USING SAMPLE clause is syntactically placed after the WHERE and GROUP BY clauses (same as the LIMIT clause) but is semantically applied before both (unlike the LIMIT clause).