arrayJoin function

The arrayJoin function takes each row and generates a set of rows (unfold).

This function takes an array as an argument and expands the source row into multiple rows, one for each element of the array. The values in all other columns are simply copied; the column where the function is applied holds the corresponding array element instead.

Example:

SELECT arrayJoin([1, 2, 3] AS src) AS dst, 'Hello', src

┌─dst─┬─\'Hello\'─┬─src─────┐
│   1 │ Hello     │ [1,2,3] │
│   2 │ Hello     │ [1,2,3] │
│   3 │ Hello     │ [1,2,3] │
└─────┴───────────┴─────────┘
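The unfold step can be modelled outside the database to make the copying rule concrete. The following is a minimal Python sketch of the semantics described above, not ClickHouse's implementation; the helper name `array_join` and the dictionary representation of a row are assumptions made for illustration.

```python
def array_join(row, array_column, out_column):
    """Unfold one row into one row per element of row[array_column].

    Hypothetical helper: every existing column is copied unchanged,
    and out_column receives a single array element.
    """
    return [{**row, out_column: element} for element in row[array_column]]


# One source row, mirroring the query above.
source_row = {"Hello": "Hello", "src": [1, 2, 3]}
rows = array_join(source_row, "src", "dst")

for r in rows:
    print(r["dst"], r["Hello"], r["src"])
```

Each output row carries the full `src` array and the constant column unchanged; only `dst` differs between rows.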

The arrayJoin function affects all sections of the query, including the WHERE section. Note that the query below returns 2, even though the subquery returns only 1 row.

Example:

SELECT sum(1) AS impressions
FROM
(
    SELECT ['Istanbul', 'Berlin', 'Bobruisk'] AS cities
)
WHERE arrayJoin(cities) IN ['Istanbul', 'Berlin']

┌─impressions─┐
│           2 │
└─────────────┘
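The count of 2 follows if the row is unfolded before the filter is applied. Here is a rough Python model of that order of operations, with variable names chosen for illustration only:

```python
# The subquery yields a single row with one array column.
subquery_rows = [{"cities": ["Istanbul", "Berlin", "Bobruisk"]}]

# arrayJoin(cities) in WHERE: unfold first, producing one candidate row per city.
unfolded = [(row, city) for row in subquery_rows for city in row["cities"]]

# Then the IN condition filters the unfolded rows, not the original row.
kept = [(row, city) for row, city in unfolded if city in ("Istanbul", "Berlin")]

impressions = sum(1 for _ in kept)  # models sum(1)
print(impressions)
```

One input row becomes three candidate rows, two of which pass the filter.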

A query can use multiple arrayJoin functions. In this case, the transformation is performed once per function and the rows are multiplied, so the result contains every combination of elements from the arrays.

Example:

SELECT
    sum(1) AS impressions,
    arrayJoin(cities) AS city,
    arrayJoin(browsers) AS browser
FROM
(
    SELECT
        ['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
        ['Firefox', 'Chrome', 'Chrome'] AS browsers
)
GROUP BY
    2,
    3

┌─impressions─┬─city─────┬─browser─┐
│           2 │ Istanbul │ Chrome  │
│           1 │ Istanbul │ Firefox │
│           2 │ Berlin   │ Chrome  │
│           1 │ Berlin   │ Firefox │
│           2 │ Bobruisk │ Chrome  │
│           1 │ Bobruisk │ Firefox │
└─────────────┴──────────┴─────────┘
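The row multiplication behaves like a Cartesian product followed by grouping. As a sketch of that reading (a model of the result, not of how ClickHouse executes the query):

```python
from collections import Counter
from itertools import product

cities = ["Istanbul", "Berlin", "Bobruisk"]
browsers = ["Firefox", "Chrome", "Chrome"]

# Two distinct arrayJoin calls: every city is paired with every browser (3 x 3 = 9 rows).
unfolded = list(product(cities, browsers))

# GROUP BY city, browser with sum(1): count how often each pair occurs.
impressions = Counter(unfolded)

for (city, browser), count in impressions.items():
    print(count, city, browser)
```

'Chrome' appears twice in the browsers array, so each city pairs with it twice, which is why those groups have a count of 2.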

Using multiple arrayJoin calls with the same expression might not produce the expected results because of optimizations. In such cases, consider modifying the repeated array expression with an extra operation that does not affect the join result, for example arrayJoin(arraySort(arr)) or arrayJoin(arrayConcat(arr, [])).

Example:

SELECT
    arrayJoin(dice) as first_throw,
    /* arrayJoin(dice) as second_throw */ -- is technically correct, but will annihilate result set
    arrayJoin(arrayConcat(dice, [])) as second_throw -- intentionally changed expression to force re-evaluation
FROM
(
    SELECT [1, 2, 3, 4, 5, 6] as dice
)
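One way to picture the effect is to assume that the optimization evaluates textually identical arrayJoin expressions only once, so both aliases read the same unfolded value. The Python sketch below models that assumption only; the function `select_array_joins` and its keying by expression text are hypothetical and do not describe ClickHouse internals.

```python
from itertools import product


def select_array_joins(expressions, arrays):
    """Model a SELECT list of arrayJoin calls.

    expressions: list of (alias, expression_text) pairs.
    arrays: maps expression_text to the array it unfolds.
    Assumption: identical expression texts share a single expansion.
    """
    distinct = list(dict.fromkeys(text for _, text in expressions))
    rows = []
    for combo in product(*(arrays[text] for text in distinct)):
        value_of = dict(zip(distinct, combo))
        rows.append({alias: value_of[text] for alias, text in expressions})
    return rows


dice = [1, 2, 3, 4, 5, 6]
arrays = {
    "arrayJoin(dice)": dice,
    "arrayJoin(arrayConcat(dice, []))": dice + [],  # same values, different expression
}

same = select_array_joins(
    [("first_throw", "arrayJoin(dice)"), ("second_throw", "arrayJoin(dice)")],
    arrays,
)
changed = select_array_joins(
    [("first_throw", "arrayJoin(dice)"),
     ("second_throw", "arrayJoin(arrayConcat(dice, []))")],
    arrays,
)
print(len(same), len(changed))
```

Under this assumption the repeated expression gives 6 rows in which both throws are always equal, while the modified expression gives all 36 combinations.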

Note the ARRAY JOIN syntax in the SELECT query, which provides broader possibilities. ARRAY JOIN allows you to unfold multiple arrays with the same number of elements at the same time, pairing their elements by position instead of multiplying the rows.

Example:

SELECT
    sum(1) AS impressions,
    city,
    browser
FROM
(
    SELECT
        ['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
        ['Firefox', 'Chrome', 'Chrome'] AS browsers
)
ARRAY JOIN
    cities AS city,
    browsers AS browser
GROUP BY
    2,
    3

┌─impressions─┬─city─────┬─browser─┐
│           1 │ Istanbul │ Firefox │
│           1 │ Berlin   │ Chrome  │
│           1 │ Bobruisk │ Chrome  │
└─────────────┴──────────┴─────────┘
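Unfolding several arrays at once corresponds to walking them in lockstep. A minimal Python sketch of that pairing follows; the explicit length check reflects the same-number-of-elements requirement stated above, and the error it raises is an illustrative choice rather than ClickHouse's actual message.

```python
from collections import Counter

cities = ["Istanbul", "Berlin", "Bobruisk"]
browsers = ["Firefox", "Chrome", "Chrome"]

# The arrays must have the same number of elements to be unfolded together.
if len(cities) != len(browsers):
    raise ValueError("arrays must have the same number of elements")

# ARRAY JOIN cities AS city, browsers AS browser: pair elements by position (3 rows).
unfolded = list(zip(cities, browsers))

# GROUP BY city, browser with sum(1).
impressions = Counter(unfolded)

for (city, browser), count in impressions.items():
    print(count, city, browser)
```

Each position yields exactly one row, so every group has a count of 1.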

Alternatively, you can combine the arrays into an array of tuples with arrayZip and unfold it with a single arrayJoin. For example:

SELECT
    sum(1) AS impressions,
    (arrayJoin(arrayZip(cities, browsers)) AS t).1 AS city,
    t.2 AS browser
FROM
(
    SELECT
        ['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
        ['Firefox', 'Chrome', 'Chrome'] AS browsers
)
GROUP BY
    2,
    3

┌─impressions─┬─city─────┬─browser─┐
│           1 │ Istanbul │ Firefox │
│           1 │ Berlin   │ Chrome  │
│           1 │ Bobruisk │ Chrome  │
└─────────────┴──────────┴─────────┘