Using dynamic union tables
Use case
Sometimes, you may have a lot of tables in a database, which actually relate to the same entity.
For example, you can have “per client” tables with the same data, but related to different customers: elon_musk_table
, john_doe_table
, steve_jobs_table
, etc. In this case, it would make sense to create a single cube for customers, which should be backed by a union table from all customers tables.
Data modeling
You can use the sql
parameter to define a cube over an arbitrary SQL query, e.g., a query that includes UNION
or UNION ALL
operators:
cubes: - name: customers sql: > SELECT *, 'Einstein' AS name FROM einstein_data UNION ALL SELECT *, 'Pascal' AS name FROM pascal_data UNION ALL SELECT *, 'Newton' AS name FROM newton_data measures: - name: count type: count dimensions: - name: name sql: name type: string
However, it can be quite annoying to write the SQL to union all tables manually. Luckily, you can use dynamic data modeling to generate necessary SQL based on a list of tables:
{%- set customer_tables = { "einstein_data": "Einstein", "pascal_data": "Pascal", "newton_data": "Newton" } -%} cubes: - name: customers sql: > {%- for table, name in customer_tables | items %} SELECT *, '{{ name | safe }}' AS name FROM {{ table | safe }} {% if not loop.last %}UNION ALL{% endif %} {% endfor %} measures: - name: count type: count dimensions: - name: name sql: name type: string
Result
Querying count
and name
members of the dynamically defined customers
cube would result in the following generated SQL:
SELECT "customers".name "customers__name", count(*) "customers__count" FROM ( SELECT *, 'Einstein' AS name FROM einstein_data UNION ALL SELECT *, 'Pascal' AS name FROM pascal_data UNION ALL SELECT *, 'Newton' AS name FROM newton_data ) AS "customers" GROUP BY 1 ORDER BY 2 DESC