This package builds a mart of tables and views describing the project it is installed in. In versions of the package prior to 1.0, the artifacts dbt produces were uploaded to the warehouse, hence the name of the package. That's no longer the case, but the name has stuck!
The package currently supports Databricks, Spark and Snowflake adapters.
Models included:
- dim_dbt__current_models
- dim_dbt__exposures
- dim_dbt__models
- dim_dbt__seeds
- dim_dbt__snapshots
- dim_dbt__sources
- dim_dbt__tests
- fct_dbt__invocations
- fct_dbt__model_executions
- fct_dbt__seed_executions
- fct_dbt__snapshot_executions
- fct_dbt__test_executions

See the generated dbt docs site for documentation on each model.
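To browse that documentation locally, one option (a sketch using standard dbt commands rather than anything specific to this package) is to generate and serve the docs site from your project:

```bash
# Generate the docs site for the project, then serve it on a local port
dbt docs generate
dbt docs serve
```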
- Add this package to your `packages.yml`:

  ```yaml
  packages:
    - package: brooklyn-data/dbt_artifacts
      version: 1.2.0
  ```

- Run `dbt deps` to install the package.
- Add an on-run-end hook to your `dbt_project.yml`:

  ```yaml
  on-run-end: "{{ dbt_artifacts.upload_results(results) }}"
  ```

  (We recommend adding a conditional here so that the upload only occurs in your production environment, such as `on-run-end: "{% if target.name == 'prod' %}{{ dbt_artifacts.upload_results(results) }}{% endif %}"`)

- If you are using selectors, be sure to include the `dbt_artifacts` models in your dbt invocation step (see the sketch after the note below).
- Run your project!
👷 Always run the `dbt_artifacts` models in every dbt invocation which uses the `upload_results` macro. This ensures that the source models always have the correct fields in case of an update.
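As a minimal sketch of what such an invocation might look like (assuming a hypothetical selection named `my_models`), the `package:` selection method can be used to include every model in the package alongside your own:

```bash
# Hypothetical invocation: build your own selection plus all dbt_artifacts package models
dbt build --select my_models package:dbt_artifacts
```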
The following configuration can be used to specify where the raw (sources) data is uploaded, and where the dbt models are created:
```yaml
models:
  ...
  dbt_artifacts:
    +database: your_destination_database # optional, default is your target database
    +schema: your_destination_schema # optional, default is your target schema
    staging:
      +database: your_destination_database # optional, default is your target database
      +schema: your_destination_schema # optional, default is your target schema
    sources:
      +database: your_sources_database # optional, default is your target database
      +schema: your_sources_schema # optional, default is your target schema
```

Note that model materializations and `on_schema_change` configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):
> Configurations made in your dbt_project.yml file will override any configurations in a package (either in the dbt_project.yml file of the package, or in config blocks).
If the project is running in dbt Cloud, the following five columns ([special environment variables](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables)) will be automatically populated in the fct_dbt__invocations model:
- dbt_cloud_project_id
- dbt_cloud_job_id
- dbt_cloud_run_id
- dbt_cloud_run_reason_category
- dbt_cloud_run_reason
To capture other environment variables in the fct_dbt__invocations model in the env_vars column, add them to the env_vars variable in your dbt_project.yml. Note that environment variables with secrets (DBT_ENV_SECRET_) can't be logged.
```yaml
vars:
  env_vars: ['ENV_VAR_1', 'ENV_VAR_2', '...']
```

To capture dbt variables in the fct_dbt__invocations model in the dbt_vars column, add them to the dbt_vars variable in your dbt_project.yml.
```yaml
vars:
  dbt_vars: ['var_1', 'var_2', '...']
```

If you were using the following variables from a pre-1.0 version of this package:
```yaml
vars:
  dbt_artifacts_database: your_db
  dbt_artifacts_schema: your_schema
```

you must now move these to the following model configs:
```yaml
models:
  ...
  dbt_artifacts:
    sources:
      +database: your_db
      +schema: your_schema
```

That's because the raw tables are now managed as dbt models. Be aware of any impact that the generate_database_name and generate_schema_name macros may have on the final database/schema.
To migrate your existing data from dbt_artifacts versions <=0.8.0, a helper macro and a guide are provided. The migration uses the old fct_* and dim_* models' data to populate the new sources. The steps to use the macro are as follows:
- If not already completed, run `dbt run-operation create_dbt_artifacts_tables` to make your source tables.
- Run `dbt run-operation migrate_from_v0_to_v1 --args '<see-below-for-arguments>'`.
- Verify that the migration completed successfully.
- Manually delete any database objects (sources, staging models, tables/views) from the previous `dbt_artifacts` version.
The arguments for migrate_from_v0_to_v1 are as follows:
| argument | description |
|---|---|
| old_database | the database of the <1.0.0 output (fct_/dim_) models |
| old_schema | the schema of the <1.0.0 output (fct_/dim_) models |
| new_database | the target database that the artifact sources are in |
| new_schema | the target schema that the artifact sources are in |
The old and new database/schema values do not have to be different, but both are defined explicitly for flexibility.
An example operation is as follows:
```bash
dbt run-operation migrate_from_v0_to_v1 --args '{old_database: analytics, old_schema: dbt_artifacts, new_database: analytics, new_schema: artifact_sources}'
```

Thank you to Tails.com for the initial development and maintenance of this package. On 2021/12/20, the repository was transferred from the Tails.com GitHub organization to Brooklyn Data Co.
The macros in early versions of the package were adapted from code shared by Kevin Chan and Jonathan Talmi of Snaptravel.
Thank you for sharing your work with the community!
- Install pipx:

  ```bash
  pip install pipx
  pipx ensurepath
  ```

- Install tox:

  ```bash
  pipx install tox
  ```

- Copy and paste the `integration_test_project/example-env.sh` file and save it as `env.sh`. Fill in the missing values.
- Source the file in your current shell context with the command `. ./env.sh`.
- From this directory, run the integration tests:

  ```bash
  tox -e integration_snowflake   # For the Snowflake tests
  tox -e integration_databricks  # For the Databricks tests
  tox -e integration_bigquery    # For the BigQuery tests
  ```

The Spark tests require installing the ODBC driver. On a Mac, DBT_ENV_SPARK_DRIVER_PATH should be set to /Library/simba/spark/lib/libsparkodbc_sbu.dylib. Spark tests have not yet been added to the integration tests.
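For example, the driver path could be exported in your `env.sh` (a sketch assuming the default Simba install location mentioned above):

```bash
# Assumed default install location of the Simba Spark ODBC driver on macOS
export DBT_ENV_SPARK_DRIVER_PATH=/Library/simba/spark/lib/libsparkodbc_sbu.dylib
```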
We use SQLFluff to keep SQL style consistent. A GitHub action automatically tests pull requests and adds annotations where there are failures. SQLFluff can also be run locally with tox. To install tox, we recommend using pipx.
Install pipx:

```bash
pip install pipx
pipx ensurepath
```

Install tox:

```bash
pipx install tox
```

Lint all models in the /models directory:

```bash
tox
```

Fix all models in the /models directory:

```bash
tox -e fix_all
```

Lint (or substitute `lint` with `fix`) a specific model:

```bash
tox -e lint -- models/path/to/model.sql
```

Lint (or substitute `lint` with `fix`) a specific directory:

```bash
tox -e lint -- models/path/to/directory
```

Enforced rules are defined within tox.ini. To view the full list of available rules and their configuration, see the SQLFluff documentation.
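If you also have SQLFluff installed on its own (for example via `pipx install sqlfluff`, which is an assumption rather than a project requirement), the full rule list can also be printed locally:

```bash
# Print every SQLFluff rule code with its description
sqlfluff rules
```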