Commit adb427b

refactor(utils): improve upsert (#13)
**Features** / **Fixes**
- docs: table references must be fully qualified.
- The Jinja `context` was common to the two templates. It is now defined only once.

**Deprecations** / **Breaking Changes**
- The `delta` insertion mode is renamed to `upsert`. The `upsert` function will be renamed to `merge`.

Issue #13
1 parent f53ce53 commit adb427b
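
The three insertion modes named in this commit (`insert_only`, `upsert`, `full`) are described in the parameter table of `upsert.yaml`. As a rough illustration only, the sketch below mirrors those documented semantics on in-memory Python rows. It is not the BigQuery implementation: the function name `merge_rows` is hypothetical, and it assumes a non-empty `primary_keys` list, unique keys in the source, and that the source wins when both versions have the same recency value.

```python
from typing import Any, Optional


def merge_rows(
    source: list[dict[str, Any]],
    destination: list[dict[str, Any]],
    insertion_mode: str,
    primary_keys: list[str],
    recency_field: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical in-memory mirror of the documented insertion modes."""
    mode = insertion_mode.lower()
    if mode not in ("insert_only", "upsert", "full"):
        raise ValueError(
            '`insertion_mode` must be either "insert_only", "upsert", or "full"'
        )

    def key(row: dict[str, Any]) -> tuple:
        return tuple(row[k] for k in primary_keys)

    # Assumption: each key appears at most once in the source.
    source_by_key = {key(row): row for row in source}
    destination_keys = {key(row) for row in destination}

    result = []
    for row in destination:
        k = key(row)
        if k in source_by_key:
            if mode == "insert_only":
                # Updates are not possible in insert_only mode.
                result.append(row)
            else:
                candidate = source_by_key[k]
                # With a recency field, keep the most recent version.
                if recency_field is not None and row[recency_field] > candidate[recency_field]:
                    candidate = row
                result.append(candidate)
        elif mode != "full":
            # Only full mode deletes rows that are missing from the source.
            result.append(row)

    # Every mode inserts source rows that the destination does not have yet.
    result.extend(r for k, r in source_by_key.items() if k not in destination_keys)
    return result
```

With a destination holding ids 1, 2, 3 and a source holding ids 1, 2, 4, `insert_only` only adds id 4, `upsert` also refreshes ids 1 and 2 where the source is more recent, and `full` additionally drops id 3.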

1 file changed: +24 -27 lines changed
bigfunctions/upsert.yaml

Lines changed: 24 additions & 27 deletions
@@ -14,9 +14,9 @@ description: |-
 
   | Param | Possible values |
   |---|---|
-  | `query_or_table_or_view` | Can be a fully qualified table or view `(<project-id>.)?<dataset_id>.<table_or_view_name>`. <br> Can also be a plain query in BigQuery Standard SQL. |
-  | `destination_table` | Must be a fully qualified table `(<project-id>.)?<dataset_id>.<table_or_view_name>`. |
-  | `insertion_mode` | Three insertion mode are available:<ul><li> `"insert_only"`: existing records in `query_or_table_or_view` and not existing in `destination_table` are inserted. Deletion and update are not possible. </li><li> `"delta"`: same as `insert_only` with the updatable records. Records existing both in `query_or_table_or_view` and in `destination_table` are updated. If `recency_field` is filled, only the most recent version from source and destination is kept. </li><li> `"full"`: same as `delta` with the deletable records. Records not existing in `query_or_table_or_view` and existing in `destination_table` are deleted. </li> </ul> |
+  | `query_or_table_or_view` | Can be a fully qualified table or view `<project-id>.<dataset_id>.<table_or_view_name>`. <br> Can also be a plain query in BigQuery Standard SQL. |
+  | `destination_table` | Must be a fully qualified table `<project-id>.<dataset_id>.<table_or_view_name>`. |
+  | `insertion_mode` | Three insertion mode are available:<ul><li> `"insert_only"`: existing records in `query_or_table_or_view` and not existing in `destination_table` are inserted. Deletion and update are not possible. </li><li> `"upsert"`: same as `insert_only` with the updatable records. Records existing both in `query_or_table_or_view` and in `destination_table` are updated. If `recency_field` is filled, only the most recent version from source and destination is kept. </li><li> `"full"`: same as `upsert` with the deletable records. Records not existing in `query_or_table_or_view` and existing in `destination_table` are deleted. </li> </ul> |
   | `primary_keys` | Combination of field identifying a record. If `primary_keys = []`, every row will be considered as a unique record. |
   | `recency_field` | Orderable field (ie. `timestamp`, `integer`, ...) to identify the relative frechness of a record version. |
 arguments:
@@ -31,18 +31,18 @@ arguments:
   - name: recency_field
     type: string
 examples:
-  - description: "Merge tables in delta mode"
+  - description: "Merge tables in upsert mode"
     arguments:
-      - "'dataset_id.source_table_or_view'"
-      - "'dataset_id.destination_table'"
-      - "'delta'"
+      - "'project-id.dataset_id.source_table_or_view'"
+      - "'project-id.dataset_id.destination_table'"
+      - "'upsert'"
       - "['id']"
       - "'timestamp_field'"
     region: ALL
   - description: "Merge from query in full"
     arguments:
-      - "'select * from dataset_id.source_table_or_view where filter_field = true'"
-      - "'dataset_id.destination_table'"
+      - "'select * from project-id.dataset_id.source_table_or_view where filter_field = true'"
+      - "'project-id.dataset_id.destination_table'"
       - "'full'"
       - "['id']"
       - "null"
@@ -52,7 +52,21 @@ code: |
   declare context json;
   declare table_columns array<string>;
 
-  assert lower(insertion_mode) in ('insert_only', 'delta', 'full') AS '`insertion_mode` must be either "insert_only", "delta", or "full"';
+  assert lower(insertion_mode) in ('insert_only', 'upsert', 'full') AS '`insertion_mode` must be either "insert_only", "upsert", or "full"';
+
+  set context = to_json(struct(
+    if(
+      -- if table then create a query from its name.
+      regexp_contains(replace(trim(query_or_table_or_view), '`', ''), r'^(([a-zA-Z0-9\-]+)\.)?([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)$'),
+      'select * from ' || query_or_table_or_view,
+      query_or_table_or_view
+    ) as query_or_table_or_view,
+    destination_table as destination_table,
+    insertion_mode as insertion_mode,
+    primary_keys as primary_keys,
+    recency_field as recency_field,
+    table_columns as table_columns
+  ));
 
   /*
   Get destination table columns to define the insert and update parts of the merge query.
@@ -85,10 +99,6 @@ code: |
   '''
   ;
 
-  set context = to_json(struct(
-    destination_table as destination_table
-  ));
-
   execute immediate {BIGFUNCTIONS_DATASET}.render_string(query, to_json_string(context)) into table_columns;
 
   /*
@@ -155,17 +165,4 @@ code: |
   '''
   ;
 
-  set context = to_json(struct(
-    if(
-      regexp_contains(replace(trim(query_or_table_or_view), '`', ''), r'^(([a-zA-Z0-9\-]+)\.)?([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)$'),
-      'select * from ' || query_or_table_or_view,
-      query_or_table_or_view
-    ) as query_or_table_or_view,
-    destination_table as destination_table,
-    insertion_mode as insertion_mode,
-    primary_keys as primary_keys,
-    recency_field as recency_field,
-    table_columns as table_columns
-  ));
-
   execute immediate {BIGFUNCTIONS_DATASET}.render_string(query, to_json_string(context));
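
The `if(regexp_contains(...))` expression that this commit moves to the top of the script decides whether `query_or_table_or_view` is a table reference (which gets wrapped in `select * from ...`) or already a query. The Python sketch below reproduces that check with the same regular expression, as an illustration only: the helper name `to_query` is hypothetical, and `str.strip` is assumed to be close enough to BigQuery's `trim` for this purpose. Note that the pattern still accepts a reference without a project id, even though the updated docs ask for fully qualified names.

```python
import re

# Same pattern as in the diff: optional project id, then dataset and table.
TABLE_REF = re.compile(
    r"^(([a-zA-Z0-9\-]+)\.)?([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)$"
)


def to_query(query_or_table_or_view: str) -> str:
    """Hypothetical helper: wrap a table reference in a select, keep queries as-is."""
    # Mirror replace(trim(...), '`', '') before testing the pattern.
    candidate = query_or_table_or_view.strip().replace("`", "")
    if TABLE_REF.match(candidate):
        # The original (untrimmed, possibly backticked) value is what gets selected from.
        return "select * from " + query_or_table_or_view
    return query_or_table_or_view


print(to_query("project-id.dataset_id.source_table_or_view"))
print(to_query("select * from project-id.dataset_id.source_table_or_view where filter_field = true"))
```

The first call returns a `select * from ...` query built from the table name; the second returns the input unchanged because a query containing spaces cannot match the pattern.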
