Query in BigQuery
This guide explains how to query data in BigQuery for typical Manufacturing Data Engine (MDE) use-cases.
Records join with cloud metadata
If cloud metadata materialization is disabled, you can access cloud metadata instances by joining relevant record table with the metadata-store on the metadata instance_id with the following SQL query:
SELECT dnr.*, ms.instance FROM mde_data.`RECORD_TABLE_NAME` AS dnr LEFT JOIN mde_dimension.`metadata-store` AS ms ON ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id") WHERE DATE(event_timestamp) = 'EVENT_TIMESTAMP' LIMIT 100 Replace the following:
RECORD_TABLE_NAME: Name of the record table.BUCKET_NAME: Name of the cloud metadata bucket.EVENT_TIMESTAMP: Timestamp of the event.
To improve query performance and since the metadata-store is partitioned on bucket number, you can optionally specify the bucket number in the ON clause, as the following SQL query:
SELECT dnr.*, ms.instance FROM mde_data.`<RECORD_TABLE_NAME>` AS dnr LEFT JOIN mde_dimension.`metadata-store` AS ms ON ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id") AND ms.bucket_number = <BUCKET_NUMBER> WHERE DATE(event_timestamp) = 'EVENT_TIMESTAMP' LIMIT 100 Replace the following:
BUCKET_NAME: Name of the cloud metadata bucket.EVENT_TIMESTAMP: Timestamp of the event.
Cloud metadata instance attributes access
You can access metadata instance attributes using the JSON dot notation which always returns a JSON object, or using one of the BigQuery JSON functions, such as JSON_VALUE to extract strings or other data types. See the following example:
SELECT dnr.*, ms.instance.deviceName -- this returns a double quoted JSON string JSON_VALUE(ms.instance, '$.deviceName') -- this returns a string FROM mde_data.`example-record-tbl` AS dnr LEFT JOIN mde_dimension.`metadata-store` AS ms ON ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.bucket.instance_id") WHERE DATE(event_timestamp) = '2023-01-01' LIMIT 100 Similarly, if cloud metadata materialization is enabled, you can access metadata instance attributes directly from the record. See the following example:
SELECT * (EXCEPT materialized_cloud_metadata), materialized_cloud_metadata.device.deviceName -- this returns a double quoted JSON string JSON_VALUE(materialized_cloud_metadata., '$.device.deviceName') -- this returns a string FROM mde_data.`example-record-tbl` WHERE DATE(event_timestamp) = '2023-01-01' LIMIT 100 Obtaining a list of all instance IDs contained in cloud_metadata_ref
To obtain an array of all metadata instance IDs contained in the cloud_metadata_ref field of a record follow these steps:
Create the user defined function (UDF) with the following SQL query:
CREATE OR REPLACE FUNCTION `mde_data.get_instance_ids`(input JSON) RETURNS ARRAY<STRING> LANGUAGE js AS R""" return input ? Object.keys(input).map(bucketName => input[bucketName].instance_id).filter(instance_id => instance_id != null) : []; """;Execute the function in a query:
SELECT mde_data.get_instance_ids(cloud_metadata_ref) as metadata_instance_ids, *, FROM mde_data.`RECORD_TABLE_NAME` WHERE DATE(event_timestamp) = 'EVENT_TIMESTAMP' LIMIT 100Replace the following:
RECORD_TABLE_NAME: Name of the record table.EVENT_TIMESTAMP: Timestamp of the event.