Asynchronous secondary index queries
This document provides examples of common query patterns for building asynchronous secondary indexes in Bigtable. The examples use IDs and values that are similar to those in Data for examples and you can create a test table to test the queries.
Before you read this page, familiarize yourself with Create an asynchronous secondary index and GoogleSQL for Bigtable.
Example queries
The following examples show how to create asynchronous secondary indexes using different GoogleSQL queries.
Create an inverted index based on a column qualifier and values
The following query creates an inverted index by extracting all key-value pairs from a column family that has a flexible schema. It uses MAP_ENTRIES
to get an array of all data that the stats_summary
column family stores, and then UNNEST
to transform each key-value pair into a separate row. The resulting index allows for fast lookups based on stat_description
and stat_value
. If you add new types of statistics to the source table, then Bigtable includes them in the asynchronous index automatically.
SELECT stats.key as stat_description, stats.value as stat_value, _key as original_key, cell_plan as cell_plan FROM test_table CROSS JOIN UNNEST(MAP_ENTRIES(stats_summary)) stats ORDER BY stat_description, stat_value, original_key
As a result, each key-value pair from the original stats_summary
column family has its own row in the continuous materialized view:
stat_description (key) | stat_value (key) | original_key (key) | cell_plan |
---|---|---|---|
connected_cell | "2" | phone#4c410523#20190501 | { "data_plan_01gb": "true", "data_plan_05gb": "false" } |
os_build | PQ2A.190401.002 | phone#5c10102#20190501 | { "data_plan_10gb": "false" } |
To get fast lookups on data that Bigtable captured as values, use the new row keys to query the data, as shown in the following example:
SELECT cell_plan, original_key FROM `mv_inverted_index` WHERE stat_description = "connected_cell" AND stat_value = "2"
Create a new row key based on existing data
The following example creates a new index key, total_plan_capacity
, by categorizing data plans that the cell_plan
column family stores. The query uses MAP_KEYS
to get all keys in cell_plan
and ARRAY_INCLUDES_ANY
to filter for specific data plan keys. A CASE
statement then defines the logic to assign a total_plan_capacity
category that's based on the presence and values of these data plans. This allows for efficient lookups based on the combined data plan capacity.
SELECT CASE WHEN cell_plan['data_plan_01gb'] = "true" AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL) AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL) THEN 'x-small' WHEN cell_plan['data_plan_01gb'] = "true" AND (cell_plan['data_plan_05gb'] = "true") AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL) THEN 'small' WHEN cell_plan['data_plan_01gb'] = "true" AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL) AND (cell_plan['data_plan_10gb'] = "true") THEN 'medium' WHEN (cell_plan['data_plan_01gb'] = "false" OR cell_plan['data_plan_01gb'] IS NULL) AND (cell_plan['data_plan_05gb'] = "true") AND (cell_plan['data_plan_10gb'] = "true") THEN 'large' WHEN cell_plan['data_plan_01gb'] = "true" AND (cell_plan['data_plan_05gb'] = "true") AND (cell_plan['data_plan_10gb'] = "true") THEN 'x-large' END as total_plan_capacity, _key as original_key, stats_summary FROM test_table WHERE ARRAY_INCLUDES_ANY(MAP_KEYS(cell_plan), ["data_plan_01gb", "data_plan_05gb","data_plan_10gb"]) ORDER BY total_plan_capacity, original_key
Create a time-series-based index
Bigtable stores each cell value and its associated timestamp. By using the WITH_HISTORY=>TRUE
flag in the FROM
clause, you can retrieve all versions of data and their timestamps. The UNPACK
table function then expands the query results so that each timestamped value appears on its own row with a separate _timestamp
column. This lets you create an asynchronous secondary index where the new row key is based on these timestamp values, enabling quick lookups that are based on time ranges.
SELECT _timestamp as stats_timestamp, _key as original_key, stats_summary FROM UNPACK(( select _key, stats_summary FROM test_table (WITH_HISTORY=>TRUE) )) ORDER BY stats_timestamp, original_key
Create an index entry for items and values in a JSON cell
Bigtable is often used as a key-value store for large-scale workloads, including data in formats such as JSON. The following example shows how to create an asynchronous secondary index for fast lookups of data that JSON stores. The query uses JSON_EXTRACT
to extract the store_id
from a JSON string within the transaction_information
column family in a table named sales
. This index allows for quick retrieval of sales transactions by specific store IDs.
SELECT JSON_EXTRACT(transaction_information["json"], "$.store_id") as store_id, _key as original_key, transaction_information from sales ORDER BY store_id, original_key;