GoogleSQL for Bigtable query examples

The examples on this page demonstrate SQL query patterns for common and advanced Bigtable queries. You can run GoogleSQL queries in the Bigtable Studio query editor. You can also run queries using the Bigtable client library for Java.

Before you read this page, read the GoogleSQL for Bigtable overview.

The examples on this page use IDs and values similar to those in Data for examples.

Common Bigtable SQL query patterns

The following are examples of common queries for Bigtable data. To see examples of similar queries that call the Bigtable Data API, see Read examples and Use filters. For examples of queries on structured row keys, see Structured row key queries.

Retrieve the latest version of all columns for a given row key.

 SELECT * FROM myTable WHERE _key = 'r1' 

Retrieve all versions of all columns for a given row key.

 SELECT * FROM myTable(with_history => TRUE) WHERE _key = 'r1' 

Retrieve the latest version of a particular column from a particular column family for a given row key.

 SELECT stats_summary['os_build'] AS os  FROM analytics  WHERE _key = 'phone#4c410523#20190501' 

Retrieve the row keys and the latest version of multiple columns for a given row key range.

 SELECT  _key,  stats_summary['os_build'] AS os,  stats_summary['user_agent'] AS agent  FROM analytics  WHERE _key >= 'phone#4c410523#20190501' AND _key < 'phone#4c410523#201906201' 

Retrieve all versions of all columns for multiple row key ranges, up to 10 rows.

 SELECT *  FROM analytics(with_history => TRUE)  WHERE  (_key >= 'phone#4c410523#20190501' AND _key < 'phone#4c410523#201906201')  OR (_key >= 'phone#5c10102#20190501' AND _key < 'phone#5c10102#20190601')  LIMIT 10 

Retrieve all versions of all columns for multiple row keys.

 SELECT *  FROM analytics(with_history => TRUE)  WHERE _key = 'phone#4c410523#20190501' OR _key = 'phone#4c410523#20190502' 

Retrieve all versions of all columns for multiple row keys using a different approach.

 SELECT *  FROM analytics(with_history => TRUE)  WHERE _key IS IN ('phone#4c410523#20190501', 'phone#4c410523#20190502') 

Retrieve the latest version of all columns within a column family for a row key prefix.

 SELECT stats_summary  FROM analytics  WHERE _key LIKE 'phone#%' 

Retrieve the row keys and three latest versions of all columns within a column family for all rows in the table. This query requires a full table scan, so it's not recommended for low-latency, high-throughput access patterns.

 SELECT _key, cell_plan FROM analytics(with_history => TRUE, latest_n => 3) 

Retrieve the latest version of all columns with row keys matching a specified regular expression. This query requires a full table scan, so it's not recommended for low-latency, high-throughput access patterns, unless you also provide a row key prefix or row key range predicate in the WHERE clause.

 SELECT *  FROM myTable(with_history => TRUE)  WHERE REGEXP_CONTAINS(_key, '.*#20190501$') 

Retrieve the latest version of all columns with the matching row key prefix and counter value more than 123. You don't need to cast for this comparison, because Bigtable aggregates are numeric.

 SELECT *  FROM myTable  WHERE _key LIKE 'user12%' AND counterFamily['counter'] > 123 

Retrieve the latest version of all columns for a row key prefix if the referrer matches a specific value.

 SELECT *  FROM analytics  WHERE _key LIKE 'com.mysite%' AND session['referrer'] = './home' 

Categorize a given row based on the value of a given column. This query is similar to using a composing conditional filter in the Bigtable Data API.

 SELECT  *,  CASE cell_plan['data_plan']  WHEN '10gb' THEN 'passed-filter'  ELSE 'filtered-out'  END  AS label  FROM analytics 

Retrieve the row key and the column qualifiers in a specific column family for a specified row key range. In SQL, column families are represented by the map data type, where each column qualifier and value is mapped as a key-value pair. This SQL query is similar to using a strip value filter in the Bigtable Data API.

 SELECT _key, MAP_KEYS(cell_plan) AS keys  FROM analytics  WHERE _key >= 'phone#4c410523#20190501' AND _key < 'phone#4c410523#201906201' 

The UNPACK function lets you transform Bigtable data into a tabular time series format, which is useful when performing time series analysis. Consider an example where you have a clicks column in an engagement column family. The following query uses UNPACK to see how certain campaigns are performing by aggregating the clicks over a minute from the last hour.

 SELECT  FORMAT_TIMESTAMP('%M', _timestamp) AS minute,  COUNT(clicks) AS total_clicks  FROM  UNPACK((  SELECT engagement['clicks'] as clicks  FROM metrics(with_history => true, after => TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR))  WHERE _key = @campaign_id  ))  GROUP BY  minute; 

Advanced Bigtable SQL query patterns

The following samples demonstrate more advanced patterns.

With the following query, you can retrieve the row key and most recent value of the JSON attribute abc in the session column family. For more information, see JSON functions.

 SELECT _key, JSON_VALUE(session['payload'], '$.abc') AS abc FROM analytics 

With the following query, you can retrieve the row key and calculate the average session length using the latest value of two Bigtable aggregate cells , which are numeric, for each row in the table.

 SELECT  _key AS userid,  session['total_minutes'] / session['count'] AS avg_session_length  FROM analytics 

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family contains referrer, origin, or server as a column qualifier. Alternatively, this query can also be written as a series of individual comparisons such as session['referrer'] IS NOT NULL OR session['origin'] IS NOT NULL. However, for queries involving a large number of comparisons, the following approach is recommended.

 SELECT *  FROM analytics  WHERE  _key LIKE 'com.abc%'  AND ARRAY_INCLUDES_ANY(MAP_KEYS(session), ['referrer', 'origin', 'server']) 

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family containsreferrer, origin, and server as column qualifiers. Alternatively, this query can be written as a series of individual comparisons such as session['referrer'] IS NOT NULL AND session ['origin'] IS NOT NULL.

 SELECT *  FROM analytics  WHERE  _key LIKE 'com.abc%'  AND ARRAY_INCLUDES_ALL(MAP_KEYS(session), ['referrer', 'origin', 'server']) 

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family contains com.google.search, com.google.maps, or com.google.shopping as values.

 SELECT *  FROM analytics  WHERE  _key LIKE 'com.abc%'  AND ARRAY_INCLUDES_ANY(  MAP_VALUES(session),  ['com.google.search', 'com.google.maps', 'com.google.shopping']) 

With the following query, you can retrieve the latest version of all columns if key-value pairs in the cell_plan column family include both data_plan:unlimited and roaming:North America.

 SELECT *  FROM analytics  WHERE  ARRAY_INCLUDES_ALL(  CAST(  MAP_ENTRIES(cell_plan)  AS ARRAY<STRUCT<key STRING, value STRING>>),  [('data_plan', 'unlimited'), ('roaming', 'North America')]) 

With the following query, you can retrieve the row key and temperature readings for weather sensors for cases where the temperature exceeded 70 degrees during the last seven measurements.

 SELECT  _key AS sensorid,  ARRAY_FILTER(  CAST(  sensor['temperature']  AS ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>),  e -> CAST(e.value AS FLOAT32) > 70) AS high_temperature  FROM weather(with_history => TRUE, latest_n => 7) 

In temporal filtering order, latest_n comes last, so a query like after => X, before => y, latest_n => 3 returns the latest three values that satisfy the after and before conditions. If your use case requires latest_n to take precedence, you can provide latest_n as the only temporal filter, and then apply the rest of the temporal filters using query operators in your SELECT statement, as shown in the example. For more information, see Temporal filters.

 SELECT  ARRAY_FILTER(  CAST(  address['street']  AS ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>),  e -> e.timestamp > TIMESTAMP('2021-01-04T23:51:00.000Z'))  AS street_address  FROM locations(with_history => TRUE, latest_n => 3) 

Similar to the previous example, you can apply a different temporal filter to each column family in your query. For example, the following query returns the three most recent versions of the street column and the two least recent versions of the state column.

 SELECT  ARRAY_FILTER(  CAST(  address['street']  AS ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>),  (e, i) -> i <= 2)  AS street_address,  ARRAY_FILTER(  ARRAY_REVERSE(  CAST(  address['state']  AS ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>)),  (e, i) -> i <= 1)  AS state  FROM locations(with_history => TRUE) 

With the following query, you can retrieve all versions of all columns if key-value pairs in the address column family include both city:Savannah or city:Nashville at any point in time.

 SELECT *  FROM locations(with_history => TRUE)  WHERE  ARRAY_LENGTH(  ARRAY_FILTER(  CAST(  MAP_ENTRIES(address)  AS ARRAY<  STRUCT<  key STRING,  value ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>>>),  e ->  e.key = 'city'  AND ARRAY_INCLUDES_ANY(  ARRAY_TRANSFORM(e.value, k -> k.value), ['Savannah', 'Nashville'])))  > 0 

In this particular example, casting is not required, so this can also be written in the following shorter form.

 SELECT *  FROM locations(with_history => TRUE)  WHERE  ARRAY_LENGTH(  ARRAY_FILTER(  MAP_ENTRIES(address),  e ->  e.key = 'city'  AND ARRAY_INCLUDES_ANY(  ARRAY_TRANSFORM(e.value, k -> k.value), ['Savannah', 'Nashville'])))  > 0 

What's next