Braintrust Query Language (BTQL)
Braintrust Query Language (BTQL) is a precise, SQL-like syntax for querying Braintrust experiments, logs, and datasets. Use BTQL to better analyze and understand your data.
Why use BTQL?
BTQL gives you precise control over your AI application data. You can:
- Filter and search for relevant logs and experiments
- Create consistent, reusable queries for monitoring
- Build automated reporting and analysis pipelines
- Write complex queries to analyze model performance
BTQL in Braintrust
Use BTQL when filtering logs and experiments, in the BTQL sandbox, and programmatically through the Braintrust API.
Filter logs and experiments
Use BTQL to filter logs and experiments based on specific criteria. You can filter logs by tags, metadata, or any other relevant fields. Filtering in logs and experiments only supports filter
clauses.
At the top of your experiment or log view, click Filter to open the filter editor and click BTQL to switch to BTQL mode.
BTQL sandbox
To test BTQL with autocomplete, validation, and a table of results, use the BTQL sandbox in the dashboard. In your project, click BTQL sandbox at the bottom of the sidebar.
API access
Access BTQL programmatically with the Braintrust API:
The API accepts these parameters:
query
(required): your BTQL query stringfmt
: response format (json
orparquet
, defaults tojson
)tz_offset
: timezone offset in minutes for time-based operationsaudit_log
: include audit log data
For correct day boundaries, set tz_offset
to match your timezone. For example, use 480
for US Pacific Standard Time.
Query structure
BTQL queries follow a familiar SQL-like structure that lets you define what data you want, how you want it returned, and how to analyze it.
This example returns every log from a project where Factuality is greater than 0.8, sorts by created date descending, and limits the results to 100.
select
: choose which fields to retrievefrom
: specify the data source - can be an identifier (likeproject_logs
) or a function call (likeexperiment("id")
). Has an optional designator for the shape of the data:spans
,traces
,summary
. If not specified, defaults tospans
.filter
: define conditions to filter the datasample
: randomly sample a subset of the filtered data (rate or count-based)sort
: set the order of results (asc
ordesc
)limit
andcursor
: control result size and enable pagination
Retrieve records
When retrieving records with BTQL, you can either use select
or dimensions
and measures
. You can use most tools when using either method, but you must use dimensions
and measures
if you want to aggregate functions to retrieve results.
select
select
in BTQL is identical to the select
clause in SQL. You can select specific fields, compute values, or use *
to retrieve every field.
BTQL allows you to transform data directly in the select
clause. This query returns metadata.model
, whether metrics.tokens
is greater than 1000, and a quality indicator of either "high" or "low" depending on whether or not the Factuality score is greater than 0.8.
You can also use functions in the select
clause to transform values and create meaningful aliases for your results. This query extracts the day the log was created, the hour, and a Factuality score rounded to 2 decimal places.
dimensions
and measures
Instead of select
, you can use dimensions
and measures
to group and aggregate data. This query returns a row for each distinct model with the day it was created, the total number of calls, the average Factuality score, and the latency percentile.
The available aggregate functions are:
count(expr)
: number of rowssum(expr)
: sum of numeric valuesavg(expr)
: mean (average) of numeric valuesmin(expr)
: minimum valuemax(expr)
: maximum valuepercentile(expr, p)
: a percentile wherep
is between 0 and 1
from
The from
clause identifies where the records are coming from. This can be an identifier like project_logs
or a function call like experiment("id")
.
Data source shapes
You can add an optional parameter to the from
clause that defines how the data is returned. The options are spans
(default), traces
, and summary
.
spans
spans
returns individual spans.
traces
traces
returns all spans from traces that contain at least one matching span.
summary
summary
provides a high-level overview of your experiment results by aggregating metrics across all spans in a trace. It returns one row per trace, making it ideal for analyzing overall performance and comparing results across experiments.
summary
includes aggregated metrics like total tokens, total cost, and average scores, while also providing previews of the input
, output
, expected
, and metadata
fields. This allows faster loading of data and easier comparison between experiments.
The preview fields (input, output, expected, error) show only data from the root span and are truncated to a default length of 124 characters, although you can change the length with the preview_length
clause. Set preview_length: -1
to disable truncation entirely.
When using summary
, scores are averaged across all spans in the trace, and metrics like tokens and costs are summed. The duration metrics are calculated from the span timings.
filter
The filter
clause lets you specify conditions to narrow down results. Similar to the WHERE
clause in SQL, it supports a wide range of operators and functions, including complex conditions.
This example filter
only retrieves data where:
- Factuality score is greater than 0.8
- model is "gpt-4"
- tag list includes "triage"
- [text search something]
- created date is later than January 1, 2024
- more than 1000 tokens were used or the [query? log? trace? span?] was made in production
Note: Negative filters on tags (e.g., NOT tags includes "resolved"
) may not work as expected. Since tags are only applied to the root span of a trace, and queries return complete traces, negative tag filters will match child spans (which don't have tags) and return the entire trace. We recommend using positive tag filters instead.
Time intervals
BTQL supports intervals for time-based operations.
This query returns all project logs from 'my-project-id' that were created in the last day.
This query returns all project logs from 'my-project-id' that were created up to an hour ago.
This query returns all project logs from 'my-project-id' that were created last week and last month.
sort
The sort
clause determines the order of results. The options are desc
(descending) and asc
(ascending) on a numerical field. You can sort by a single field, multiple fields, or computed values.
pivot
and unpivot
pivot
and unpivot
are advanced clauses that transform your results for easier analysis and comparison.
pivot
The pivot
clause transforms your results to make comparisons easier by converting rows into columns. This is useful when comparing metrics across different categories or time periods.
Syntax:
This query returns a record for each model with Factuality score and latency percentile across time periods.
This query returns a record for each tag and aggregates the number of instances of that tag per model.
Pivot columns are automatically named by combining the dimension value and measure name. For example, if you pivot by metadata.model
and a model named "gpt-4" to measure avg_score
, the name becomes gpt-4_avg_score
.
unpivot
The unpivot
clause transforms columns into rows, which is useful when you need to analyze arbitrary scores and metrics without specifying each score name. This is helpful when working with dynamic sets of metrics or when you need to know all possible score names in advance.
limit
and cursor
limit
The limit
clause controls the size of the result in number of records.
cursor
The cursor
clause implements pagination. Cursors are automatically returned in BTQL responses. A default limit is applied in a query without a limit clause, and the number of returned results can be overridden by using an explicit limit
. In order to implement pagination, after an initial query, provide the subsequent cursor token returned in the results in the cursor
clause in follow-on queries. When a cursor has reached the end of the result set, the data
array will be empty, and no cursor token will be returned by the query.
Cursors can only be used for pagination when no sort
clause is specified. If you need sorted results, you'll need to implement offset-based pagination by using the last value from your sort field as a filter in the next query.
Expressions
BTQL operators
You can use the following operators in your BTQL queries.
BTQL functions
You can use the following functions in select
, filter
, dimensions
, and measures
clauses.
Field access
BTQL provides flexible ways to access nested data in arrays and objects:
Array indices are 0-based, and negative indices count from the end (-1 is the last element).
Conditional expressions
BTQL supports conditional logic using the ternary operator (? :
):
Examples
Track token usage
This query helps you monitor token consumption across your application.
The response shows daily token usage:
Monitor model quality
Track model performance across different versions and configurations.
Analyze errors
Identify and investigate errors in your application.
Analyze latency
Monitor and optimize response times.
Analyze prompts
Analyze prompt effectiveness and patterns.
Analyze based on tags
Use tags to track and analyze specific behaviors.