Loop
Loop is an AI assistant in Braintrust playgrounds, experiments, datasets, logs, and the BTQL sandbox.
In playgrounds, it helps you generate and optimize prompts, datasets, and evals. On the experiments page, it helps you read and interpret the experiments in a project. In datasets, it helps you generate and edit datapoint rows at scale. In logs, it helps you find analytical insights about your project. In the BTQL sandbox, it helps you write and debug BTQL queries.
Loop is in public beta and is off by default. To turn it on, flip the feature flag in your settings. If you are on a hybrid deployment, Loop is available starting with v0.0.74.
Select a model
Loop uses the AI models available in your Braintrust account via the Braintrust API Proxy. We currently support the following models:
- claude-4-sonnet
- claude-4.1-opus
- gpt-5
- gpt-4.1
- o3
- o4-mini
- claude-3-5-sonnet
To choose a model, navigate to the gear icon in the Loop chat window and select from the list of available models.
Available tools
Loop currently has the following tools. Tool availability changes based on the page you are viewing:
- Search docs: Semantically search the Braintrust documentation site to find relevant information
- Get summarized results: Fetch summarized data of current page contents
- Get detailed results: Retrieve detailed data of current page contents (evaluation results, dataset rows, and more)
- Edit prompt: Generate and modify prompts in the playground
- Run eval: Execute evaluations in the playground
- Edit data: Generate and modify datasets
- Get scorers: Get all available scorers in the project
- Edit scorers: Edit scorer selection in the playground
- Create code scorer: Create or edit a code-based scorer
- Create LLM judge scorer: Create or edit an LLM judge scorer
- BTQL query: Generate and run a BTQL query against project logs to conduct analysis
- Run BTQL: Generate and run a sandbox BTQL query against all data sources
- Get data source: Prompt you to select a data source for a BTQL query
- Infer schema: Inspect project logs and create an understanding of the shape of the data
- Continue execution: Resume tasks after Loop has run out of iterations
You can remove any of these tools from your Loop workflow by selecting the gear icon and deselecting a tool from the available list.
Generate and optimize prompts
Loop can help you generate a prompt from scratch. To do so, make sure you have an empty task open, then use Loop to generate a prompt.
If you have existing prompts, you can optimize them using Loop.
To optimize a prompt, ask Loop in the chat window, or select the Loop icon in the top bar of any existing task. From there, you can add the prompt to your chat, or quick optimize.
After Loop provides a suggested optimization, you can review and accept the suggestion or keep iterating.
Generate and optimize datasets
If no dataset exists, Loop can create one automatically. You must have a task defined for Loop to generate a dataset tailored to your evaluation.
You can review the dataset and further refine it as needed.
After you run your playground, you can also ask Loop to optimize your dataset. The agent will suggest areas for optimization based on an analysis of your current dataset.
Loop can also modify datasets to a specific shape you define, and generate synthetic datasets based on existing patterns from your playgrounds, logs, experiments, and datasets.
Analyze project logs
Loop can infer the shape of your project's log data and run arbitrary queries to answer questions about it. You can use this ability on its own to find analytical insights, or in conjunction with Loop's other abilities.
For analytical insights, you can ask things like "what are the most common errors", "what are the most common inputs from users", and "what user retention trends do you see?" and Loop will gather the necessary data from your logs to answer your question.
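Under the hood, Loop composes and runs BTQL for you. As a rough sketch, a question like "what are the most common errors" might translate into a query shaped like the one below (the project ID is a placeholder, and the exact field names and aggregation syntax depend on your log schema):

```
dimensions: error
measures: count(1) as occurrences
from: project_logs('YOUR_PROJECT_ID')
filter: error is not null
sort: occurrences desc
limit: 10
```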
For using this in conjunction with Loop's other abilities, you might navigate to the dataset page and ask Loop, "Can you find the most common errors users face and generate dataset rows based on the findings? Follow the formatting of existing rows you see in this dataset", and Loop will gather the context necessary from logs and generate your dataset based on the findings.
Write and debug BTQL queries
In the BTQL sandbox, Loop can:
- Generate BTQL queries from natural language descriptions
- Fix syntax, binder, and runtime errors
- Explain query results and suggest follow-up analyses
Specify a data source
When asking Loop to write or modify BTQL queries, you can specify the data source in several ways:
Explicitly specify entity type and ID
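For example, you might say "Query project_logs for project ID proj_abc123" (the ID here is just a placeholder), and Loop will target that source directly.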
Let Loop prompt you for a data source
If you don't specify a data source, Loop will ask you to select one from the available options in your workspace.
Reference the current query's data source
When you have an existing query in the sandbox, you can refer to it implicitly.
Loop understands the context of your current query and will try to use the same data source unless you specify otherwise.
Write queries from scratch
Loop can create BTQL queries based on your natural language requests. Describe what data you want to analyze, and Loop will generate the appropriate query.
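For example, a request like "show me the 20 most recent gpt-4 traces" might produce a query along these lines (a sketch only; the project ID is a placeholder, and `metadata.model` assumes you log the model name in metadata):

```
select: *
from: project_logs('YOUR_PROJECT_ID')
filter: metadata.model = 'gpt-4'
sort: created desc
limit: 20
```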
Modify queries
Loop can rewrite existing queries to better match your analytical needs:
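For instance, starting from a query like the one above, asking Loop to "only include traces that scored below 0.5 on accuracy" might yield a rewrite like this (again a sketch; `scores.accuracy` is a placeholder for whatever scores your project records):

```
select: *
from: project_logs('YOUR_PROJECT_ID')
filter: metadata.model = 'gpt-4' and scores.accuracy < 0.5
sort: created desc
limit: 20
```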
Debug and fix errors
Loop can help you resolve various types of errors that occur when writing and running BTQL queries.
Parser errors
These occur when BTQL can't parse your query due to syntax issues:
- Missing quotes around string literals
- Unmatched parentheses or brackets
- Invalid operators or keywords
- Malformed expressions
Parser errors appear as you type and provide specific feedback about invalid syntax. Hovering over the red underline will show a popup with the error and a Fix with Loop button.
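For example, the unterminated string literal in this sketch would trigger a parser error (the project ID is a placeholder):

```
select: *
from: project_logs('YOUR_PROJECT_ID')
filter: metadata.model = 'gpt-4
```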
Binder errors
These occur during validation when BTQL checks your query against the data schema:
- References to non-existent fields (for example, `metadata.nonexistent_field`)
- Type mismatches in comparisons
- Invalid field access patterns
Binder errors appear as you type and provide specific feedback about which fields or operations are invalid. Hovering over the red underline will show a popup with the error and a Fix with Loop button.
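For example, this query parses cleanly but would fail binding if `metadata.nonexistent_field` isn't present in your schema:

```
select: metadata.nonexistent_field
from: project_logs('YOUR_PROJECT_ID')
limit: 10
```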
Runtime errors
These occur when executing your query against the actual data:
- Data source not found or inaccessible
- Query timeout due to complexity or data size
- Permission or access control issues
- Database connection problems
Runtime errors are displayed in the results panel after you run a query, along with a Fix with Loop button.
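For example, a query that is syntactically and semantically valid can still fail at run time if its data source doesn't exist or you lack access to it (the ID below is deliberately bogus):

```
select: *
from: project_logs('nonexistent_project_id')
limit: 10
```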
Loop analyzes the specific error type and context to provide targeted fixes, whether it's correcting syntax, suggesting the right field names, or helping optimize query performance.
Search documentation
If you need help using Braintrust or understanding concepts, Loop will semantically search the documentation to answer your questions.
Generate and edit scorers
If no scorers exist, Loop can create one for you. You must have a dataset and a task in order for Loop to generate a scorer that is specific to your use case. The agent will begin by checking what data you have and what scorers already exist, then fetch sample results to understand the data structure.
If you select Accept, the new scorer will be added to the playground.
Loop can also help you improve and edit existing scorers.
You can create or edit scorers from experiment, dataset, or logs pages, and Loop will gather context from the resources on the page.
Tune scorers based on target classification
Loop can take manually labeled target classifications from evaluations in the playground and adjust scorer classification behavior.
Select the rows where the scorers did not perform as expected, then select Tune scorer.
Select the desired classification, provide optional additional instructions, and submit to Loop to tune the scorer. Loop will adjust the scorer based on the provided context.
Run and assess evals
After your tasks, dataset, and scorers are set up, Loop can run an evaluation for you, analyze it, and suggest further improvements.
Analyze and interpret your experiments
Loop can read the results of your experiments, summarize them, and help you discover new insights.
Settings
By default, Loop will ask you for confirmation before executing certain tool calls, like running an evaluation. If you'd like Loop to freely create and edit resources, and run evaluations, turn on auto-accept in the Settings dropdown menu.
Model allowlist
On the Settings page, administrators can customize which models are available to be used in Loop for the organization.