
Commit 0d40243

Merge remote-tracking branch 'origin/main' into add-model-pricing
2 parents: ad14d0d + 1ab220b

File tree

7 files changed: +1007 −383 lines changed


.env.example

Lines changed: 1 addition & 26 deletions

```diff
@@ -1,26 +1 @@
-# AI Model Configuration
-# Format: "anthropic/model-name", "openai/model-name", or "openrouter/provider/model-name"
-MODEL=anthropic/claude-haiku-4-5
-
-# API Keys (only one is required based on your MODEL choice)
-ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
-OPENAI_API_KEY=sk-your-key-here
-OPENROUTER_API_KEY=sk-or-v1-your-key-here
-
-# MCP Server Configuration (optional)
-# Leave empty to disable MCP integration
-
-# For HTTP MCP servers (use full URL):
-MCP_SERVER_URL=https://mcp.svelte.dev/mcp
-
-
-# For local stdio MCP servers (use command string):
-# MCP_SERVER_URL=npx -y @sveltejs/mcp
-
-# To disable MCP, set mcp to empty string
-# MCP_SERVER_URL=
-
-# To disable the test component tool, uncomment the line below
-#DISABLE_TESTCOMPONENT_TOOL=1
-
-VERBOSE_LOGGING=true
+VERCEL_OIDC_TOKEN="create with vercel cli"
```
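With this change the benchmark's entire configuration reduces to a single gateway token. As a minimal sketch (a hypothetical helper, not code from this repo), a runner could fail fast at startup if the token is absent rather than failing mid-benchmark:

```typescript
// Hypothetical helper, not part of this repo: verify a required
// environment variable exists before any benchmark work starts.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an inline stand-in for process.env:
const token = requireEnv("VERCEL_OIDC_TOKEN", {
  VERCEL_OIDC_TOKEN: "example-token",
});
console.log(token); // "example-token"
```

Failing early keeps the error close to its cause; a missing token surfaces as one clear message instead of an opaque gateway error several requests in.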

.gitignore

Lines changed: 2 additions & 0 deletions

```diff
@@ -40,3 +40,5 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
 results/*
 !results/.gitkeep
 !results/*.json
+.vercel
+.env*.local
```
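The new `.env*.local` pattern ignores any local env-file variant. Purely as an illustration (this is a rough RegExp translation, not how git itself matches gitignore globs), the `*` matches any run of characters except `/`:

```typescript
// Illustrative only: approximate the gitignore glob ".env*.local"
// as a RegExp to show which filenames it covers.
const envLocalPattern = /^\.env[^/]*\.local$/;

console.log(envLocalPattern.test(".env.local"));             // true
console.log(envLocalPattern.test(".env.development.local")); // true
console.log(envLocalPattern.test(".env.example"));           // false
```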

README.md

Lines changed: 40 additions & 61 deletions

````diff
@@ -1,6 +1,6 @@
 # ai-sdk-bench
 
-AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
+AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration using the Vercel AI Gateway. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
 
 ## Installation
 
@@ -12,84 +12,58 @@ bun install
 
 ## Setup
 
-To set up `.env`:
+Configure your API keys in `.env`:
 
-```bash
-cp .env.example .env
-```
-
-Then configure your API keys and model in `.env`:
-
-```bash
-# Required: Choose your model
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=your_key_here
-
-# Optional: Enable MCP integration (leave empty to disable)
-MCP_SERVER_URL=https://mcp.svelte.dev/mcp
-```
-
-### Environment Variables
-
-**Required:**
-
-- `MODEL`: The AI model to use (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-5`, `openrouter/anthropic/claude-sonnet-4`, `lmstudio/model-name`)
-- Corresponding API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`)
-- Note: No API key required for `lmstudio/*` models (runs locally)
-
-**Optional:**
+1. Install Vercel CLI if you haven't already
+2. Run `bun run vercel:link` and link the benchmark to a project that has AI Gateway enabled
+3. Run the benchmark with `bun run dev`
 
-- `MCP_SERVER_URL`: MCP server URL (leave empty to disable MCP integration)
+### Required API Keys
 
-### Supported Providers
+You'll need at least one API key for the providers you want to test:
 
-**Cloud Providers:**
+- `VERCEL_OIDC_TOKEN`: The OIDC token for the Vercel AI Gateway
 
-- `anthropic/*` - Direct Anthropic API (requires `ANTHROPIC_API_KEY`)
-- `openai/*` - Direct OpenAI API (requires `OPENAI_API_KEY`)
-- `openrouter/*` - OpenRouter unified API (requires `OPENROUTER_API_KEY`)
-
-**Local Providers:**
-
-- `lmstudio/*` - LM Studio local server (requires LM Studio running on `http://localhost:1234`)
+## Usage
 
-Example configurations:
+To run the benchmark:
 
 ```bash
-# Anthropic
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=sk-ant-...
+bun run index.ts
+```
 
-# OpenAI
-MODEL=openai/gpt-5
-OPENAI_API_KEY=sk-...
+### Interactive CLI
 
-# OpenRouter
-MODEL=openrouter/anthropic/claude-sonnet-4
-OPENROUTER_API_KEY=sk-or-...
+The benchmark features an interactive CLI that will prompt you for configuration:
 
-# LM Studio (local)
-MODEL=lmstudio/llama-3-8b
-# No API key needed - make sure LM Studio is running!
-```
+1. **Model Selection**: Choose one or more models from the Vercel AI Gateway
+   - Select from available models in your configured providers
+   - Optionally add custom model IDs
+   - Can test multiple models in a single run
 
-## Usage
+2. **MCP Integration**: Choose your MCP configuration
+   - **No MCP Integration**: Run without external tools
+   - **MCP over HTTP**: Use HTTP-based MCP server (default: `https://mcp.svelte.dev/mcp`)
+   - **MCP over StdIO**: Use local MCP server via command (default: `npx -y @sveltejs/mcp`)
+   - Option to provide custom MCP server URL or command
 
-To run the benchmark (automatically discovers and runs all tests):
+3. **TestComponent Tool**: Enable/disable the testing tool for models
+   - Allows models to run tests during component development
+   - Enabled by default
 
-```bash
-bun run index.ts
-```
+### Benchmark Workflow
 
-The benchmark will:
+After configuration, the benchmark will:
 
 1. Discover all tests in `tests/` directory
-2. For each test:
+2. For each selected model and test:
    - Run the AI agent with the test's prompt
    - Extract the generated Svelte component
    - Verify the component against the test suite
 3. Generate a combined report with all results
 
+### Results and Reports
+
 Results are saved to the `results/` directory with timestamped filenames:
 
 - `results/result-2024-12-07-14-30-45.json` - Full execution trace with all test results
@@ -148,12 +122,17 @@ This copies each `Reference.svelte` to `Component.svelte` temporarily and runs t
 
 ## MCP Integration
 
-The tool supports optional integration with MCP (Model Context Protocol) servers:
+The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:
 
-- **Enabled**: Set `MCP_SERVER_URL` to a valid MCP server URL
-- **Disabled**: Leave `MCP_SERVER_URL` empty or unset
+- **No MCP Integration**: Run without external tools
+- **MCP over HTTP**: Connect to an HTTP-based MCP server
+  - Default: `https://mcp.svelte.dev/mcp`
+  - Option to provide a custom URL
+- **MCP over StdIO**: Connect to a local MCP server via command
+  - Default: `npx -y @sveltejs/mcp`
+  - Option to provide a custom command
 
-MCP status is documented in both the JSON metadata and displayed as a badge in the HTML report.
+MCP status, transport type, and server configuration are documented in both the JSON metadata and displayed as a badge in the HTML report.
 
 ## Exit Codes
 
````
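The three MCP modes and their defaults described in the README changes can be sketched as a small config resolver. The type and function names below are illustrative only, not the benchmark's actual code:

```typescript
// Illustrative sketch of the MCP choices the interactive CLI offers.
// Names (McpConfig, resolveMcpConfig) are hypothetical.
type McpConfig =
  | { kind: "none" }
  | { kind: "http"; url: string }
  | { kind: "stdio"; command: string };

const DEFAULT_HTTP_URL = "https://mcp.svelte.dev/mcp";
const DEFAULT_STDIO_COMMAND = "npx -y @sveltejs/mcp";

// A custom URL/command overrides the default for the chosen transport.
function resolveMcpConfig(
  choice: "none" | "http" | "stdio",
  custom?: string,
): McpConfig {
  switch (choice) {
    case "none":
      return { kind: "none" };
    case "http":
      return { kind: "http", url: custom ?? DEFAULT_HTTP_URL };
    case "stdio":
      return { kind: "stdio", command: custom ?? DEFAULT_STDIO_COMMAND };
  }
}

console.log(resolveMcpConfig("http")); // { kind: "http", url: "https://mcp.svelte.dev/mcp" }
```

Modeling the choice as a discriminated union means downstream code must handle all three transports explicitly, which matches how the report records MCP status and transport type per run.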
