# ai-sdk-bench

AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration using the Vercel AI Gateway. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.

## Installation

```bash
bun install
```

## Setup

Configure your API keys in `.env`:

1. Install the Vercel CLI if you haven't already
2. Run `bun run vercel:link` and link the benchmark to a project that has AI Gateway enabled
3. Run the benchmark with `bun run dev`
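
Concretely, the setup steps above might look like this in a shell (assuming npm is available for the global CLI install; `bun add -g vercel` works as well):

```bash
# 1. Install the Vercel CLI (skip if already installed)
npm install -g vercel

# 2. Link this checkout to a Vercel project with AI Gateway enabled
bun run vercel:link

# 3. Start the benchmark
bun run dev
```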

### Required API Keys

You'll need at least one API key for the providers you want to test:

- `VERCEL_OIDC_TOKEN`: the OIDC token for the Vercel AI Gateway

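For example, a minimal `.env` might contain (placeholder value shown; with a linked project, `vercel env pull` typically writes a fresh token for you, by default into `.env.local`):

```bash
# .env — placeholder value, not a real token
VERCEL_OIDC_TOKEN=eyJ...
```
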
## Usage

To run the benchmark:

```bash
bun run index.ts
```
### Interactive CLI
The benchmark features an interactive CLI that will prompt you for configuration:

1. **Model Selection**: Choose one or more models from the Vercel AI Gateway
   - Select from available models in your configured providers
   - Optionally add custom model IDs
   - Can test multiple models in a single run
2. **MCP Integration**: Choose your MCP configuration
   - **No MCP Integration**: Run without external tools
   - **MCP over HTTP**: Use an HTTP-based MCP server (default: `https://mcp.svelte.dev/mcp`)
   - **MCP over StdIO**: Use a local MCP server via a command (default: `npx -y @sveltejs/mcp`)
   - Option to provide a custom MCP server URL or command
3. **TestComponent Tool**: Enable or disable the testing tool for models
   - Allows models to run tests during component development
   - Enabled by default

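The choices above could be modeled roughly as follows (a sketch for illustration; these type and field names are assumptions, not the tool's actual types):

```typescript
// Hypothetical shape of the configuration the CLI collects (illustrative only).
type McpConfig =
  | { kind: "none" }
  | { kind: "http"; url: string }       // e.g. https://mcp.svelte.dev/mcp
  | { kind: "stdio"; command: string }; // e.g. npx -y @sveltejs/mcp

interface BenchmarkConfig {
  models: string[];           // one or more Gateway model IDs
  mcp: McpConfig;
  testComponentTool: boolean; // enabled by default
}

// Defaults mirroring the prompts described above.
const defaultConfig: BenchmarkConfig = {
  models: [],
  mcp: { kind: "http", url: "https://mcp.svelte.dev/mcp" },
  testComponentTool: true,
};
```
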
### Benchmark Workflow
After configuration, the benchmark will:
1. Discover all tests in `tests/` directory
2. For each selected model and test:
   - Run the AI agent with the test's prompt
   - Extract the generated Svelte component
   - Verify the component against the test suite
3. Generate a combined report with all results
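The workflow above can be sketched as a nested loop (a minimal sketch with stand-in stubs; `runAgent`, `extractComponent`, and `verify` are hypothetical names, not the tool's real functions):

```typescript
// Hypothetical outline of the benchmark loop; the stubs below stand in for
// the real agent run, component extraction, and test-suite verification.
type Result = { model: string; test: string; passed: boolean };

async function runAgent(model: string, test: string): Promise<string> {
  // Stub: the real tool runs the AI agent with the test's prompt.
  return `<script>/* ${model} output for ${test} */</script>`;
}

function extractComponent(agentOutput: string): string {
  // Stub: the real tool extracts the generated Svelte component.
  return agentOutput;
}

async function verify(component: string, test: string): Promise<boolean> {
  // Stub: the real tool verifies the component against the test suite.
  return component.length > 0;
}

async function runBenchmark(models: string[], tests: string[]): Promise<Result[]> {
  const results: Result[] = [];
  for (const model of models) {
    for (const test of tests) {
      const output = await runAgent(model, test);
      const component = extractComponent(output);
      results.push({ model, test, passed: await verify(component, test) });
    }
  }
  return results; // combined into a single report
}
```
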
### Results and Reports
Results are saved to the `results/` directory with timestamped filenames:
- `results/result-2024-12-07-14-30-45.json` - Full execution trace with all test results
## MCP Integration
The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:

- **No MCP Integration**: Run without external tools
- **MCP over HTTP**: Connect to an HTTP-based MCP server
  - Default: `https://mcp.svelte.dev/mcp`
  - Option to provide a custom URL
- **MCP over StdIO**: Connect to a local MCP server via a command
  - Default: `npx -y @sveltejs/mcp`
  - Option to provide a custom command

MCP status, transport type, and server configuration are documented in both the JSON metadata and displayed as a badge in the HTML report.