🚀 Welcome to mcp-eval! You’re about to supercharge your MCP development with powerful testing capabilities. This guide will have you testing MCP servers and agents in just 5 minutes!

What you’ll learn

By the end of this quickstart, you’ll be able to:
  • ✅ Install and configure mcp-eval for your project
  • ✅ Connect your MCP servers for testing
  • ✅ Write and run your first test
  • ✅ Understand test reports and iterate on failures
  • ✅ Choose the right testing style for your needs
Time to complete: ~5 minutes

Before you begin

Let’s make sure you have everything ready:

System requirements

Python 3.10+

Required for running mcp-eval. Download Python →

MCP Server

Any MCP-compatible server to test. Browse MCP servers →

API Key

A Claude or OpenAI API key for LLM features. Get Claude API →
New to MCP? No worries! Check out the MCP documentation to understand the basics of Model Context Protocol servers. You’ll be testing them like a pro in no time!

Your 5-minute journey to testing mastery

Step 1: Install `mcp-eval` and configure API keys

First, let’s get mcp-eval installed for your project. We recommend using uv to install mcp-eval as a global tool:
uv tool install mcpevals 
This makes the mcp-eval CLI available globally on your system.
Language agnostic testing: mcp-eval can test MCP servers written in any language - Python, TypeScript, Go, Rust, Java, etc. As long as your server implements the MCP protocol, mcp-eval can test it!
Next, add mcp-eval as a dependency for your project:

Using uv in a project

uv add mcpevals 
Alternatively:
pip install mcpevals 
Now set up your API key for the best experience:
# We recommend Claude for superior test generation and judging
export ANTHROPIC_API_KEY="sk-ant-..."

# Alternative: OpenAI
export OPENAI_API_KEY="sk-..."
Pro tip: Claude Sonnet or Opus models provide the best results for test generation and LLM judge evaluations!
Step 2: Initialize your test project

Let’s set up your testing environment with our interactive wizard:
mcp-eval init 
This friendly wizard will:
  • 🎯 Ask for your preferred LLM provider and model
  • 📝 Create mcpeval.yaml with your configuration
  • 🔐 Set up mcpeval.secrets.yaml for secure API key storage
  • 🤖 Help you define your first test agent
  • 🔧 Import any existing MCP servers
What happens during init:
? Select your LLM provider: Anthropic
? Select model: claude-3-5-sonnet-20241022
? Import servers from mcp.json? Yes
? Path to mcp.json: .cursor/mcp.json
✓ Found 2 servers: fetch, filesystem
? Create a default agent? Yes
? Agent name: TestBot
? Agent instruction: You test MCP servers thoroughly
✓ Configuration saved to mcpeval.yaml
✓ Secrets saved to mcpeval.secrets.yaml
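For reference, here’s roughly how those answers map onto the generated mcpeval.yaml. The mcp.servers and execution sections match what you’ll see again in the troubleshooting examples below; the agents section is a sketch based on the wizard answers above, so treat its exact key names as assumptions and defer to the file init actually writes for you:

# Sketch of a generated mcpeval.yaml (agent key names are assumptions)
mcp:
  servers:
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]

agents:
  - name: "TestBot"
    instruction: "You test MCP servers thoroughly"
    server_names: ["fetch"]   # must match the server names under mcp.servers

execution:
  timeout_seconds: 300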
Step 3: Configure MCP servers

Before you can test an MCP server, you need to tell mcp-eval how to connect to it.
Connection works over any supported transport (stdio, websocket, sse, streamable_http). You can import server configurations from mcp.json or dxt files, or specify them interactively using the mcp-eval server add command.
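If you’re importing from an mcp.json file (for example the .cursor/mcp.json used during init), a typical stdio server entry looks like this; the fetch server shown is just an example:

{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}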

Adding your MCP server

You have several ways to add a server to your configuration:
The easiest way - let mcp-eval guide you:
mcp-eval server add 
This will prompt you for:
  • How to add (interactive, from-mcp-json, or from-dxt)
  • Server name (e.g., “fetch”)
  • Command to run (e.g., “uvx mcp-server-fetch”)
  • Any arguments or environment variables
Example interaction:
? How would you like to add the server? interactive
? Server name: fetch
? Command: uvx mcp-server-fetch
? Add environment variables? No
✓ Added server 'fetch'

Common server examples

Here are some popular MCP servers you might want to test:
# Fetch server (web content)
uvx mcp-server-fetch
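If you want more than the fetch server to experiment with, a couple of other commonly used servers follow the same pattern. The package names and flags below are the commonly published ones, but double-check each server’s own README before relying on them:

# Filesystem server (local file access; the trailing path is the directory to expose)
npx -y @modelcontextprotocol/server-filesystem /path/to/allowed/dir

# Git server (repository inspection)
uvx mcp-server-git --repository /path/to/repo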
Verify your server configuration: After adding a server, you can verify it’s working:
# List all configured servers
mcp-eval server list

# Validate server connectivity
mcp-eval validate
Step 4: Run your first test

Time for the exciting part - running your first test! We’ll use the included fetch server example to demonstrate.
Example structure: The examples assume you have the fetch server configured. If you’re testing a different server, you’ll need to adjust the test code accordingly.
First, let’s make sure you have an example test to run. If you used mcp-eval init, you might already have one; either way, run the bundled decorator-style example:
mcp-eval run examples/mcp_server_fetch/tests/test_decorator_style.py \
  -v \
  --markdown test-reports/results.md \
  --html test-reports/index.html
What’s happening:
  • 🏃 Running decorator-style tests from the example file
  • 📊 Verbose output (-v) shows test progress
  • 📝 Markdown report for documentation
  • 🌐 HTML report for interactive exploration
Expected output:
Running tests...

✓ test_basic_fetch_decorator - Test basic URL fetching [2.3s]
  ✓ fetch_tool_called: Tool 'fetch' was called
  ✓ contains_domain_text: Content contains "Example Domain"
  ✓ fetch_success_rate: Tool success rate 100%

✓ test_content_extraction_decorator - Test extraction quality [3.1s]
  ✓ fetch_called_for_extraction: Tool 'fetch' was called
  ✓ extraction_quality_assessment: LLM judge score 0.92

Results: 2 passed, 0 failed
Reports saved to test-reports/
Step 5: Explore your test results

Open your shiny new test report to see the details:
# Open the HTML report in your browser
open test-reports/index.html

# Or view the markdown report
cat test-reports/results.md
Understanding the HTML report: The interactive report shows:
  • 📊 Overview dashboard - Pass/fail rates, performance metrics
  • 🔍 Test details - Each test with all assertions
  • 🛠️ Tool usage - What tools were called and when
  • 💭 LLM reasoning - The agent’s thought process
  • Performance - Response times and efficiency metrics
  • 🎯 Failed assertions - Detailed diffs and explanations
Common things to check:
  • Did the right tools get called?
  • Was the output accurate?
  • How efficient was the agent’s approach?
  • What was the LLM judge’s assessment?
Test failed? Don’t worry! Check the assertion details to understand why. Common issues:
  • Tool not found (check server configuration)
  • Content mismatch (adjust your assertions)
  • Timeout (increase timeout in config)

What’s next? Write your own test!

Now that you’ve run the example, let’s write your very first custom test:

Choose your testing style

The decorator style shown below is best for quick, readable tests:
from mcp_eval import task, Expect

@task("My first test")
async def test_my_server(agent, session):
    response = await agent.generate_str(
        "Use my tool to do something"
    )

    await session.assert_that(
        Expect.tools.was_called("my_tool"),
        response=response
    )
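The file name in step 4 (test_decorator_style.py) hints that other styles exist. If your team prefers plain pytest, the same assertions can be written as an ordinary async test. The sketch below assumes a pytest integration that provides agent and session fixtures; those fixture names (and the need for pytest-asyncio) are assumptions rather than something this guide confirms, so check the generated examples for the exact setup:

# Hedged pytest-style sketch; assumes pytest-asyncio (or equivalent) is installed
# and that mcp-eval exposes `agent` / `session` fixtures -- both are assumptions.
import pytest
from mcp_eval import Expect

@pytest.mark.asyncio
async def test_my_server(agent, session):
    response = await agent.generate_str("Use my tool to do something")

    await session.assert_that(
        Expect.tools.was_called("my_tool"),
        response=response,
    )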

Your test file structure

Create a new test file tests/test_my_server.py:
"""Tests for my awesome MCP server."""  from mcp_eval import task, setup, Expect  @setup def configure_tests():  """Any setup needed before tests run."""  print("🚀 Starting my server tests!")  @task("Test basic functionality") async def test_basic_operation(agent, session):  """Verify the server responds correctly to basic requests."""    # 1. Send a prompt to the agent  response = await agent.generate_str(  "Please use the calculator to add 2 + 2"  )    # 2. Check that the right tool was called  await session.assert_that(  Expect.tools.was_called("calculate"),  name="calculator_used"  )    # 3. Verify the response content  await session.assert_that(  Expect.content.contains("4"),  name="correct_answer",  response=response  )    # 4. Check efficiency (optional)  await session.assert_that(  Expect.performance.max_iterations(3),  name="completed_efficiently"  )  @task("Test error handling") async def test_error_recovery(agent, session):  """Verify graceful error handling."""    response = await agent.generate_str(  "Try to divide by zero, then recover"  )    # Use LLM judge for complex behavior  await session.assert_that(  Expect.judge.llm(  rubric="Agent should handle error gracefully and provide helpful response",  min_score=0.8  ),  name="error_handling_quality",  response=response  ) 
Run your new test:
mcp-eval run tests/test_my_server.py -v --html reports/my_test.html 

Troubleshooting common issues

Problem: The server isn’t found or won’t connect. Solution: Check your mcpeval.yaml to ensure the server is properly configured:
mcp:
  servers:
    my_server:
      command: "python"
      args: ["path/to/server.py"]
Also verify the server name matches what you’re using in your agent’s server_names.
Problem: Tests time out. Solution: Increase the timeout in your configuration:
execution:
  timeout_seconds: 600  # 10 minutes
Problem: LLM calls fail or the API key isn’t picked up. Solution: Ensure your API key is set correctly:
# Check if it's set
echo $ANTHROPIC_API_KEY

# Or add to mcpeval.secrets.yaml
anthropic:
  api_key: "sk-ant-..."


Congratulations! 🎉 You’ve successfully set up mcp-eval and run your first tests. You’re now ready to ensure your MCP servers and agents work flawlessly. Happy testing!