🔧 Having trouble? Don’t worry! This comprehensive guide will help you diagnose and fix common issues quickly. We’ve got your back!

Quick diagnostics

Before diving into specific issues, let’s run a quick health check:

System Check

mcp-eval doctor 
Comprehensive system diagnosis

Validate Config

mcp-eval validate 
Verify configuration and API keys

Test Connection

mcp-eval validate --servers 
Check server connectivity
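
To run all three checks in one pass (for example at the start of a CI job), chaining the commands above is a simple approach:

# Run the full pre-flight check in one line
mcp-eval doctor && mcp-eval validate && mcp-eval validate --servers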

Common error messages and solutions

🔑 Authentication errors

Symptoms:
anthropic.AuthenticationError: Invalid API Key
openai.error.AuthenticationError: Incorrect API key provided
Solutions:
  1. Check environment variables:
    # Verify keys are set
    echo $ANTHROPIC_API_KEY
    echo $OPENAI_API_KEY

    # Set if missing
    export ANTHROPIC_API_KEY="sk-ant-..."
    export OPENAI_API_KEY="sk-..."
  2. Use secrets file:
    # mcpeval.secrets.yaml
    anthropic:
      api_key: "sk-ant-..."
    openai:
      api_key: "sk-..."
  3. Validate configuration:
    mcp-eval validate 
Pro tip: Never commit API keys to version control! Use .gitignore for secrets files.
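
For example, ignoring the secrets file shown above keeps credentials out of your repository (adjust the pattern if you name the file differently):

# .gitignore
mcpeval.secrets.yaml
*.secrets.yaml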
Rate limit errors

Symptoms:
Rate limit reached for requests
Too many requests, please retry after X seconds
Solutions:
  1. Reduce concurrency:
    # mcpeval.yaml
    execution:
      max_concurrency: 2  # Lower from default 5
  2. Add retry logic:
    execution:
      retry_failed: true
      retry_delay: 5  # seconds between retries
  3. Use different models for testing vs judging:
    # Use cheaper model for generation
    provider: "anthropic"
    model: "claude-3-haiku-20240307"

    # But keep a stronger model for judging
    judge:
      model: "claude-3-5-sonnet-20241022"

🔌 Server connection issues

Symptoms:
Server 'my_server' not found
Failed to start MCP server: Command not found
subprocess.CalledProcessError: returned non-zero exit status
Solutions:
  1. Verify server configuration:
    # mcpeval.yaml or mcp-agent.config.yaml
    mcp:
      servers:
        my_server:
          command: "python"             # Ensure command exists
          args: ["path/to/server.py"]   # Check path is correct
          env:
            PYTHONPATH: "."             # Add if needed
  2. Test server manually:
    # Run the server command directly
    python path/to/server.py

    # Check for errors or missing dependencies
  3. Debug with verbose output:
    mcp-eval run tests/ -vv 
  4. Common fixes:
    • Install server dependencies: pip install -r requirements.txt
    • Use absolute paths: /full/path/to/server.py
    • Check file permissions: chmod +x server.py
    • Verify Python version compatibility
Tool discovery issues

Symptoms:
No tools found for server 'my_server'
Tool 'my_tool' was not called (expected at least 1 call)
Solutions:
  1. Check server is listed in agent:
    from mcp_agent.agents.agent import Agent

    # Ensure server_names includes your server
    agent = Agent(
        name="test_agent",
        server_names=["my_server"],  # Must match config
    )
  2. Verify tool discovery:
    # List available tools
    mcp-eval server list --verbose
  3. Check MCP protocol implementation (see the minimal server sketch after this list):
    • Server must implement tools/list method
    • Tools must have proper schemas
    • Server must be running when agent connects
  4. Enable debug logging:
    # mcpeval.yaml
    logging:
      level: DEBUG
      show_mcp_messages: true
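
If you are unsure whether your server implements the protocol correctly, a stripped-down server built with the MCP Python SDK's FastMCP helper is a quick way to rule out configuration problems. This is only a sketch; the server name and echo tool below are placeholders, not part of mcp-eval:

# my_server.py - minimal sketch using the MCP Python SDK (names are illustrative)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my_server")

@mcp.tool()
def echo(text: str) -> str:
    """Echo the input back; exists only to verify tool discovery."""
    return text

if __name__ == "__main__":
    mcp.run()  # Serves tools/list and tools/call over stdio by default

If mcp-eval server list --verbose can see this placeholder tool but not yours, the problem is likely in your server implementation rather than in mcp-eval's configuration.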

⏱️ Timeout and performance issues

Symptoms:
TimeoutError: Test exceeded 300 seconds
asyncio.TimeoutError
Test killed due to timeout
Solutions:
  1. Increase timeout globally:
    # mcpeval.yaml
    execution:
      timeout_seconds: 600  # 10 minutes
  2. Set per-test timeout:
    @task("Long running test", timeout=600) async def test_complex_operation(agent, session):  # Your test code 
  3. Optimize test prompts:
    # Instead of vague prompts:
    # "Do something with the data"

    # Use specific prompts:
    "Fetch https://api.example.com/data and return the count"
  4. Add performance assertions:
    await session.assert_that(
        Expect.performance.response_time_under(5000),  # 5 seconds
        name="response_time_check"
    )
  5. Profile slow tests:
    # Increase verbosity and export HTML for manual review
    mcp-eval run tests/ -v --html reports/perf.html
High token usage and cost

Symptoms:
Warning: Test consumed 10,000+ tokens
Estimated cost: $X.XX exceeds budget
Solutions:
  1. Use cheaper models for testing:
    # For basic tests
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
  2. Limit response length:
    response = await agent.generate_str(
        "Summarize this in 50 words or less",
        max_tokens=200
    )
  3. Cache responses during development:
    development:
      cache_responses: true
      cache_ttl: 3600  # 1 hour
  4. Monitor token usage:
    metrics = session.get_metrics()
    print(f"Tokens used: {metrics.total_tokens}")
    print(f"Estimated cost: ${metrics.estimated_cost}")

🧪 Test execution problems

Symptoms:
AssertionError: Expected content to contain "example"
Content was: "This is an Example page"  # Note the capital E
Solutions:
  1. Check case sensitivity:
    # Case-insensitive matching
    await session.assert_that(
        Expect.content.contains("example", case_sensitive=False),
        response=response
    )
  2. Use regex for flexible matching:
    await session.assert_that(
        Expect.content.regex(r"exam\w+", case_sensitive=False),
        response=response
    )
  3. Debug actual output:
    # Temporarily add debug output
    print(f"Actual response: {response!r}")

    # Or save a JSON report
    mcp-eval run tests/ --json debug.json
  4. Use partial matching for tools:
    await session.assert_that(
        Expect.tools.output_matches(
            tool_name="fetch",
            expected_output="example",
            match_type="contains"  # Instead of "exact"
        )
    )
Flaky or non-deterministic tests

Symptoms:
Test passes sometimes, fails others
Different results on each run
Works locally but fails in CI
Solutions:
  1. Set deterministic model parameters:
    response = await agent.generate_str(
        prompt,
        temperature=0,  # Deterministic
        seed=42         # Fixed seed if supported
    )
  2. Use objective assertions:
    # Instead of LLM judge for deterministic checks
    await session.assert_that(
        Expect.tools.was_called("fetch"),
        Expect.tools.count("fetch", 1),
        Expect.content.contains("specific_string")
    )
  3. Add retry logic for network calls:
    @task("Network test", retry=3) async def test_external_api(agent, session):  # Will retry up to 3 times on failure 
  4. Isolate test environment:
    # CI-specific configuration
    execution:
      parallel: false            # Run tests sequentially
      reset_between_tests: true  # Clean state

Debug mode walkthrough

When tests fail mysteriously, enable debug mode for detailed insights:

Step 1: Enable debug output

# Maximum verbosity
mcp-eval run tests/ -vvv

# Or set in config:
# mcpeval.yaml
debug:
  enabled: true
  log_level: DEBUG
  save_traces: true
  save_llm_calls: true

Step 2: Examine the debug output

Look for these key sections:
[DEBUG] Starting test: test_fetch_example
[DEBUG] Agent configuration: {name: "test_agent", servers: ["fetch"]}
[DEBUG] Sending prompt: "Fetch https://example.com"
[DEBUG] LLM Response: "I'll fetch that URL for you..."
[DEBUG] Tool call: fetch(url="https://example.com")
[DEBUG] Tool response: {"content": "Example Domain..."}
[DEBUG] Final response: "The page contains..."
[DEBUG] Assertion 'content_check' passed

Step 3: Inspect OTEL traces

# View trace for specific test
cat test-reports/traces/test_fetch_example.jsonl | jq '.'

# Or use the trace viewer
mcp-eval trace view test-reports/traces/test_fetch_example.jsonl
Key things to look for in traces:
  • Tool call sequences
  • Error spans
  • Timing information
  • Token usage per call
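
If the raw jq output is too noisy, a short script along these lines can summarize span names and durations. The exact field names in the exported JSONL depend on the trace format, so treat the keys below as assumptions to adjust:

# scan_trace.py - rough sketch; assumes each line is an OTEL-style span with
# "name", "start_time" and "end_time" fields (adapt the keys to your trace schema)
import json
import sys

with open(sys.argv[1]) as f:
    for line in f:
        span = json.loads(line)
        start = span.get("start_time") or 0
        end = span.get("end_time") or 0
        duration_ms = (end - start) / 1_000_000  # assuming nanosecond timestamps
        print(f"{span.get('name', '?')}: {duration_ms:.1f} ms")

Run it against any trace file, e.g. python scan_trace.py test-reports/traces/test_fetch_example.jsonl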

Network and connectivity debugging

Testing behind a proxy

# mcpeval.yaml
network:
  proxy:
    http: "http://proxy.company.com:8080"
    https: "https://proxy.company.com:8080"
  timeout: 30
  retry_on_connection_error: true

Debugging SSL/TLS issues

# Disable SSL verification (development only!)
export CURL_CA_BUNDLE=""
export REQUESTS_CA_BUNDLE=""

# Or configure trusted certificates
export SSL_CERT_FILE="/path/to/cacert.pem"

Testing with local servers

# For localhost servers
mcp:
  servers:
    local_server:
      command: "python"
      args: ["server.py"]
      env:
        HOST: "127.0.0.1"
        PORT: "8080"
      startup_timeout: 10  # Wait for server to start

Performance troubleshooting

Identifying bottlenecks

# Save a machine-readable report and analyze offline
mcp-eval run tests/ --json profile.json

# Analyze the report (custom scripts)
cat profile.json | jq '.' | less
Key metrics to watch:
  • llm_time_ms: Time spent in LLM calls
  • tool_time_ms: Time in tool execution
  • idle_time_ms: Wasted time between operations
  • max_concurrent_operations: Parallelism level
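
A small helper script can pull these metrics out of the JSON report for a quick overview. The report layout is not guaranteed, so the key names below (other than the metrics listed above) are assumptions you may need to adjust:

# summarize_profile.py - rough sketch for inspecting profile.json
import json

with open("profile.json") as f:
    report = json.load(f)

# "tests", "name" and "metrics" are assumed keys; adjust to the actual report layout
for test in report.get("tests", []):
    metrics = test.get("metrics", {})
    print(
        test.get("name", "?"),
        "llm:", metrics.get("llm_time_ms"),
        "tools:", metrics.get("tool_time_ms"),
        "idle:", metrics.get("idle_time_ms"),
    )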

Optimization strategies

Reduce LLM calls

# Batch multiple checks into one prompt
response = await agent.generate_str(
    "Fetch A, analyze it, then fetch B"
)

Parallel execution

# Run tests concurrently
@pytest.mark.parametrize("url", urls)
@pytest.mark.parallel
async def test_fetch_url(agent, session, url):
    ...

Cache results

cache:
  enabled: true
  ttl: 3600

Optimize prompts

# Be specific to reduce iterations
"Get the title from example.com"
# Not: "Tell me about example.com"

Platform-specific issues

macOS

# Add Python to PATH
export PATH="/usr/local/bin:$PATH"

# Or use full paths in config
command: "/usr/local/bin/python3"

Windows

# Use forward slashes or escaped backslashes
command: "python"
args: ["C:/path/to/server.py"]
# Or: args: ["C:\\path\\to\\server.py"]

# Set encoding
env:
  PYTHONIOENCODING: "utf-8"

Linux/Docker

# Fix permissions
chmod +x server.py

# For Docker
docker run --network=host mcp-eval

Getting help

Self-service debugging

  1. Run diagnostics:
    mcp-eval doctor --full > diagnosis.txt 
  2. Check logs:
    # View recent test logs
    tail -f test-reports/logs/mcp-eval.log
  3. Validate everything:
    mcp-eval validate 

Prepare an issue report

If you’re still stuck, let’s gather information for a bug report:
# Automatically collect diagnostics
mcp-eval issue

# This will:
# 1. Run system diagnostics
# 2. Collect configuration (sanitized)
# 3. Get recent error logs
# 4. Generate issue template
# 5. Open GitHub issue page

Community support

Quick reference: Error codes

Code        Meaning                  Quick Fix
AUTH001     Invalid API key          Check environment variables
SRV001      Server not found         Verify server name in config
SRV002      Server failed to start   Check command and dependencies
TOOL001     Tool not found           Verify server implements tool
TIMEOUT001  Test timeout             Increase timeout_seconds
ASSERT001   Assertion failed         Check expected vs actual values
NET001      Network error            Check connectivity and proxy
RATE001     Rate limited             Reduce concurrency or add delays

Still stuck? Don’t hesitate to reach out! We’re here to help you succeed with mcp-eval. Remember, every great developer has faced these issues - you’re in good company! 🚀