🔧 Having trouble? Don’t worry! This comprehensive guide will help you diagnose and fix common issues quickly. We’ve got your back!

Quick diagnostics

Before diving into specific issues, let’s run a quick health check:

System Check

mcp-eval doctor 
Comprehensive system diagnosis

Validate Config

mcp-eval validate 
Verify configuration and API keys

Test Connection

mcp-eval validate --servers 
Check server connectivity
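
To run all three checks in one pass (for example at the start of a CI job), chaining the commands above is a simple approach:

# Run the full pre-flight check in one line
mcp-eval doctor && mcp-eval validate && mcp-eval validate --servers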

Common error messages and solutions

🔑 Authentication errors

Symptoms:
anthropic.AuthenticationError: Invalid API Key
openai.error.AuthenticationError: Incorrect API key provided
Solutions:
  1. Check environment variables:
    # Verify keys are set
    echo $ANTHROPIC_API_KEY
    echo $OPENAI_API_KEY

    # Set if missing
    export ANTHROPIC_API_KEY="sk-ant-..."
    export OPENAI_API_KEY="sk-..."
  2. Use secrets file:
    # mcpeval.secrets.yaml
    anthropic:
      api_key: "sk-ant-..."
    openai:
      api_key: "sk-..."
  3. Validate configuration:
    mcp-eval validate 
Pro tip: Never commit API keys to version control! Use .gitignore for secrets files.
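
For example, ignoring the secrets file shown above keeps credentials out of your repository (adjust the pattern if you name the file differently):

# .gitignore
mcpeval.secrets.yaml
*.secrets.yaml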
Rate limit errors

Symptoms:
Rate limit reached for requests
Too many requests, please retry after X seconds
Solutions:
  1. Reduce concurrency:
    # mcpeval.yaml
    execution:
      max_concurrency: 2  # Lower from default 5
  2. Add retry logic:
    execution:
      retry_failed: true
      retry_delay: 5  # seconds between retries
  3. Use different models for testing vs judging:
    # Use cheaper model for generation
    provider: "anthropic"
    model: "claude-3-haiku-20240307"

    # But keep a stronger model for judging
    judge:
      model: "claude-3-5-sonnet-20241022"

🔌 Server connection issues

Symptoms:
Server 'my_server' not found
Failed to start MCP server: Command not found
subprocess.CalledProcessError: returned non-zero exit status
Solutions:
  1. Verify server configuration:
    # mcpeval.yaml or mcp-agent.config.yaml
    mcp:
      servers:
        my_server:
          command: "python"             # Ensure command exists
          args: ["path/to/server.py"]   # Check path is correct
          env:
            PYTHONPATH: "."             # Add if needed
  2. Test server manually:
    # Run the server command directly
    python path/to/server.py

    # Check for errors or missing dependencies
  3. Debug with verbose output:
    mcp-eval run tests/ -vv 
  4. Common fixes:
    • Install server dependencies: pip install -r requirements.txt
    • Use absolute paths: /full/path/to/server.py
    • Check file permissions: chmod +x server.py
    • Verify Python version compatibility
Tool discovery issues

Symptoms:
No tools found for server 'my_server'
Tool 'my_tool' was not called (expected at least 1 call)
Solutions:
  1. Check server is listed in agent:
    from mcp_agent.agents.agent import Agent

    # Ensure server_names includes your server
    agent = Agent(
        name="test_agent",
        server_names=["my_server"],  # Must match config
    )
  2. Verify tool discovery:
    # List available tools
    mcp-eval server list --verbose
  3. Check MCP protocol implementation (see the minimal server sketch after this list):
    • Server must implement tools/list method
    • Tools must have proper schemas
    • Server must be running when agent connects
  4. Enable debug logging:
    # mcpeval.yaml
    logging:
      level: DEBUG
      show_mcp_messages: true
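
If you are unsure whether your server implements the protocol correctly, a stripped-down server built with the MCP Python SDK's FastMCP helper is a quick way to rule out configuration problems. This is only a sketch; the server name and echo tool below are placeholders, not part of mcp-eval:

# my_server.py - minimal sketch using the MCP Python SDK (names are illustrative)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my_server")

@mcp.tool()
def echo(text: str) -> str:
    """Echo the input back; exists only to verify tool discovery."""
    return text

if __name__ == "__main__":
    mcp.run()  # Serves tools/list and tools/call over stdio by default

If mcp-eval server list --verbose can see this placeholder tool but not yours, the problem is likely in your server implementation rather than in mcp-eval's configuration.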

⏱️ Timeout and performance issues

Symptoms:
TimeoutError: Test exceeded 300 seconds
asyncio.TimeoutError
Test killed due to timeout
Solutions:
  1. Increase timeout globally:
    # mcpeval.yaml
    execution:
      timeout_seconds: 600  # 10 minutes
  2. Set per-test timeout:
    @task("Long running test", timeout=600) async def test_complex_operation(agent, session):  # Your test code 
  3. Optimize test prompts:
    # Instead of vague prompts:
    # "Do something with the data"

    # Use specific prompts:
    "Fetch https://api.example.com/data and return the count"
  4. Add performance assertions:
    await session.assert_that(
        Expect.performance.response_time_under(5000),  # 5 seconds
        name="response_time_check"
    )
  5. Profile slow tests:
    # Increase verbosity and export HTML for manual review
    mcp-eval run tests/ -v --html reports/perf.html
High token usage and cost

Symptoms:
Warning: Test consumed 10,000+ tokens
Estimated cost: $X.XX exceeds budget
Solutions:
  1. Use cheaper models for testing:
    # For basic tests
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
  2. Limit response length:
    response = await agent.generate_str(
        "Summarize this in 50 words or less",
        max_tokens=200
    )
  3. Cache responses during development:
    development:
      cache_responses: true
      cache_ttl: 3600  # 1 hour
  4. Monitor token usage:
    metrics = session.get_metrics()
    print(f"Tokens used: {metrics.total_tokens}")
    print(f"Estimated cost: ${metrics.estimated_cost}")

🧪 Test execution problems

Symptoms:
AssertionError: Expected content to contain "example"
Content was: "This is an Example page"  # Note the capital E
Solutions:
  1. Check case sensitivity:
    # Case-insensitive matching
    await session.assert_that(
        Expect.content.contains("example", case_sensitive=False),
        response=response
    )
  2. Use regex for flexible matching:
    await session.assert_that(
        Expect.content.regex(r"exam\w+", case_sensitive=False),
        response=response
    )
  3. Debug actual output:
    # Temporarily add debug output
    print(f"Actual response: {response!r}")

    # Or save a JSON report
    mcp-eval run tests/ --json debug.json
  4. Use partial matching for tools:
    await session.assert_that(
        Expect.tools.output_matches(
            tool_name="fetch",
            expected_output="example",
            match_type="contains"  # Instead of "exact"
        )
    )
Flaky or non-deterministic tests

Symptoms:
Test passes sometimes, fails others
Different results on each run
Works locally but fails in CI
Solutions:
  1. Set deterministic model parameters:
    response = await agent.generate_str(
        prompt,
        temperature=0,  # Deterministic
        seed=42         # Fixed seed if supported
    )
  2. Use objective assertions:
    # Instead of LLM judge for deterministic checks
    await session.assert_that(
        Expect.tools.was_called("fetch"),
        Expect.tools.count("fetch", 1),
        Expect.content.contains("specific_string")
    )
  3. Add retry logic for network calls:
    @task("Network test", retry=3) async def test_external_api(agent, session):  # Will retry up to 3 times on failure 
  4. Isolate test environment:
    # CI-specific configuration
    execution:
      parallel: false            # Run tests sequentially
      reset_between_tests: true  # Clean state

Debug mode walkthrough

When tests fail mysteriously, enable debug mode for detailed insights:

Step 1: Enable debug output

# Maximum verbosity
mcp-eval run tests/ -vvv

# Or set in config:
# mcpeval.yaml
debug:
  enabled: true
  log_level: DEBUG
  save_traces: true
  save_llm_calls: true

Step 2: Examine the debug output

Look for these key sections:
[DEBUG] Starting test: test_fetch_example
[DEBUG] Agent configuration: {name: "test_agent", servers: ["fetch"]}
[DEBUG] Sending prompt: "Fetch https://example.com"
[DEBUG] LLM Response: "I'll fetch that URL for you..."
[DEBUG] Tool call: fetch(url="https://example.com")
[DEBUG] Tool response: {"content": "Example Domain..."}
[DEBUG] Final response: "The page contains..."
[DEBUG] Assertion 'content_check' passed

Step 3: Inspect OTEL traces

# View trace for specific test
cat test-reports/traces/test_fetch_example.jsonl | jq '.'

# Or use the trace viewer
mcp-eval trace view test-reports/traces/test_fetch_example.jsonl
Key things to look for in traces:
  • Tool call sequences
  • Error spans
  • Timing information
  • Token usage per call
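
If the raw jq output is too noisy, a short script along these lines can summarize span names and durations. The exact field names in the exported JSONL depend on the trace format, so treat the keys below as assumptions to adjust:

# scan_trace.py - rough sketch; assumes each line is an OTEL-style span with
# "name", "start_time" and "end_time" fields (adapt the keys to your trace schema)
import json
import sys

with open(sys.argv[1]) as f:
    for line in f:
        span = json.loads(line)
        start = span.get("start_time") or 0
        end = span.get("end_time") or 0
        duration_ms = (end - start) / 1_000_000  # assuming nanosecond timestamps
        print(f"{span.get('name', '?')}: {duration_ms:.1f} ms")

Run it against any trace file, e.g. python scan_trace.py test-reports/traces/test_fetch_example.jsonl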

Network and connectivity debugging

Testing behind a proxy

# mcpeval.yaml
network:
  proxy:
    http: "http://proxy.company.com:8080"
    https: "https://proxy.company.com:8080"
  timeout: 30
  retry_on_connection_error: true

Debugging SSL/TLS issues

# Disable SSL verification (development only!)
export CURL_CA_BUNDLE=""
export REQUESTS_CA_BUNDLE=""

# Or configure trusted certificates
export SSL_CERT_FILE="/path/to/cacert.pem"

Testing with local servers

# For localhost servers
mcp:
  servers:
    local_server:
      command: "python"
      args: ["server.py"]
      env:
        HOST: "127.0.0.1"
        PORT: "8080"
      startup_timeout: 10  # Wait for server to start

Performance troubleshooting

Identifying bottlenecks

# Save a machine-readable report and analyze offline
mcp-eval run tests/ --json profile.json

# Analyze the report (custom scripts)
cat profile.json | jq '.' | less
Key metrics to watch:
  • llm_time_ms: Time spent in LLM calls
  • tool_time_ms: Time in tool execution
  • idle_time_ms: Wasted time between operations
  • max_concurrent_operations: Parallelism level
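
A small helper script can pull these metrics out of the JSON report for a quick overview. The report layout is not guaranteed, so the key names below (other than the metrics listed above) are assumptions you may need to adjust:

# summarize_profile.py - rough sketch for inspecting profile.json
import json

with open("profile.json") as f:
    report = json.load(f)

# "tests", "name" and "metrics" are assumed keys; adjust to the actual report layout
for test in report.get("tests", []):
    metrics = test.get("metrics", {})
    print(
        test.get("name", "?"),
        "llm:", metrics.get("llm_time_ms"),
        "tools:", metrics.get("tool_time_ms"),
        "idle:", metrics.get("idle_time_ms"),
    )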

Optimization strategies

Reduce LLM calls

# Batch multiple checks into one prompt
response = await agent.generate_str(
    "Fetch A, analyze it, then fetch B"
)

Parallel execution

# Run tests concurrently
@pytest.mark.parametrize("url", urls)
@pytest.mark.parallel
async def test_fetch_url(agent, session, url):
    ...

Cache results

cache:
  enabled: true
  ttl: 3600

Optimize prompts

# Be specific to reduce iterations
"Get the title from example.com"
# Not: "Tell me about example.com"

Platform-specific issues

macOS

# Add Python to PATH
export PATH="/usr/local/bin:$PATH"

# Or use full paths in config
command: "/usr/local/bin/python3"

Windows

# Use forward slashes or escaped backslashes
command: "python"
args: ["C:/path/to/server.py"]
# Or: args: ["C:\\path\\to\\server.py"]

# Set encoding
env:
  PYTHONIOENCODING: "utf-8"

Linux/Docker

# Fix permissions
chmod +x server.py

# For Docker
docker run --network=host mcp-eval

Getting help

Self-service debugging

  1. Run diagnostics:
    mcp-eval doctor --full > diagnosis.txt 
  2. Check logs:
    # View recent test logs
    tail -f test-reports/logs/mcp-eval.log
  3. Validate everything:
    mcp-eval validate 

Prepare an issue report

If you’re still stuck, let’s gather information for a bug report:
# Automatically collect diagnostics
mcp-eval issue

# This will:
# 1. Run system diagnostics
# 2. Collect configuration (sanitized)
# 3. Get recent error logs
# 4. Generate issue template
# 5. Open GitHub issue page

Community support

Quick reference: Error codes

Code        Meaning                  Quick Fix
AUTH001     Invalid API key          Check environment variables
SRV001      Server not found         Verify server name in config
SRV002      Server failed to start   Check command and dependencies
TOOL001     Tool not found           Verify server implements tool
TIMEOUT001  Test timeout             Increase timeout_seconds
ASSERT001   Assertion failed         Check expected vs actual values
NET001      Network error            Check connectivity and proxy
RATE001     Rate limited             Reduce concurrency or add delays

Still stuck? Don’t hesitate to reach out! We’re here to help you succeed with mcp-eval. Remember, every great developer has faced these issues - you’re in good company! 🚀