API

  • Case[Input, Output, Metadata]
  • Dataset[Input, Output, Metadata]
Source: datasets.py

Programmatic

```python
from mcp_eval import Case, Dataset, ToolWasCalled, ResponseContains

cases = [
    Case(
        name="fetch_example",
        inputs="Fetch https://example.com",
        evaluators=[ToolWasCalled("fetch"), ResponseContains("Example Domain")],
    )
]

dataset = Dataset(name="Fetch Suite", cases=cases)
report = await dataset.evaluate(lambda inputs, agent, session: agent.generate_str(inputs))
report.print(include_input=True, include_output=True)
```
Parallel evaluation:
```python
report = await dataset.evaluate(
    lambda inputs, agent, session: agent.generate_str(inputs),
    max_concurrency=4,
)
```

YAML/JSON

Datasets can be saved and loaded with Dataset.to_file and Dataset.from_file; the file format follows mcpeval.config.schema.json. YAML example (from basic_fetch_dataset.yaml):
```yaml
name: "Basic Fetch Dataset"
server_name: "fetch"
cases:
  - name: "simple_fetch"
    inputs: "Fetch https://example.com"
    expected_output: "Example Domain"
    evaluators:
      - ToolWasCalled:
          tool_name: "fetch"
      - ResponseContains:
          text: "Example Domain"
```
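Because the schema is plain structured data, the same dataset also serializes cleanly to JSON. A minimal stdlib round-trip sketch (the dict below mirrors the YAML above; in practice Dataset.to_file/from_file handle serialization for you):

```python
import json

# Hand-built dict mirroring the YAML dataset above (for illustration only).
dataset = {
    "name": "Basic Fetch Dataset",
    "server_name": "fetch",
    "cases": [
        {
            "name": "simple_fetch",
            "inputs": "Fetch https://example.com",
            "expected_output": "Example Domain",
            "evaluators": [
                {"ToolWasCalled": {"tool_name": "fetch"}},
                {"ResponseContains": {"text": "Example Domain"}},
            ],
        }
    ],
}

# Round-trip: write to disk, read back, verify nothing was lost.
with open("basic_fetch_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)

with open("basic_fetch_dataset.json") as f:
    loaded = json.load(f)

assert loaded == dataset
print(loaded["cases"][0]["name"])  # simple_fetch
```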

Concurrency

Dataset.evaluate(..., max_concurrency=N) runs up to N cases concurrently.
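A concurrency cap like this is typically implemented with a semaphore. A self-contained sketch of the pattern (not mcp-eval's actual code; run_case stands in for evaluating one case):

```python
import asyncio

async def run_case(name: str, sem: asyncio.Semaphore, active: list) -> str:
    # The semaphore caps how many cases run at once.
    async with sem:
        active[0] += 1
        active[1] = max(active[1], active[0])  # track peak concurrency
        await asyncio.sleep(0.01)              # stand-in for real agent work
        active[0] -= 1
        return f"{name}: ok"

async def evaluate(case_names: list, max_concurrency: int) -> list:
    sem = asyncio.Semaphore(max_concurrency)
    active = [0, 0]  # [currently running, peak observed]
    results = await asyncio.gather(
        *(run_case(n, sem, active) for n in case_names)
    )
    print(f"peak concurrency: {active[1]}")
    return results

results = asyncio.run(evaluate([f"case_{i}" for i in range(8)], max_concurrency=4))
```

With 8 cases and max_concurrency=4, at most four run_case coroutines hold the semaphore at any moment, so peak concurrency stays at 4.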

Examples