# Dataset Serialization
Learn how to save and load datasets in different formats, with support for custom evaluators and IDE integration.
## Overview
Pydantic Evals supports serializing datasets to files in two formats:
- **YAML** (`.yaml`, `.yml`) - Human-readable, great for version control
- **JSON** (`.json`) - Structured, machine-readable
Both formats support:

- Automatic JSON schema generation for IDE autocomplete and validation
- Custom evaluator serialization/deserialization
- Type-safe loading with generic parameters (see the sketch below)
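The generic parameters are not limited to `str`; custom Pydantic models work too. A minimal sketch, where `SupportQuery` and `SupportReply` are illustrative models (not part of the library) and the file name is arbitrary:

```python
from typing import Any

from pydantic import BaseModel

from pydantic_evals import Case, Dataset


class SupportQuery(BaseModel):
    user_id: int
    question: str


class SupportReply(BaseModel):
    answer: str


# The generic parameters (inputs, output, metadata types) drive validation
# when the dataset is written and re-loaded.
dataset = Dataset[SupportQuery, SupportReply, Any](
    cases=[
        Case(
            name='refund_question',
            inputs=SupportQuery(user_id=1, question='How do I get a refund?'),
            expected_output=SupportReply(answer='Use the refunds page.'),
        ),
    ],
)
dataset.to_file('support_tests.yaml')

# Loading with the same type parameters validates the file contents
# against SupportQuery / SupportReply.
loaded = Dataset[SupportQuery, SupportReply, Any].from_file('support_tests.yaml')
```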
## YAML Format
YAML is the recommended format for most use cases due to its readability and compact syntax.
### Basic Example
```python
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# Create a dataset with typed parameters
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(
            name='test_1',
            inputs='hello',
            expected_output='HELLO',
        ),
    ],
    evaluators=[
        IsInstance(type_name='str'),
        EqualsExpected(),
    ],
)

# Save to YAML
dataset.to_file('my_tests.yaml')
```

This creates two files:
- `my_tests.yaml` - The dataset
- `my_tests_schema.json` - JSON schema for IDE support
### YAML Output
```yaml
# yaml-language-server: $schema=my_tests_schema.json
name: my_tests
cases:
- name: test_1
  inputs: hello
  expected_output: HELLO
evaluators:
- IsInstance: str
- EqualsExpected
```

### JSON Schema for IDEs
The first line references the schema file:
```yaml
# yaml-language-server: $schema=my_tests_schema.json
```

This enables:

- ✅ Autocomplete in VS Code, PyCharm, and other editors
- ✅ Inline validation while editing
- ✅ Documentation tooltips for fields
- ✅ Error highlighting for invalid data
### Editor Support
The `yaml-language-server` comment is supported by:
- VS Code (with YAML extension)
- JetBrains IDEs (PyCharm, IntelliJ, etc.)
- Most editors with YAML language server support
See the YAML Language Server docs for more details.
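If your editor does not pick up the inline comment, the schema can also be registered through the YAML extension's settings. A sketch for VS Code's `settings.json`, assuming the file names from the example above:

```json
{
  "yaml.schemas": {
    "./my_tests_schema.json": "my_tests.yaml"
  }
}
```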
### Loading from YAML
```python
from pathlib import Path
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# First create and save the dataset
Path('my_tests.yaml').parent.mkdir(exist_ok=True)
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[IsInstance(type_name='str'), EqualsExpected()],
)
dataset.to_file('my_tests.yaml')

# Load the dataset with type parameters
dataset = Dataset[str, str, Any].from_file('my_tests.yaml')


def my_task(text: str) -> str:
    return text.upper()


# Run evaluation
report = dataset.evaluate_sync(my_task)
```

## JSON Format
JSON format is useful for programmatic generation or when strict structure is required.
### Basic Example
```python
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(name='test_1', inputs='hello', expected_output='HELLO'),
    ],
    evaluators=[EqualsExpected()],
)

# Save to JSON
dataset.to_file('my_tests.json')
```

### JSON Output
```json
{
  "$schema": "my_tests_schema.json",
  "name": "my_tests",
  "cases": [
    {
      "name": "test_1",
      "inputs": "hello",
      "expected_output": "HELLO"
    }
  ],
  "evaluators": [
    "EqualsExpected"
  ]
}
```

The `$schema` key at the top enables IDE support similar to YAML.
### Loading from JSON
```python
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

# First create and save the dataset
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[EqualsExpected()],
)
dataset.to_file('my_tests.json')

# Load from JSON
dataset = Dataset[str, str, Any].from_file('my_tests.json')
```

## Schema Generation
### Automatic Schema Creation
By default, `to_file()` creates a JSON schema file alongside your dataset:
```python
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates both my_tests.yaml AND my_tests_schema.json
dataset.to_file('my_tests.yaml')
```

### Custom Schema Location
```python
from pathlib import Path
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Create directories
Path('data').mkdir(exist_ok=True)

# Custom schema filename (relative to dataset file location)
dataset.to_file(
    'data/my_tests.yaml',
    schema_path='my_schema.json',
)

# No schema file
dataset.to_file('my_tests.yaml', schema_path=None)
```

### Schema Path Templates
Use `{stem}` to reference the dataset filename (without its extension):
```python
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates: my_tests.yaml and my_tests.schema.json
dataset.to_file(
    'my_tests.yaml',
    schema_path='{stem}.schema.json',
)
```

### Manual Schema Generation
Generate a schema without saving the dataset:
```python
import json
from typing import Any

from pydantic_evals import Dataset

# Get schema as dictionary for a specific dataset type
schema = Dataset[str, str, Any].model_json_schema_with_evaluators()

# Save manually
with open('custom_schema.json', 'w', encoding='utf-8') as f:
    json.dump(schema, f, indent=2)
```

## Custom Evaluators
Custom evaluators require special handling during serialization and deserialization.
### Requirements
Custom evaluators must:
- Be decorated with `@dataclass`
- Inherit from `Evaluator`
- Be passed to both `to_file()` and `from_file()`
### Complete Example
```python
from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check that output length falls within a threshold range."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# Create dataset with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[
                CustomThreshold(min_length=5, max_length=20),
            ],
        ),
    ],
)

# Save with custom evaluator types
dataset.to_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)
```

### Saved YAML
```yaml
# yaml-language-server: $schema=dataset_schema.json
cases:
- name: test_length
  inputs: example
  expected_output: long result
  evaluators:
  - CustomThreshold:
      min_length: 5
      max_length: 20
```

### Loading with Custom Evaluators
```python
from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check that output length falls within a threshold range."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# First create and save the dataset
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[CustomThreshold(min_length=5, max_length=20)],
        ),
    ],
)
dataset.to_file('dataset.yaml', custom_evaluator_types=[CustomThreshold])

# Load with custom evaluator registry
dataset = Dataset[str, str, Any].from_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)
```
**Important:** You must pass `custom_evaluator_types` to both `to_file()` and `from_file()`:

- `to_file()`: Includes the evaluator in the JSON schema
- `from_file()`: Registers the evaluator for deserialization
## Evaluator Serialization Formats
Evaluators can be serialized in three forms:
### 1. Name Only (No Parameters)
```yaml
evaluators:
- EqualsExpected
```

### 2. Single Parameter (Short Form)
```yaml
evaluators:
- IsInstance: str
- Contains: "required text"
- MaxDuration: 2.0
```

### 3. Multiple Parameters (Dict Form)
```yaml
evaluators:
- CustomThreshold:
    min_length: 5
    max_length: 20
- LLMJudge:
    rubric: "Response is accurate"
    model: "openai:gpt-5"
    include_input: true
```
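The three YAML forms above map onto ordinary evaluator construction in Python. A rough sketch of the correspondence, using built-in evaluators from the earlier examples (the field names, e.g. `type_name` and `rubric`, are the ones shown in those examples):

```python
from pydantic_evals.evaluators import EqualsExpected, IsInstance, LLMJudge

# 1. Name only: every field is left at its default
EqualsExpected()  # serialized as `- EqualsExpected`

# 2. Short form: exactly one field is set, written as `Name: value`
IsInstance(type_name='str')  # serialized as `- IsInstance: str`

# 3. Dict form: several fields are set, written as a nested mapping
LLMJudge(
    rubric='Response is accurate',
    model='openai:gpt-5',
    include_input=True,
)
```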
## Format Comparison

| Feature | YAML | JSON |
|---|---|---|
| Human readable | ✅ Excellent | ⚠️ Good |
| Comments | ✅ Yes | ❌ No |
| Compact | ✅ Yes | ⚠️ Verbose |
| Machine parsing | ✅ Good | ✅ Excellent |
| IDE support | ✅ Yes | ✅ Yes |
| Version control | ✅ Clean diffs | ⚠️ Noisy diffs |
**Recommendation:** Use YAML for most cases and JSON for programmatic generation.
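For the programmatic-generation case, cases can be built in a loop and written straight to JSON. A minimal sketch; the `pairs` data and file name are illustrative:

```python
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

# Illustrative (input, expected_output) pairs, e.g. exported from another system
pairs = [('hello', 'HELLO'), ('goodbye', 'GOODBYE')]

dataset = Dataset[str, str, Any](
    cases=[
        Case(name=f'case_{i}', inputs=text, expected_output=expected)
        for i, (text, expected) in enumerate(pairs)
    ],
    evaluators=[EqualsExpected()],
)

# JSON is convenient when another tool produces or consumes the file
dataset.to_file('generated_tests.json')
```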
## Advanced: Evaluator Serialization Name
Customize how your evaluator appears in serialized files:
```python
from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class VeryLongDescriptiveEvaluatorName(Evaluator):
    @classmethod
    def get_serialization_name(cls) -> str:
        return 'ShortName'

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True
```

In YAML:
```yaml
evaluators:
- ShortName  # Instead of VeryLongDescriptiveEvaluatorName
```

## Troubleshooting
### Schema Not Found in IDE
**Problem:** The YAML file doesn't show autocomplete.

**Solutions:**

- Check the schema path in the first line of the YAML file:

  ```yaml
  # yaml-language-server: $schema=correct_schema_name.json
  ```

- Verify the schema file exists in the same directory
- Restart the language server in your IDE
- Install a YAML extension (VS Code: "YAML" by Red Hat)
### Custom Evaluator Not Found
**Problem:** `ValueError: Unknown evaluator name: 'CustomEvaluator'`
**Solution:** Pass `custom_evaluator_types` when loading:
```python
from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomEvaluator(Evaluator):
    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# First create and save with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[Case(inputs='test', evaluators=[CustomEvaluator()])],
)
dataset.to_file('tests.yaml', custom_evaluator_types=[CustomEvaluator])

# Load with custom evaluator types
dataset = Dataset[str, str, Any].from_file(
    'tests.yaml',
    custom_evaluator_types=[CustomEvaluator],  # Required!
)
```

### Format Inference Failed
**Problem:** `ValueError: Cannot infer format from extension`
**Solution:** Specify the format explicitly:
```python
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Explicit format for unusual extensions
dataset.to_file('data.txt', fmt='yaml')
dataset_loaded = Dataset[str, str, Any].from_file('data.txt', fmt='yaml')
```

### Schema Generation Error
**Problem:** A custom evaluator causes schema generation to fail.
**Solution:** Ensure the evaluator is a proper dataclass:
```python
from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


# ✅ Correct
@dataclass
class MyEvaluator(Evaluator):
    value: int

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# ❌ Wrong: Missing @dataclass
class BadEvaluator(Evaluator):
    def __init__(self, value: int):
        self.value = value

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True
```

## Next Steps
- Dataset Management - Creating and organizing datasets
- Custom Evaluators - Write custom evaluation logic
- Core Concepts - Understand the data model