A flexible framework for comparing Retrieval-Augmented Generation (RAG) systems side-by-side, with support for subjective quality evaluation using LLMs.
- Multi-tool Support: Compare multiple RAG tools in parallel
- Flexible Adapters: Easy-to-extend adapter pattern for adding new tools
- Multiple Output Formats: Display, JSON, Markdown, and summary formats
- Performance Metrics: Automatic latency measurement and result statistics
- LLM Evaluation: Support for subjective quality assessment using Claude Opus 4.1
- Rich CLI: Beautiful terminal output with tables and panels
- Comprehensive Testing: 90+ tests ensuring reliability
- Python 3.9+
- uv - Fast Python package installer and resolver
To install uv:
```bash
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv

# Or with pip
pip install uv
```
```bash
# Clone the repository
git clone https://github.com/ansari-project/ragdiff.git
cd ragdiff

# Install dependencies with uv
uv sync --all-extras   # Install all dependencies including dev tools

# Or install only core dependencies
uv sync

# Or install with goodmem support
uv sync --extra goodmem

# Copy environment template
cp .env.example .env
# Edit .env and add your API keys
```
Create a `configs/tools.yaml` file:
```yaml
tools:
  mawsuah:
    api_key_env: VECTARA_API_KEY
    corpus_id: ${VECTARA_CORPUS_ID}
    base_url: https://api.vectara.io
    timeout: 30
  goodmem:
    api_key_env: GOODMEM_API_KEY
    base_url: https://api.goodmem.ai
    timeout: 30

llm:
  model: claude-opus-4-1-20250805
  api_key_env: ANTHROPIC_API_KEY
```
```bash
# Compare all configured tools
uv run python -m src.cli compare "What is Islamic inheritance law?"

# Compare specific tools
uv run python -m src.cli compare "Your query" --tool mawsuah --tool goodmem

# Adjust number of results
uv run python -m src.cli compare "Your query" --top-k 10
```
```bash
# Default display format (side-by-side)
uv run python -m src.cli compare "Your query"

# JSON output
uv run python -m src.cli compare "Your query" --format json

# Markdown output
uv run python -m src.cli compare "Your query" --format markdown

# Summary output
uv run python -m src.cli compare "Your query" --format summary

# Save to file
uv run python -m src.cli compare "Your query" --output results.json --format json
```
Run multiple queries and get comprehensive analysis:
```bash
# Basic batch comparison
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --top-k 10 \
  --format json

# With LLM evaluation (generates holistic summary)
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --top-k 10 \
  --format json

# Custom output directory
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --output-dir my-results \
  --format jsonl
```
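The query file passed to `batch` is plain text. A minimal sketch of its contents, assuming one query per line (the example queries here are illustrative, not taken from the repository):

```text
What is Islamic inheritance law?
How is zakat calculated?
What does the tafsir literature say about patience?
```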
The batch command with `--evaluate` generates:
- Individual query results in JSON/JSONL/CSV format
- Latency statistics (P50, P95, P99)
- LLM evaluation summary showing wins and quality scores
- Holistic summary (markdown file) with:
  - Query-by-query breakdown with winners and scores
  - Common themes: win distribution and recurring issues
  - Key differentiators: what makes the winner better and where the loser falls short
  - Overall verdict with a production recommendation
Convert holistic summary to PDF:
```bash
# Generate PDF from markdown summary
python md2pdf.py outputs/holistic_summary_TIMESTAMP.md
```
```bash
# List available tools
uv run python -m src.cli list-tools

# Validate configuration
uv run python -m src.cli validate-config

# Run quick test
uv run python -m src.cli quick-test

# Get help
uv run python -m src.cli --help
uv run python -m src.cli compare --help
uv run python -m src.cli batch --help
```
```
ragdiff/
├── src/
│   ├── core/              # Core models and configuration
│   │   ├── models.py      # Data models (RagResult, ComparisonResult, etc.)
│   │   └── config.py      # Configuration management
│   ├── adapters/          # Tool adapters
│   │   ├── base.py        # Base adapter implementing SearchVectara interface
│   │   ├── mawsuah.py     # Vectara/Mawsuah adapter
│   │   ├── goodmem.py     # Goodmem adapter with mock fallback
│   │   └── factory.py     # Adapter factory
│   ├── comparison/        # Comparison engine
│   │   └── engine.py      # Parallel/sequential search execution
│   ├── display/           # Display formatters
│   │   └── formatter.py   # Multiple output format support
│   └── cli.py             # Typer CLI implementation
├── tests/                 # Comprehensive test suite
├── configs/               # Configuration files
└── requirements.txt       # Python dependencies
```
The tool follows the SPIDER protocol for systematic development:
- Specification: Clear goals for subjective RAG comparison
- Planning: Phased implementation approach
- Implementation: Clean architecture with separation of concerns
- Defense: Comprehensive test coverage (90+ tests)
- Evaluation: Expert review and validation
- Commit: Version control with clear history
- BaseRagTool: Abstract base implementing the SearchVectara interface (see the sketch after this list)
- Adapters: Tool-specific implementations (Mawsuah, Goodmem)
- ComparisonEngine: Orchestrates parallel/sequential searches
- ComparisonFormatter: Handles multiple output formats
- Config: Manages YAML configuration with environment variables
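A minimal sketch of how an adapter relates to the base class. Only the `search` signature and the `RagResult` import are taken from the project; the constructor and docstrings are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from typing import List

from ..core.models import RagResult


class BaseRagTool(ABC):
    """Simplified sketch of the adapter base class.

    In the project, the base adapter also implements the SearchVectara
    interface; that detail is omitted here.
    """

    def __init__(self, name: str, config: dict):
        # Tool name and per-tool settings loaded from configs/tools.yaml
        self.name = name
        self.config = config

    @abstractmethod
    def search(self, query: str, top_k: int = 5) -> List[RagResult]:
        """Return normalized results for a query from the underlying tool."""
        ...
```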
- Create a new adapter in `src/adapters/`:
```python
from typing import List

from .base import BaseRagTool
from ..core.models import RagResult


class MyToolAdapter(BaseRagTool):
    def search(self, query: str, top_k: int = 5) -> List[RagResult]:
        # Implement tool-specific search
        results = self.client.search(query, limit=top_k)
        return [self._convert_to_rag_result(r) for r in results]
```
- Register in `src/adapters/factory.py`:
```python
ADAPTER_REGISTRY["mytool"] = MyToolAdapter
```
- Add configuration in `configs/tools.yaml`:
```yaml
tools:
  mytool:
    api_key_env: MYTOOL_API_KEY
    base_url: https://api.mytool.com
```
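Once registered and configured, the new adapter can be selected like any other tool:

```bash
uv run python -m src.cli compare "Your query" --tool mytool
```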
```bash
# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest tests/ --cov=src
```
The project uses:
- Black for formatting
- Ruff for linting
- MyPy for type checking
```bash
# Format code with Black
uv run black src/ tests/

# Check linting with Ruff
uv run ruff check src/ tests/

# Type checking with MyPy
uv run mypy src/
```
Required environment variables:
- `VECTARA_API_KEY`: For Mawsuah/Vectara access
- `VECTARA_CORPUS_ID`: Vectara corpus ID
- `GOODMEM_API_KEY`: For Goodmem access (optional, uses mock if not set)
- `ANTHROPIC_API_KEY`: For LLM evaluation (optional)
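A populated `.env` (copied from `.env.example`) might look like this; the values are placeholders:

```bash
VECTARA_API_KEY=your-vectara-api-key
VECTARA_CORPUS_ID=your-corpus-id
GOODMEM_API_KEY=your-goodmem-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
```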
[Your License]
Contributions welcome! Please follow the existing code style and add tests for new features.
Built following the SPIDER protocol for systematic development.