RAGDiff

A flexible framework for comparing Retrieval-Augmented Generation (RAG) systems side-by-side, with support for subjective quality evaluation using LLMs.

Features

  • Multi-tool Support: Compare multiple RAG tools in parallel
  • Flexible Adapters: Easy-to-extend adapter pattern for adding new tools
  • Multiple Output Formats: Display, JSON, Markdown, and summary formats
  • Performance Metrics: Automatic latency measurement and result statistics
  • LLM Evaluation: Support for subjective quality assessment using Claude Opus 4.1
  • Rich CLI: Beautiful terminal output with tables and panels
  • Comprehensive Testing: 90+ tests ensuring reliability

Installation

Prerequisites

  • Python 3.9+
  • uv - Fast Python package installer and resolver

To install uv:

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv

# Or with pip
pip install uv

Setup

# Clone the repository
git clone https://github.com/ansari-project/ragdiff.git
cd ragdiff

# Install all dependencies including dev tools
uv sync --all-extras

# Or install only core dependencies
uv sync

# Or install with goodmem support
uv sync --extra goodmem

# Copy environment template
cp .env.example .env
# Edit .env and add your API keys

Configuration

Create a configs/tools.yaml file:

tools:
  mawsuah:
    api_key_env: VECTARA_API_KEY
    corpus_id: ${VECTARA_CORPUS_ID}
    base_url: https://api.vectara.io
    timeout: 30
  goodmem:
    api_key_env: GOODMEM_API_KEY
    base_url: https://api.goodmem.ai
    timeout: 30

llm:
  model: claude-opus-4-1-20250805
  api_key_env: ANTHROPIC_API_KEY
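
The api_key_env entries name environment variables that hold secrets, and ${VECTARA_CORPUS_ID}-style placeholders are presumably substituted from the environment at load time. The snippet below is a minimal sketch of that resolution logic, not the project's actual config.py:

# Sketch only: resolve a tools.yaml-style config; not ragdiff's Config implementation.
import os
import re
import yaml  # PyYAML

def load_tools_config(path: str) -> dict:
    with open(path) as f:
        raw = f.read()
    # Substitute ${VAR} placeholders with values from the environment (empty string if unset).
    raw = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    cfg = yaml.safe_load(raw)
    # Resolve api_key_env indirections into concrete api_key values.
    for tool_cfg in cfg.get("tools", {}).values():
        env_name = tool_cfg.get("api_key_env")
        if env_name:
            tool_cfg["api_key"] = os.environ.get(env_name)
    return cfg

config = load_tools_config("configs/tools.yaml")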

Usage

Basic Comparison

# Compare all configured tools
uv run python -m src.cli compare "What is Islamic inheritance law?"

# Compare specific tools
uv run python -m src.cli compare "Your query" --tool mawsuah --tool goodmem

# Adjust number of results
uv run python -m src.cli compare "Your query" --top-k 10

Output Formats

# Default display format (side-by-side)
uv run python -m src.cli compare "Your query"

# JSON output
uv run python -m src.cli compare "Your query" --format json

# Markdown output
uv run python -m src.cli compare "Your query" --format markdown

# Summary output
uv run python -m src.cli compare "Your query" --format summary

# Save to file
uv run python -m src.cli compare "Your query" --output results.json --format json

Batch Comparison with LLM Evaluation

Run multiple queries and get a comprehensive analysis:

# Basic batch comparison
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --top-k 10 \
  --format json

# With LLM evaluation (generates holistic summary)
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --top-k 10 \
  --format json

# Custom output directory
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --output-dir my-results \
  --format jsonl
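
The queries file is plain text; one query per line is assumed here (the exact format is not documented in this README, so check the sample files under inputs/). A hypothetical inputs/tafsir-test-queries.txt might look like:

What is Islamic inheritance law?
What does the Quran say about charity?
How do the tafsir sources explain fasting?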

The batch command with --evaluate generates:

  • Individual query results in JSON/JSONL/CSV format
  • Latency statistics (P50, P95, P99); see the percentile sketch after this list
  • LLM evaluation summary showing wins and quality scores
  • Holistic summary (markdown file) with:
    • Query-by-query breakdown with winners and scores
    • Common themes: win distribution, recurring issues
    • Key differentiators: what makes the winning tool stronger and where the losing tool falls short
    • Overall verdict with production recommendation
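
For reference, P50/P95/P99 are plain percentiles over the per-query latencies. The snippet below is an illustration (not ragdiff's code) of how to reproduce them from a list of latency measurements:

# Illustrative percentile computation over per-query latencies (seconds).
import statistics

latencies = [0.42, 0.51, 0.38, 1.20, 0.47, 0.55, 0.61, 2.03, 0.49, 0.44]
# quantiles(..., n=100) returns the 1st..99th percentile cut points.
cuts = statistics.quantiles(latencies, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.3f}s  P95={p95:.3f}s  P99={p99:.3f}s")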

Convert holistic summary to PDF:

# Generate PDF from markdown summary
python md2pdf.py outputs/holistic_summary_TIMESTAMP.md
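
The repository's md2pdf.py is not reproduced here; as a rough illustration of the conversion step, a minimal equivalent could use the markdown and weasyprint packages (an assumption, not necessarily what the script depends on):

# Illustration only: minimal Markdown-to-PDF conversion, not the repository's md2pdf.py.
import sys
import markdown                  # pip install markdown
from weasyprint import HTML      # pip install weasyprint

md_path = sys.argv[1]
with open(md_path, encoding="utf-8") as f:
    html_body = markdown.markdown(f.read(), extensions=["tables"])
HTML(string=html_body).write_pdf(md_path.replace(".md", ".pdf"))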

Other Commands

# List available tools
uv run python -m src.cli list-tools

# Validate configuration
uv run python -m src.cli validate-config

# Run quick test
uv run python -m src.cli quick-test

# Get help
uv run python -m src.cli --help
uv run python -m src.cli compare --help
uv run python -m src.cli batch --help

Project Structure

ragdiff/
├── src/
│   ├── core/              # Core models and configuration
│   │   ├── models.py      # Data models (RagResult, ComparisonResult, etc.)
│   │   └── config.py      # Configuration management
│   ├── adapters/          # Tool adapters
│   │   ├── base.py        # Base adapter implementing SearchVectara interface
│   │   ├── mawsuah.py     # Vectara/Mawsuah adapter
│   │   ├── goodmem.py     # Goodmem adapter with mock fallback
│   │   └── factory.py     # Adapter factory
│   ├── comparison/        # Comparison engine
│   │   └── engine.py      # Parallel/sequential search execution
│   ├── display/           # Display formatters
│   │   └── formatter.py   # Multiple output format support
│   └── cli.py             # Typer CLI implementation
├── tests/                 # Comprehensive test suite
├── configs/               # Configuration files
└── requirements.txt       # Python dependencies

Architecture

The tool follows the SPIDER protocol for systematic development:

  1. Specification: Clear goals for subjective RAG comparison
  2. Planning: Phased implementation approach
  3. Implementation: Clean architecture with separation of concerns
  4. Defense: Comprehensive test coverage (90+ tests)
  5. Evaluation: Expert review and validation
  6. Commit: Version control with clear history

Key Components

  • BaseRagTool: Abstract base class implementing the SearchVectara interface
  • Adapters: Tool-specific implementations (Mawsuah, Goodmem)
  • ComparisonEngine: Orchestrates parallel/sequential searches
  • ComparisonFormatter: Handles multiple output formats
  • Config: Manages YAML configuration with environment variables
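
These pieces are meant to compose: the config drives the factory, the factory produces adapters, and the engine fans a query out to each adapter before the formatter renders the combined result. The sketch below is purely illustrative; the import path follows the project structure above, but the constructor signature is an assumption and may differ from the actual code.

# Illustrative wiring of the components; signatures are assumptions, not the actual API.
from src.adapters.factory import ADAPTER_REGISTRY  # maps tool name -> adapter class

def sketch_compare(query: str, tool_configs: dict, top_k: int = 5) -> dict:
    # Assumed: each adapter class accepts its tool's config dict.
    adapters = {name: ADAPTER_REGISTRY[name](cfg) for name, cfg in tool_configs.items()}
    # search(query, top_k) is the signature shown in "Adding New RAG Tools" below.
    return {name: adapter.search(query, top_k=top_k) for name, adapter in adapters.items()}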

Adding New RAG Tools

  1. Create a new adapter in src/adapters/:

from typing import List

from .base import BaseRagTool
from ..core.models import RagResult


class MyToolAdapter(BaseRagTool):
    def search(self, query: str, top_k: int = 5) -> List[RagResult]:
        # Implement tool-specific search
        results = self.client.search(query, limit=top_k)
        return [self._convert_to_rag_result(r) for r in results]

  2. Register in src/adapters/factory.py:

ADAPTER_REGISTRY["mytool"] = MyToolAdapter

  3. Add configuration in configs/tools.yaml:

tools:
  mytool:
    api_key_env: MYTOOL_API_KEY
    base_url: https://api.mytool.com
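
A quick way to confirm the wiring (a sketch; adjust the import path to your module name from step 1) is a one-line test against the registry:

# Sketch: verify the new adapter is registered with the factory.
from src.adapters.factory import ADAPTER_REGISTRY
from src.adapters.mytool import MyToolAdapter  # hypothetical module created in step 1

def test_mytool_is_registered():
    assert ADAPTER_REGISTRY["mytool"] is MyToolAdapter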

Development

Running Tests

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest tests/ --cov=src

Code Style

The project uses:

  • Black for formatting
  • Ruff for linting
  • MyPy for type checking

# Format code with Black
uv run black src/ tests/

# Check linting with Ruff
uv run ruff check src/ tests/

# Type checking with MyPy
uv run mypy src/

Environment Variables

Required environment variables:

  • VECTARA_API_KEY: For Mawsuah/Vectara access
  • VECTARA_CORPUS_ID: Vectara corpus ID
  • GOODMEM_API_KEY: For Goodmem access (optional, uses mock if not set)
  • ANTHROPIC_API_KEY: For LLM evaluation (optional)
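
Before a run, a small pre-flight check (a sketch, not part of the repository) can flag missing keys:

# Sketch: fail fast on missing required keys, warn on missing optional ones.
import os

REQUIRED = ["VECTARA_API_KEY", "VECTARA_CORPUS_ID"]
OPTIONAL = ["GOODMEM_API_KEY", "ANTHROPIC_API_KEY"]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
for name in OPTIONAL:
    if not os.environ.get(name):
        print(f"Note: {name} is unset; the related feature falls back to a mock or is disabled.")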

License

[Your License]

Contributing

Contributions welcome! Please follow the existing code style and add tests for new features.

Acknowledgments

Built following the SPIDER protocol for systematic development.
