A clean, modular implementation using the Claude Agent SDK for deep research, data science workflows, and benchmark evaluations.
- Deep Research Pipeline: Multi-phase research on any topic with web search
- Data Science Workflows: Exploratory analysis, statistical analysis, and ML modeling
- GAIA Benchmark Evaluation: Evaluate Claude agents on the GAIA dataset
- Hydra Configuration: Clean configuration management with YAML files
- Rich Console Output: Beautiful progress tracking and logging
- Async Execution: Efficient concurrent task processing
```bash
cd agent1
uv venv
uv pip install -e .
```

Research a topic:
```bash
python examples/dr.py research.topic="Impact of AI on healthcare"
```

Different research depths:
```bash
python examples/dr.py research.topic="Climate change" research=quick
python examples/dr.py research.topic="Quantum computing" research=exhaustive
```

Save output to a file:
```bash
python examples/dr.py research.topic="AI Ethics" research.output_file=report.md
```

Analyze data:
```bash
python examples/ds.py data_science.task="Analyze sales trends" data_science.data_path=sales.csv
```

Build a model:
```bash
python examples/ds.py data_science.modeling.task="Predict customer churn" data_science.data_path=customers.csv
```

Run GAIA evaluation on the validation set:
```bash
python examples/run_gaia.py gaia.split=validation gaia.max_tasks=5
```

Full test set evaluation:
```bash
python examples/run_gaia.py gaia.split=test
```

```
agent1/
├── examples/
│   ├── dr.py                  # Deep research CLI
│   ├── ds.py                  # Data science CLI
│   └── run_gaia.py            # GAIA evaluation script
├── src/
│   ├── configs/
│   │   ├── deep_research.yaml # Research configuration
│   │   ├── data_scientist.yaml# Data science configuration
│   │   └── gaia.yaml          # GAIA benchmark configuration
│   ├── claude.py              # Claude agent executor
│   ├── pipelines.py           # Pipeline implementations
│   ├── logger.py              # Rich console logger
│   └── gaia_utils.py          # GAIA dataset utilities
└── data/
    └── GAIA/                  # GAIA dataset (add manually)
```

All configurations use Hydra and are stored in `src/configs/`. Key options:
- `model.name`: Claude model to use (default: `claude-sonnet-4-5-20250929`)
- `model.temperature`: Sampling temperature
- `model.max_tokens`: Maximum tokens per response
- `research.topic`: Research topic (required)
- `research.depth`: `quick`, `standard`, `comprehensive`, or `exhaustive`
- `research.output_file`: Optional output file path
- `gaia.split`: `validation` or `test`
- `gaia.max_tasks`: Maximum number of tasks to evaluate
- `gaia.batch_size`: Concurrent batch size
- `gaia.results_path`: Output JSONL path
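A plausible shape for `src/configs/gaia.yaml` built from the options above. The key names come from this list; the default values (other than `model.name`) and the grouping of `model` into the same file are assumptions for illustration:

```yaml
model:
  name: claude-sonnet-4-5-20250929
  temperature: 0.0            # illustrative default
  max_tokens: 4096            # illustrative default

gaia:
  split: validation           # validation or test
  max_tasks: 5                # illustrative default
  batch_size: 4               # illustrative default
  results_path: results/gaia.jsonl  # illustrative default
```

Any of these can be overridden on the command line, e.g. `gaia.batch_size=8`.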
- Download the GAIA dataset to `data/GAIA/`:
  - `2023_validation.json` - Validation set with ground truth
  - `2023_test.json` - Test set without ground truth
- Run the evaluation:

  ```bash
  python examples/run_gaia.py gaia.split=validation
  ```

- Smart Resume: Automatically skips completed tasks
- Batch Processing: Concurrent execution with configurable batch size
- Comprehensive Metrics: Accuracy calculation and detailed reports
- Error Recovery: Graceful error handling with detailed logging
- Result Persistence: JSONL format with metadata and costs
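A minimal sketch of how the Smart Resume behavior can work: collect the `task_id`s already present in the results JSONL and filter them out before dispatching batches. The helper name `load_completed_ids` is illustrative, not the repo's actual implementation:

```python
import json
from pathlib import Path


def load_completed_ids(results_path: str) -> set:
    """Return the set of task_ids already recorded in the results JSONL.

    Illustrative helper: tasks whose IDs appear here can be skipped on re-run.
    """
    path = Path(results_path)
    if not path.exists():
        return set()
    done = set()
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                done.add(json.loads(line)["task_id"])
    return done
```

On each run, the evaluator would keep only tasks whose ID is not in this set before building concurrent batches.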
Results are saved in JSONL format:
```json
{
  "task_id": "test_001",
  "question": "What is 2 + 2?",
  "prediction": "4",
  "true_answer": "4",
  "tools_used": ["WebSearch"],
  "num_turns": 3,
  "cost_usd": 0.002,
  "duration_ms": 5432
}
```

Test the GAIA setup:
```bash
python test_gaia_setup.py
```

Executes single agents with the specified tools and configuration:
```python
from src.claude import create_agent_executor

executor = create_agent_executor()
result = await executor.execute_agent(
    prompt="Research quantum computing",
    agent_type="research",
    allowed_tools=["WebSearch", "WebFetch"],
)
```

Orchestrates multi-phase pipelines:
```python
from src.claude import create_pipeline_executor

pipeline = create_pipeline_executor()
result = await pipeline.execute_pipeline(
    phases=[...],
    initial_context="Topic: AI",
)
```

High-level interfaces for specific workflows:
```python
from src.pipelines import DeepResearchPipeline, DataSciencePipeline

# Research
research = DeepResearchPipeline()
result = await research.research("AI ethics", depth="comprehensive")

# Data science
ds = DataSciencePipeline()
result = await ds.analyze_data(data_path="data.csv", analysis_type="exploratory")
```

To add a new workflow:

- Create a configuration in `src/configs/`
- Extend `BasePipeline` in `src/pipelines.py`
- Add a CLI script in `examples/`
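A hedged sketch of the "extend `BasePipeline`" step. The real `BasePipeline` interface is not shown in this README, so the constructor and `run` method below are assumptions; a stand-in base class is defined here to keep the sketch self-contained:

```python
# In the real project: from src.pipelines import BasePipeline
# Minimal stand-in so this sketch runs on its own; the actual
# BasePipeline interface in src/pipelines.py may differ.
class BasePipeline:
    def __init__(self, config_name: str):
        self.config_name = config_name


class SummarizationPipeline(BasePipeline):
    """Hypothetical workflow: would read src/configs/summarizer.yaml (step 1)
    and be driven by a new script in examples/ (step 3)."""

    def __init__(self):
        super().__init__(config_name="summarizer")

    async def run(self, text: str) -> str:
        # A real implementation would delegate to the agent executor;
        # the return value here is a placeholder.
        return f"summary of {len(text)} chars"
```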
Modify `allowed_tools` in the agent configurations:

- Research: `WebSearch`, `WebFetch`, `Read`, `Write`
- Analysis: `Read`, `Write`, `Bash`, `Grep`, `Glob`
- Coding: `Read`, `Write`, `Edit`, `Bash`
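The same mapping expressed in code; the `ALLOWED_TOOLS` dict name is my own, while the tool lists come straight from the bullets above and the `allowed_tools` parameter matches the `execute_agent` call shown earlier:

```python
# Per-workflow tool whitelists, mirroring the lists above.
ALLOWED_TOOLS = {
    "research": ["WebSearch", "WebFetch", "Read", "Write"],
    "analysis": ["Read", "Write", "Bash", "Grep", "Glob"],
    "coding": ["Read", "Write", "Edit", "Bash"],
}

# Passed through to the executor, e.g.:
# result = await executor.execute_agent(
#     prompt="Profile data.csv",
#     agent_type="analysis",
#     allowed_tools=ALLOWED_TOOLS["analysis"],
# )
```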
- Import Errors: Ensure dependencies are installed with `uv pip install -e .`
- API Errors: Check that your Claude API key is set
- Dataset Not Found: Download the GAIA dataset to `data/GAIA/`
- Out of Memory: Reduce `batch_size` in the configuration
MIT