The ultimate development partners for Claude - a Model Context Protocol server that gives Claude access to multiple AI models for enhanced code analysis, problem-solving, and collaborative development.
Features true AI orchestration with conversations that continue across tasks - give Claude a complex task and let it orchestrate between models automatically. Claude stays in control and performs the actual work, but gets perspectives from the best AI for each subtask. With tools like `planner` for breaking down complex projects, `analyze` for understanding codebases, `codereview` for audits, `refactor` for improving code structure, `debug` for solving complex problems, and `precommit` for validating changes, Claude can switch between different tools and models mid-conversation, with context carrying forward seamlessly.
Example Workflow - Claude Code:
1. Performs its own reasoning
2. Uses Gemini Pro to deeply `analyze` the code in question for a second opinion
3. Switches to O3 to continue `chatting` about its findings
4. Uses Flash to evaluate formatting suggestions from O3
5. Performs the actual work after taking in feedback from all three
6. Returns to Pro for a `precommit` review

All within a single conversation thread! Gemini Pro in step 6 knows what was recommended by O3 in step 3, and takes that context and earlier review into consideration to aid its pre-commit review.
Think of it as Claude Code for Claude Code. This MCP isn't magic. It's just super-glue.
Remember: Claude stays in full control, but YOU call the shots. Zen is designed to have Claude engage other models only when needed, and to follow through with meaningful back-and-forth. You're the one who crafts the powerful prompt that makes Claude bring in Gemini, Flash, O3, or fly solo.
You're the guide. The prompter. The puppeteer.
Because these AI models clearly aren't when they get chatty.
- Getting Started
  - Quickstart - Get running in 5 minutes
  - Available Tools - Overview of all tools
  - AI-to-AI Conversations - Multi-turn conversations
- Tools Reference
  - `chat` - Collaborative thinking
  - `thinkdeep` - Extended reasoning
  - `planner` - Interactive step-by-step planning
  - `consensus` - Multi-model consensus analysis
  - `codereview` - Code review
  - `precommit` - Pre-commit validation
  - `debug` - Debugging help
  - `analyze` - File analysis
  - `refactor` - Code refactoring with decomposition focus
  - `tracer` - Call-flow mapping and dependency tracing
  - `testgen` - Test generation with edge cases
  - `your custom tool` - Create custom tools for specialized workflows
- Advanced Usage
  - Advanced Features - AI-to-AI conversations, large prompts, web search
  - Complete Advanced Guide - Model configuration, thinking modes, workflows, tool parameters
- Setup & Support
  - Troubleshooting Guide - Common issues and debugging steps
  - License - Apache 2.0
Claude is brilliant, but sometimes you need:
- Multiple AI perspectives - Let Claude orchestrate between different models to get the best analysis
- Automatic model selection - Claude picks the right model for each task (or you can specify)
- A senior developer partner to validate and extend ideas (`chat`)
- A second opinion on complex architectural decisions - augment Claude's thinking with perspectives from Gemini Pro, O3, or dozens of other models via custom endpoints (`thinkdeep`)
- Get multiple expert opinions - Have different AI models debate your ideas (some supporting, some critical) to help you make better decisions (`consensus`)
- Professional code reviews with actionable feedback across entire repositories (`codereview`)
- Pre-commit validation with deep analysis using the best model for the job (`precommit`)
- Expert debugging - O3 for logical issues, Gemini for architectural problems (`debug`)
- Extended context windows beyond Claude's limits - Delegate analysis to Gemini (1M tokens) or O3 (200K tokens) for entire codebases, large datasets, or comprehensive documentation
- Model-specific strengths - Extended thinking with Gemini Pro, fast iteration with Flash, strong reasoning with O3, local privacy with Ollama
- Local model support - Run models like Llama 3.2 locally via Ollama, vLLM, or LM Studio for privacy and cost control
- Dynamic collaboration - Models can request additional context and follow-up replies from Claude mid-analysis
- Smart file handling - Automatically expands directories, manages token limits based on model capacity
- Vision support - Analyze images, diagrams, screenshots, and visual content with vision-capable models
- Bypass MCP's token limits - Work around MCP's 25K limit automatically
- Context revival across sessions - Continue conversations even after Claude's context resets, with other models maintaining full history
This is an extremely powerful feature that cannot be highlighted enough:
The most amazing side-effect of this conversation continuation system is that even AFTER Claude's context resets or compacts, since the continuation info is kept within MCP's memory, you can ask it to continue discussing the plan with `o3`, and it will suddenly revive Claude: O3 knows what was being talked about and relays it back in a way that re-ignites Claude's understanding. All this without wasting context on asking Claude to ingest lengthy documents / code again and re-prompting it to communicate with another model. Zen manages that internally. The model's response revives Claude with better context around the discussion than an automatic summary ever could.
📖 Read the complete technical deep-dive on how this revolutionary system works
This server orchestrates multiple AI models as your development team, with Claude automatically selecting the best model for each task or allowing you to choose specific models for different strengths.
Prompt Used:
Study the code properly, think deeply about what this does and then see if there's any room for improvement in terms of performance optimizations, brainstorm with gemini on this to get feedback and then confirm any change by first adding a unit test with `measure` and measuring current code and then implementing the optimization and measuring again to ensure it improved, then share results. Check with gemini in between as you make tweaks.
The final implementation resulted in a 26% improvement in JSON parsing performance for the selected library, reducing processing time through targeted, collaborative optimizations guided by Gemini's analysis and Claude's refinement.
- Python 3.10+ (3.12 recommended)
- Git
- Windows users: WSL2 is required for Claude Code CLI
Option A: OpenRouter (Access multiple models with one API)
- OpenRouter: Visit OpenRouter for access to multiple models through one API. Setup Guide
- Control model access and spending limits directly in your OpenRouter dashboard
- Configure model aliases in `conf/custom_models.json`
Option B: Native APIs
- Gemini: Visit Google AI Studio and generate an API key. For best results with Gemini 2.5 Pro, use a paid API key as the free tier has limited access to the latest models.
- OpenAI: Visit OpenAI Platform to get an API key for O3 model access.
- X.AI: Visit X.AI Console to get an API key for GROK model access.
Option C: Custom API Endpoints (Local models like Ollama, vLLM). Please see the setup guide. With a custom API you can use the following (a sample `.env` sketch follows this list):
- Ollama: Run models like Llama 3.2 locally for free inference
- vLLM: Self-hosted inference server for high-throughput inference
- LM Studio: Local model hosting with OpenAI-compatible API interface
- Text Generation WebUI: Popular local interface for running models
- Any OpenAI-compatible API: Custom endpoints for your own infrastructure
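For illustration, a local-only setup can point the custom endpoint at Ollama. This is a minimal `.env` sketch reusing the `CUSTOM_API_URL`, `CUSTOM_API_KEY`, and `CUSTOM_MODEL_NAME` variables shown in the quickstart below; the LM Studio URL in the comment is an assumption based on its usual default port:

```bash
# .env sketch for a local Ollama endpoint (no cloud API keys required)
CUSTOM_API_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
CUSTOM_API_KEY=                            # Ollama needs no key; other servers might
CUSTOM_MODEL_NAME=llama3.2                 # default model to route requests to

# LM Studio typically serves at http://localhost:1234/v1 (assumption - check your install)
```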
Note: Using all three options may create ambiguity about which provider / model to use if there is an overlap. If all APIs are configured, native APIs will take priority when there is a clash in model name, such as for `gemini` and `o3`. Configure your model aliases and give them unique names in `conf/custom_models.json`.
```bash
# Clone to your preferred location
git clone https://github.com/BeehiveInnovations/zen-mcp-server.git
cd zen-mcp-server

# One-command setup installs Zen in Claude
./run-server.sh

# To view MCP configuration for Claude
./run-server.sh -c

# See help for more
./run-server.sh --help
```
What this does:
- Sets up everything automatically - Python environment, dependencies, configuration
- Configures Claude integrations - Adds to Claude Code CLI and guides Desktop setup
- Ready to use immediately - No manual configuration needed
After updates: Always run `./run-server.sh` again after `git pull` to ensure everything stays current.
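A typical update therefore looks like this (a minimal sketch of the commands already described above):

```bash
cd zen-mcp-server
git pull
./run-server.sh   # re-runs setup so dependencies and configuration stay current
```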
```bash
# Edit .env to add your API keys (if not already set in environment)
nano .env

# The file will contain the following; at least one should be set:
# GEMINI_API_KEY=your-gemini-api-key-here    # For Gemini models
# OPENAI_API_KEY=your-openai-api-key-here    # For O3 model
# OPENROUTER_API_KEY=your-openrouter-key     # For OpenRouter (see docs/custom_models.md)

# For local models (Ollama, vLLM, etc.):
# CUSTOM_API_URL=http://localhost:11434/v1   # Ollama example
# CUSTOM_API_KEY=                            # Empty for Ollama
# CUSTOM_MODEL_NAME=llama3.2                 # Default model

# Note: At least one API key OR custom URL is required
```
No restart needed: The server reads the .env file each time Claude calls a tool, so changes take effect immediately.
Next: Run `claude` from your project folder in the terminal so it connects to the newly added MCP server. If you were already running a Claude Code session, please exit and start a new one.
Need the exact configuration? Run `./run-server.sh -c` to display the platform-specific setup instructions with correct paths.
- Open Claude Desktop config: Settings → Developer → Edit Config
- Copy the configuration shown by `./run-server.sh -c` into your `claude_desktop_config.json` (see the sketch after this list)
- Restart Claude Desktop for changes to take effect
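If you prefer working from the terminal, the same configuration can be printed there and copied across. A minimal sketch; the config file path in the comment is the typical macOS location and is an assumption, since the exact path varies by platform:

```bash
# Print the MCP configuration to copy into claude_desktop_config.json
./run-server.sh -c

# Typical macOS location of the Claude Desktop config (assumption; varies by platform):
#   ~/Library/Application Support/Claude/claude_desktop_config.json
```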
Just ask Claude naturally:
- "Think deeper about this architecture design with zen" β Claude picks best model +
thinkdeep
- "Using zen perform a code review of this code for security issues" β Claude might pick Gemini Pro +
codereview
- "Use zen and debug why this test is failing, the bug might be in my_class.swift" β Claude might pick O3 +
debug
- "With zen, analyze these files to understand the data flow" β Claude picks appropriate model +
analyze
- "Use flash to suggest how to format this code based on the specs mentioned in policy.md" β Uses Gemini Flash specifically
- "Think deeply about this and get o3 to debug this logic error I found in the checkOrders() function" β Uses O3 specifically
- "Brainstorm scaling strategies with pro. Study the code, pick your preferred strategy and debate with pro to settle on two best approaches" β Uses Gemini Pro specifically
- "Use local-llama to localize and add missing translations to this project" β Uses local Llama 3.2 via custom URL
- "First use local-llama for a quick local analysis, then use opus for a thorough security review" β Uses both providers in sequence
Quick Tool Selection Guide:
- Need a thinking partner? → `chat` (brainstorm ideas, get second opinions, validate approaches)
- Need deeper thinking? → `thinkdeep` (extends analysis, finds edge cases)
- Need to break down complex projects? → `planner` (step-by-step planning, project structure, breaking down complex ideas)
- Need multiple perspectives? → `consensus` (get diverse expert opinions on proposals and decisions)
- Code needs review? → `codereview` (bugs, security, performance issues)
- Pre-commit validation? → `precommit` (validate git changes before committing)
- Something's broken? → `debug` (root cause analysis, error tracing)
- Want to understand code? → `analyze` (architecture, patterns, dependencies)
- Code needs refactoring? → `refactor` (intelligent refactoring with decomposition focus)
- Need call-flow analysis? → `tracer` (generates prompts for execution tracing and dependency mapping)
- Need comprehensive tests? → `testgen` (generates test suites with edge cases)
- Which models are available? → `listmodels` (shows all configured providers and models)
- Server info? → `version` (version and configuration details)
Auto Mode: When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".
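In `.env` terms that choice looks like the sketch below. `DEFAULT_MODEL=auto` comes from the quickstart configuration; the pinned alternative is illustrative, so use whatever model alias your configured providers actually expose:

```bash
# Let Claude pick the best model for each task (recommended)
DEFAULT_MODEL=auto

# Or pin a specific default yourself - "flash" is an illustrative alias
# DEFAULT_MODEL=flash
```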
Model Selection Examples:
- Complex architecture review β Claude picks Gemini Pro
- Quick formatting check β Claude picks Flash
- Logical debugging β Claude picks O3
- General explanations β Claude picks Flash for speed
- Local analysis β Claude picks your Ollama model
Pro Tip: Thinking modes (for Gemini models) control depth vs token cost. Use "minimal" or "low" for quick tasks, "high" or "max" for complex problems. Learn more
Tools Overview:
- `chat` - Collaborative thinking and development conversations
- `thinkdeep` - Extended reasoning and problem-solving
- `planner` - Interactive sequential planning for complex projects
- `consensus` - Multi-model consensus analysis with stance steering
- `codereview` - Professional code review with severity levels
- `precommit` - Validate git changes before committing
- `debug` - Root cause analysis and debugging
- `analyze` - General-purpose file and code analysis
- `refactor` - Code refactoring with decomposition focus
- `tracer` - Static code analysis prompt generator for call-flow mapping
- `testgen` - Comprehensive test generation with edge case coverage
- `listmodels` - Display all available AI models organized by provider
- `version` - Get server version and configuration
Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.
Chat with zen about the best approach for user authentication in my React app
📖 Read More - Detailed features, examples, and best practices
Get a second opinion to augment Claude's own extended thinking. Uses specialized thinking models to challenge assumptions, identify edge cases, and provide alternative perspectives.
The button won't animate when clicked, it seems something else is intercepting the clicks. Use thinkdeep with gemini pro after gathering related code and handing it the files and find out what the root cause is
📖 Read More - Enhanced analysis capabilities and critical evaluation process
Break down complex projects or ideas into manageable, structured plans through step-by-step thinking. Perfect for adding new features to an existing system, scaling up system design, migration strategies, and architectural planning with branching and revision capabilities.
Claude supports sub-tasks, where it will spawn and run separate background tasks. You can ask Claude to run Zen's `planner` with two separate ideas. Then, when it's done, use Zen's `consensus` tool to pass along the entire plan and get expert perspectives from two powerful AI models on which one to work on first. Like performing A/B testing in one go, without the wait!
Create two separate sub-tasks: in one, using planner tool show me how to add natural language support to my cooking app. In the other sub-task, use planner to plan how to add support for voice notes to my cooking app. Once done, start a consensus by sharing both plans to o3 and flash to give me the final verdict. Which one do I implement first?
📖 Read More - Step-by-step planning methodology and multi-session continuation
Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
Get a consensus with flash taking a supportive stance and gemini pro being critical to evaluate whether we should migrate from REST to GraphQL for our API. I need a definitive answer.
📖 Read More - Multi-model orchestration and decision analysis
Comprehensive code analysis with prioritized feedback and severity levels. Supports security reviews, performance analysis, and coding standards enforcement.
Perform a codereview with gemini pro especially the auth.py as I feel some of the code is bypassing security checks and there may be more potential vulnerabilities. Find and share related code.
📖 Read More - Professional review capabilities and parallel analysis
Comprehensive review of staged/unstaged git changes across multiple repositories. Validates changes against requirements and detects potential regressions.
Perform a thorough precommit with o3, we want to only highlight critical issues, no blockers, no regressions. I need 100% confidence this covers all the edge cases listed in FIX_ISSUE_ABC.md
- I asked Claude (Sonnet 4) to refactor some code to fix a known bug; I gave it exact instructions
- The fix looked correct, and at a glance, so did the changes.
- More importantly, all existing unit tests passed (540+ unit tests) without a hitch. Zero regressions. Claude seemed confident and eager to commit. Everything looked perfect.
I then ran:
Run a precommit with o3 confirm our changes are sound and diffs are valid. Confirm this won't cause breakage or regressions and codesmells are out
This passed the tracked and untracked changes / diffs and the related code to O3 for a thorough review via Zen's `precommit` tool.
Output:
Precommit Analysis Results - CRITICAL ISSUE FOUND
The O3 model has identified one HIGH-severity regression and several medium/low issues in our refactor. Here's the summary:
🚨 HIGH SEVERITY - MUST FIX BEFORE COMMIT
...
The reported issue was in fact a very subtle bug that slipped through the quick glance, and a unit test for this exact case was apparently missing (out of 540 existing tests!), which explains the zero reported regressions. The fix was ultimately simple, but the fact that Claude (and by extension, I) overlooked it was a stark reminder: no number of eyeballs is ever enough. I fixed the issue, ran `precommit` with O3 again, and got:
RECOMMENDATION: PROCEED WITH COMMIT
Nice!
📖 Read More - Multi-repository validation and change analysis
Root cause analysis for complex problems with systematic hypothesis generation. Supports error context, stack traces, and structured debugging approaches.
See logs under /Users/me/project/diagnostics.log and related code under the sync folder. Logs show that sync works but sometimes it gets stuck and there are no errors displayed to the user. Using zen's debug tool with gemini pro, find out why this is happening and what the root cause is and its fix
📖 Read More - Advanced debugging methodologies and troubleshooting
General-purpose code understanding and exploration. Supports architecture analysis, pattern detection, and comprehensive codebase exploration.
Use gemini to analyze main.py to understand how it works
📖 Read More - Code analysis types and exploration capabilities
Comprehensive refactoring analysis with top-down decomposition strategy. Prioritizes structural improvements and provides precise implementation guidance.
Use gemini pro to decompose my_crazy_big_class.m into smaller extensions
📖 Read More - Refactoring strategy and progressive analysis approach
Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping.
Use zen tracer to analyze how UserAuthManager.authenticate is used and why
📖 Read More - Prompt generation and analysis modes
Generates thorough test suites with edge case coverage based on existing code and test framework. Uses multi-agent workflow for realistic failure mode analysis.
Use zen to generate tests for User.login() method
📖 Read More - Test generation strategy and framework support
Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
Use zen to list available models
📖 Read More - Model capabilities and configuration details
Get server version, configuration details, and system status for debugging and troubleshooting.
What version of zen do I have
📖 Read More - Server diagnostics and configuration verification
For detailed tool parameters and configuration options, see the Advanced Usage Guide.
Zen supports powerful structured prompts in Claude Code for quick access to tools and models:
- `/zen:chat ask local-llama what 2 + 2 is` - Use chat tool with auto-selected model
- `/zen:thinkdeep use o3 and tell me why the code isn't working in sorting.swift` - Use thinkdeep tool with auto-selected model
- `/zen:planner break down the microservices migration project into manageable steps` - Use planner tool with auto-selected model
- `/zen:consensus use o3:for and flash:against and tell me if adding feature X is a good idea for the project. Pass them a summary of what it does.` - Use consensus tool with default configuration
- `/zen:codereview review for security module ABC` - Use codereview tool with auto-selected model
- `/zen:debug table view is not scrolling properly, very jittery, I suspect the code is in my_controller.m` - Use debug tool with auto-selected model
- `/zen:analyze examine these files and tell me if I'm using the CoreAudio framework properly` - Use analyze tool with auto-selected model
- `/zen:chat continue and ask gemini pro if framework B is better` - Continue previous conversation using chat tool
- `/zen:thinkdeeper check if the algorithm in @sort.py is performant and if there are alternatives we could explore`
- `/zen:planner create a step-by-step plan for migrating our authentication system to OAuth2, including dependencies and rollback strategies`
- `/zen:consensus debate whether we should migrate to GraphQL for our API`
- `/zen:precommit confirm these changes match our requirements in COOL_FEATURE.md`
- `/zen:testgen write me tests for class ABC`
- `/zen:refactor propose a decomposition strategy, make a plan and save it in FIXES.md`
The prompt format is: `/zen:[tool] [your_message]`
- `[tool]` - Any available tool name (chat, thinkdeep, planner, consensus, codereview, debug, analyze, etc.)
- `[your_message]` - Your request, question, or instructions for the tool
Note: All prompts will show as "(MCP) [tool]" in Claude Code to indicate they're provided by the MCP server.
Want to create custom tools for your specific workflows?
The Zen MCP Server is designed to be extensible - you can easily add your own specialized tools for domain-specific tasks, custom analysis workflows, or integration with your favorite services.
See Complete Tool Development Guide - Step-by-step instructions for creating, testing, and integrating new tools
Your custom tools get the same benefits as built-in tools: multi-model support, conversation threading, token management, and automatic model selection.
This server enables true AI collaboration between Claude and multiple AI models, where they can coordinate and build on each other's insights across tools and conversations.
📖 Read More - Multi-model coordination, conversation threading, and collaborative workflows
Configure the Zen MCP Server through environment variables in your `.env` file. Supports multiple AI providers, model restrictions, conversation settings, and advanced options.
```bash
# Quick start - Auto mode (recommended)
DEFAULT_MODEL=auto
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
```
Key Configuration Options (a sample `.env` sketch follows this list):
- API Keys: Native APIs (Gemini, OpenAI, X.AI), OpenRouter, or Custom endpoints (Ollama, vLLM)
- Model Selection: Auto mode or specific model defaults
- Usage Restrictions: Control which models can be used for cost control
- Conversation Settings: Timeout, turn limits, memory configuration
- Thinking Modes: Token allocation for extended reasoning
- Logging: Debug levels and operational visibility
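To make those categories concrete, here is a hedged `.env` sketch. Only `DEFAULT_MODEL` and the API key / custom endpoint variables come from the examples above; the restriction, conversation, and logging names below are illustrative placeholders, so check the complete configuration reference linked next for the real variable names and defaults:

```bash
# API keys / endpoints (from the quickstart examples above)
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
# CUSTOM_API_URL=http://localhost:11434/v1   # local models via Ollama, vLLM, etc.

# Model selection
DEFAULT_MODEL=auto                # let Claude pick per task, or pin a specific model

# Illustrative placeholders (assumptions) for the remaining option categories:
# ALLOWED_MODELS=...              # usage restrictions for cost control
# CONVERSATION_TIMEOUT=...        # conversation settings (timeouts, turn limits)
# LOG_LEVEL=...                   # logging / debug verbosity
```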
📖 Read More - Complete configuration reference with examples
For information on running tests, see the Testing Guide.
We welcome contributions! Please see our comprehensive guides:
- Contributing Guide - Code standards, PR process, and requirements
- Adding a New Provider - Step-by-step guide for adding AI providers
- Adding a New Tool - Step-by-step guide for creating new tools
Apache 2.0 License - see LICENSE file for details.
Built with the power of Multi-Model AI collaboration
- MCP (Model Context Protocol) by Anthropic
- Claude Code - Your AI coding assistant & orchestrator
- Gemini 2.5 Pro & 2.0 Flash - Extended thinking & fast analysis
- OpenAI O3 - Strong reasoning & general intelligence