A modern web application for comparing responses from different Large Language Models (LLMs) side-by-side. Compare OpenAI GPT models with Anthropic Claude, analyze performance metrics, and visualize differences with highlighting.
- 🔄 Side-by-Side Comparison: Compare responses from any two LLM models
- ⚡ Real-Time Metrics: Track response time, token usage, and performance
- 🎨 Intelligent Highlighting: Visual diff highlighting to spot differences at a glance
- 🌐 Multi-Provider Support: Works with OpenAI, Anthropic, and any OpenAI-compatible APIs
- 📱 Responsive Design: Beautiful, modern UI that works on desktop and mobile
- 🔒 Secure: API keys stay in your browser and are sent only to the providers you configure
- ⚙️ Configurable: Flexible endpoint and model configuration
Simply open the `llm-diff-tool.html` file in your web browser - no installation required!
```bash
# Clone the repository
git clone https://github.com/yourusername/llm-diff-tool.git
cd llm-diff-tool

# Open in your browser
open llm-diff-tool.html
# or
python -m http.server 8000
# Then visit http://localhost:8000
```
1. **Configure Your Models**
   - Enter API endpoints for both models
   - Add your API keys (stored locally only)
   - Specify model names (e.g., `gpt-4`, `claude-3-sonnet-20240229`)
2. **Enter Your Prompt**
   - Type or paste the prompt you want both models to respond to
3. **Compare**
   - Click "Compare Responses" to get results from both models
   - View side-by-side responses with difference highlighting
   - Analyze performance metrics and token usage
4. **Toggle Features**
   - Enable/disable difference highlighting as needed
   - Scroll through longer responses easily
**OpenAI**
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Models: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, etc.

**Anthropic**
- Endpoint: `https://api.anthropic.com/v1/messages`
- Models: `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, etc.

**OpenAI-Compatible APIs**

Any API that follows the OpenAI chat completions format, for example:
- Endpoint: `http://localhost:8000/v1/chat/completions`
- Models: `llama-2-7b`, `mistral-7b`, etc.

**API Keys**
- OpenAI: Get your API key from the [OpenAI Platform](https://platform.openai.com/)
- Anthropic: Get your API key from the [Anthropic Console](https://console.anthropic.com/)
- Local Models: Configure according to your local setup
The tool sends requests with these default parameters:
- `max_tokens`: 1000
- `temperature`: 0.7
- Message format: OpenAI chat completions style
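To make the format concrete, here is a minimal sketch of how such a request could be assembled in the browser. This is illustrative only, not the tool's actual internals: `queryModel` and its parameters are hypothetical names, and the provider detection is a simplification.

```javascript
// Illustrative sketch only: endpoint, apiKey, model, and prompt are
// whatever you entered in the UI; this is not the tool's actual code.
async function queryModel({ endpoint, apiKey, model, prompt }) {
  const isAnthropic = endpoint.includes("anthropic.com");

  // OpenAI expects a Bearer token; Anthropic uses x-api-key plus a version header.
  const headers = isAnthropic
    ? {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      }
    : {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      };

  // The defaults listed above; both providers accept this message shape.
  const body = {
    model,
    max_tokens: 1000,
    temperature: 0.7,
    messages: [{ role: "user", content: prompt }],
  };

  const response = await fetch(endpoint, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
  return response.json();
}
```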
- Response Time: How long each model took to respond
- Prompt Tokens: Number of tokens in your input
- Completion Tokens: Number of tokens in the model's response
- Total Tokens: Combined token usage
- Model Names: For easy identification
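Response time is measured client-side, while token counts come from the provider's response (OpenAI reports `usage.prompt_tokens`/`completion_tokens`, Anthropic reports `usage.input_tokens`/`output_tokens`). A sketch of how the two shapes could be normalized, with `measure` as a hypothetical helper:

```javascript
// Hypothetical helper: time a request and normalize token usage
// across the OpenAI and Anthropic response shapes.
async function measure(queryFn, params) {
  const start = performance.now();
  const data = await queryFn(params); // e.g., the queryModel sketch above
  const responseTimeMs = performance.now() - start;

  const usage = data.usage ?? {};
  const promptTokens = usage.prompt_tokens ?? usage.input_tokens ?? 0;
  const completionTokens = usage.completion_tokens ?? usage.output_tokens ?? 0;

  return {
    model: data.model ?? params.model, // for easy identification
    responseTimeMs,
    promptTokens,
    completionTokens,
    totalTokens: usage.total_tokens ?? promptTokens + completionTokens,
  };
}
```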
The tool uses intelligent word-level comparison to highlight:
- 🔴 Removed content: Text present in Model 1 but not Model 2
- 🟢 Added content: Text present in Model 2 but not Model 1
- ⚪ Unchanged content: Text that's identical in both responses
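The precise algorithm is whatever the HTML file implements; word-level diffs of this kind are commonly built on a longest-common-subsequence (LCS) pass, roughly as in this sketch:

```javascript
// Sketch of a word-level diff: a classic longest-common-subsequence
// pass over whitespace-split words. Returns [op, word] pairs where op
// is "same", "removed" (Model 1 only), or "added" (Model 2 only).
function wordDiff(text1, text2) {
  const w1 = text1.split(/\s+/);
  const w2 = text2.split(/\s+/);
  const n = w1.length, m = w2.length;

  // lcs[i][j] = LCS length of the suffixes w1[i..] and w2[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = w1[i] === w2[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table, emitting unchanged, removed, and added words.
  const ops = [];
  let i = 0, j = 0;
  while (i < n && j < m) {
    if (w1[i] === w2[j]) {
      ops.push(["same", w1[i]]); i++; j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push(["removed", w1[i]]); i++;
    } else {
      ops.push(["added", w2[j]]); j++;
    }
  }
  while (i < n) { ops.push(["removed", w1[i]]); i++; }
  while (j < m) { ops.push(["added", w2[j]]); j++; }
  return ops;
}
```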
Track and compare:
- Response latency
- Token efficiency
- Output length
- Model behavior differences
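The first three are simple arithmetic over the metrics above (behavior differences are what the diff view shows). For example, with a hypothetical `compareRuns` helper:

```javascript
// Hypothetical helper: derive comparable figures from two metrics
// objects like the ones returned by the measure() sketch above.
function compareRuns(m1, m2) {
  const tokensPerSecond = (m) =>
    m.completionTokens / (m.responseTimeMs / 1000);
  return {
    latencyDeltaMs: m1.responseTimeMs - m2.responseTimeMs,
    throughput: [tokensPerSecond(m1), tokensPerSecond(m2)],
    lengthRatio: m1.completionTokens / m2.completionTokens,
  };
}
```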
- No Data Storage: All comparisons happen locally in your browser
- No Third-Party Servers: API keys and responses go only to the providers you configure, never anywhere else
- Direct API Calls: Connects directly to LLM providers, with no intermediary servers
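"Locally" here means browser storage. A minimal sketch, assuming `localStorage` and illustrative key names:

```javascript
// Sketch: keep API keys in the browser only. The storage key names
// are illustrative; nothing is ever sent to a server for storage.
function saveApiKey(provider, apiKey) {
  localStorage.setItem(`llm-diff:${provider}:apiKey`, apiKey);
}
function loadApiKey(provider) {
  return localStorage.getItem(`llm-diff:${provider}:apiKey`) ?? "";
}
```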
**API Key Errors**
- Ensure your API keys are valid and have sufficient credits
- Check that you're using the correct endpoint for each provider

**CORS Errors**
- Some browsers may block direct API calls
- Use a local server (like `python -m http.server`) if needed

**Response Format Issues**
- Verify your model names are correct
- Ensure the API endpoint supports the chat completions format

**Slow Performance**
- Check your internet connection
- Some models may have longer response times
- Initial release
- OpenAI and Anthropic support
- Real-time difference highlighting
- Performance metrics tracking
- Responsive design
This project is licensed under the MIT License - see the LICENSE file for details.

