Conversation

@johnbean393

Description

This PR adds support for the new grok-4-fast model via OpenRouter through the bytebot-llm-proxy (LiteLLM).

Changes

  • Removed max_tokens parameter from proxy service Chat Completion requests
  • Removed reasoning_effort parameter from proxy service Chat Completion requests

These model-specific parameters were causing compatibility issues with the grok-4-fast model. By removing them, the proxy service now works seamlessly with grok-4-fast and other models that don't support these parameters, while LiteLLM handles model-specific parameter mapping automatically.
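After the change, the proxy's Chat Completion payload simply omits both parameters and lets LiteLLM decide what each provider supports. The sketch below is illustrative only; the request-builder name and message type are hypothetical and do not reproduce the actual bytebot-llm-proxy code.

```typescript
// Hypothetical sketch: the proxy's Chat Completion body after this PR.
// buildChatCompletionRequest and ProxyMessage are illustrative names only.
interface ProxyMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildChatCompletionRequest(model: string, messages: ProxyMessage[]) {
  return {
    model,
    messages,
    // max_tokens and reasoning_effort are intentionally omitted; LiteLLM
    // maps (or drops) model-specific parameters per provider, so models
    // like grok-4-fast that reject these fields accept the request as-is.
  };
}
```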

Testing

  • Verified that grok-4-fast model works correctly through the LiteLLM proxy
  • Confirmed backward compatibility with existing models

Commits

Remove max_tokens and reasoning_effort parameters from proxy service to improve compatibility with the grok-4-fast model through OpenRouter. These model-specific parameters were causing issues with the new model.

  • Add proxy.model-info.ts to dynamically fetch context windows from the OpenRouter API
  • Update tasks.controller.ts to use the async extractContextWindow function
  • Replace the hardcoded 128K context window with dynamic values from OpenRouter
  • Implement a caching layer (1-hour TTL) to minimize API calls
  • Fix the Dockerfile to properly handle Prisma in Alpine Linux

Benefits:

  • Grok 4 Fast now correctly reports a 2M-token context window
  • Claude Sonnet 4.5 reports 1M tokens instead of 200K
  • Gemini 2.5 models report 1,048,576 tokens
  • All models automatically get accurate, up-to-date context windows
  • Improves agent performance by preventing premature summarization

Fixes context window inaccuracies by prioritizing:

  1. LiteLLM model_info (when available)
  2. OpenRouter API context_length (when LiteLLM returns null)
  3. Default fallback (128K)

Related to Grok 4 Fast support
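For context, here is a minimal sketch of how such a lookup can work, assuming the public OpenRouter models endpoint (https://openrouter.ai/api/v1/models) and its context_length field. The extractContextWindow name, the 1-hour cache, and the 128K fallback follow the commit message above; the LiteLLM max_input_tokens field and all other details are assumptions, not the actual proxy.model-info.ts.

```typescript
// Minimal sketch of a dynamic context-window lookup with a 1-hour cache.
// Field names and endpoint shape are assumptions for illustration.
const CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL to minimize API calls
const DEFAULT_CONTEXT_WINDOW = 128_000; // final fallback (128K)

const contextWindowCache = new Map<string, { value: number; fetchedAt: number }>();

async function extractContextWindow(
  model: string,
  litellmModelInfo?: { max_input_tokens?: number | null }, // assumed LiteLLM field
): Promise<number> {
  // 1. Prefer LiteLLM model_info when it reports a context window.
  if (litellmModelInfo?.max_input_tokens) {
    return litellmModelInfo.max_input_tokens;
  }

  // 2. Otherwise consult the OpenRouter models API, cached for one hour.
  const cached = contextWindowCache.get(model);
  if (cached && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
    return cached.value;
  }

  try {
    const res = await fetch('https://openrouter.ai/api/v1/models');
    const { data } = (await res.json()) as {
      data: { id: string; context_length?: number }[];
    };
    const value =
      data.find((m) => m.id === model)?.context_length ?? DEFAULT_CONTEXT_WINDOW;
    contextWindowCache.set(model, { value, fetchedAt: Date.now() });
    return value;
  } catch {
    // 3. Network or parse failure: fall back to the 128K default.
    return DEFAULT_CONTEXT_WINDOW;
  }
}
```

Caching the OpenRouter response for an hour keeps the hot path free of a network round-trip on every request while still picking up context-window updates reasonably quickly.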