The ultimate batch API client for your LLM workloads. It load-balances across endpoints, retries intelligently, and processes 10,000+ requests per second on a laptop.
Blaze API is the batch processor your LLM workloads deserve. Stop writing brittle Python scripts that crash at 100 req/sec. This tool acts like a fleet of pro API consumers, intelligently distributing requests across endpoints, handling failures gracefully, and maxing out your API capacity without breaking a sweat.
| **Blazing Fast** | **Smart Load Balancing** | **Auto Retry** | **Real-time Stats** |
|---|---|---|---|
| 10K+ req/sec on 8 cores | Weighted distribution across endpoints | Exponential backoff with jitter | Progress, RPS, latency tracking |
How it slaps:
- **You:** `blaze -i requests.jsonl -o results.jsonl`
- **Blaze:** Load balances, retries failures, tracks progress, writes results.
- **You:** Go grab a coffee while 100K requests complete. ☕
- **Result:** Perfectly formatted JSONL with every response. Zero babysitting.
Manually scripting API requests is a vibe-killer. Blaze makes other methods look ancient.
| ❌ The Old Way (Pain) | ✅ The Blaze Way (Glory) |
|---|---|
We're not just sending requests. We're building a high-throughput, fault-tolerant pipeline with weighted load balancing, connection pooling, and intelligent retry logic that actually respects your API provider's limits.
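To make the weighted distribution concrete, here is a minimal sketch of how weighted endpoint selection works in principle. It is illustrative only, not Blaze's internal implementation; the `Endpoint` struct and the `rand` dependency are assumptions for the example.

```rust
// Illustrative sketch of weighted endpoint selection (not Blaze's internals).
// An endpoint with weight 2 receives roughly twice the traffic of weight 1.
use rand::Rng;

struct Endpoint {
    url: &'static str,
    weight: u32,
}

fn pick(endpoints: &[Endpoint]) -> &Endpoint {
    let total: u32 = endpoints.iter().map(|e| e.weight).sum();
    let mut roll = rand::thread_rng().gen_range(0..total);
    for endpoint in endpoints {
        if roll < endpoint.weight {
            return endpoint;
        }
        roll -= endpoint.weight;
    }
    unreachable!("weights sum to `total`, so the loop always returns")
}

fn main() {
    let endpoints = [
        Endpoint { url: "https://api.example.com/a", weight: 2 },
        Endpoint { url: "https://api.example.com/b", weight: 1 },
    ];
    // Roughly two thirds of picks land on endpoint "a".
    for _ in 0..5 {
        println!("{}", pick(&endpoints).url);
    }
}
```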
| Platform | Method | Command |
|---|---|---|
| 🦀 All | Cargo | `cargo install blaze-api` |
| 🍎 macOS | Homebrew | `brew install yigitkonur/tap/blaze` |
| 🐧 Linux | Binary | See releases |
| 🪟 Windows | Binary | See releases |
```bash
# Clone and build
git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api
cargo build --release
# Binary is at ./target/release/blaze
```

Or install straight from crates.io:

```bash
cargo install blaze-api
```

✨ **Zero Config:** After installation, `blaze` is ready to go. Just point it at your JSONL file!
The workflow is dead simple.
```bash
# Process requests and save results
blaze --input requests.jsonl --output results.jsonl

# Short flags work too
blaze -i requests.jsonl -o results.jsonl

# High-throughput mode (10K req/sec)
blaze -i data.jsonl -o out.jsonl --rate 10000 --workers 200
```

```bash
# Use a config file for multiple endpoints
blaze -i requests.jsonl -o results.jsonl --config endpoints.json

# Or set via environment
export BLAZE_ENDPOINT_URL="https://api.openai.com/v1/completions"
export BLAZE_API_KEY="sk-..."
export BLAZE_MODEL="gpt-4"
blaze -i requests.jsonl -o results.jsonl
```

Your `requests.jsonl` file should have one JSON object per line:
{"input": "What is the capital of France?"} {"input": "Explain quantum computing in simple terms."} {"input": "Write a haiku about Rust programming."}Or with custom request bodies:
{"body": {"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4"}} {"body": {"messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hi!"}]}}Results are written as JSONL:
{"input": "What is the capital of France?", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 234, "attempts": 1}} {"input": "Explain quantum computing...", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 189, "attempts": 1}}Errors go to errors.jsonl:
{"input": "...", "error": "HTTP 429: Rate limit exceeded", "status_code": 429, "attempts": 3}| Feature | What It Does | Why You Care |
| Feature | What It Does | Why You Care |
|---|---|---|
| ⚡ **Async Everything**<br>Tokio runtime | Non-blocking I/O with work-stealing scheduler | Saturates your CPU cores efficiently |
| 🎯 **Weighted Load Balancing**<br>Smart distribution | Route traffic based on endpoint capacity | Max out multiple API keys simultaneously |
| 🔄 **Exponential Backoff**<br>With jitter | Intelligent retry with randomized delays | Respects rate limits, avoids thundering herd (see the sketch after this table) |
| 📊 **Real-time Progress**<br>Live stats | RPS, success rate, latency, ETA | Know exactly what's happening |
| 🔌 **Connection Pooling**<br>HTTP/2 keep-alive | Reuses connections across requests | Eliminates TCP handshake overhead |
| 💾 **Streaming Output**<br>Immediate writes | Results written as they complete | Never lose progress on crashes |
| 🏥 **Health Tracking**<br>Per-endpoint | Automatic failover on errors | Unhealthy endpoints get cooled off |
| 🔧 **Flexible Config**<br>CLI + ENV + JSON | Configure via args, env vars, or files | Fits any workflow |
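To show what "exponential backoff with jitter" means in practice, here is a minimal sketch, not Blaze's actual internals. The constants mirror the retry defaults from the config example (`initial_backoff` 100ms, `multiplier` 2.0, `max_backoff` 10s); the `rand` dependency is an assumption.

```rust
// Minimal sketch of exponential backoff with full jitter (illustrative only).
// Values mirror the retry defaults shown in the config example:
// initial_backoff = 100ms, multiplier = 2.0, max_backoff = 10s.
use rand::Rng;
use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    let initial_ms = 100.0;
    let multiplier: f64 = 2.0;
    let max_ms = 10_000.0;

    // Exponential growth, capped at the maximum backoff.
    let exp_ms = (initial_ms * multiplier.powi(attempt as i32)).min(max_ms);

    // Full jitter: pick a random delay in [0, exp_ms) so retries from many
    // workers don't hit the API at the same instant (thundering herd).
    let jittered_ms = rand::thread_rng().gen_range(0.0..exp_ms);
    Duration::from_millis(jittered_ms as u64)
}

fn main() {
    for attempt in 0..4 {
        println!("attempt {attempt}: wait {:?}", backoff_delay(attempt));
    }
}
```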
```text
USAGE:
    blaze [OPTIONS] --input <FILE>

OPTIONS:
    -i, --input <FILE>         Path to JSONL input file [env: BLAZE_INPUT]
    -o, --output <FILE>        Path for successful responses [env: BLAZE_OUTPUT]
    -e, --errors <FILE>        Path for error responses [default: errors.jsonl]
    -r, --rate <N>             Max requests per second [default: 1000]
    -w, --workers <N>          Concurrent workers [default: 50]
    -t, --timeout <SECS>       Request timeout [default: 30]
    -a, --max-attempts <N>     Max retry attempts [default: 3]
    -c, --config <FILE>        Endpoint config file (JSON)
    -v, --verbose              Enable debug logging
        --json-logs            Output logs as JSON
        --no-progress          Disable progress bar
        --dry-run              Validate config without processing
    -h, --help                 Print help
    -V, --version              Print version
```

All options can be set via environment variables with the `BLAZE_` prefix:
```bash
export BLAZE_INPUT="requests.jsonl"
export BLAZE_OUTPUT="results.jsonl"
export BLAZE_RATE="5000"
export BLAZE_WORKERS="100"
export BLAZE_ENDPOINT_URL="https://api.example.com/v1/completions"
export BLAZE_API_KEY="your-api-key"
export BLAZE_MODEL="gpt-4"
```

For multiple endpoints, create `endpoints.json`:
{ "endpoints": [ { "url": "https://api.openai.com/v1/completions", "weight": 2, "api_key": "sk-key-1", "model": "gpt-4", "max_concurrent": 100 }, { "url": "https://api.openai.com/v1/completions", "weight": 1, "api_key": "sk-key-2", "model": "gpt-4", "max_concurrent": 50 } ], "request": { "timeout": "30s", "rate_limit": 5000, "workers": 100 }, "retry": { "max_attempts": 3, "initial_backoff": "100ms", "max_backoff": "10s", "multiplier": 2.0 } }Then run:
```bash
blaze -i requests.jsonl -o results.jsonl --config endpoints.json
```

```bash
# For maximum speed (adjust based on your API limits)
blaze -i data.jsonl -o out.jsonl \
  --rate 10000 \
  --workers 200 \
  --timeout 60
```

Spread load across multiple API keys with weighted endpoints:

```json
{
  "endpoints": [
    {"url": "...", "api_key": "key-1", "weight": 3, "max_concurrent": 150},
    {"url": "...", "api_key": "key-2", "weight": 2, "max_concurrent": 100},
    {"url": "...", "api_key": "key-3", "weight": 1, "max_concurrent": 50}
  ]
}
```

Tune retry behavior in the same config file:

```json
{
  "retry": {
    "max_attempts": 5,
    "initial_backoff": "500ms",
    "max_backoff": "30s",
    "multiplier": 2.0
  }
}
```

Building from source:

```bash
git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run benchmarks
cargo bench
```

Blaze can also be used as a Rust library:

```rust
use blaze_api::{Config, EndpointConfig, Processor};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config {
        endpoints: vec![EndpointConfig {
            url: "https://api.example.com/v1/completions".to_string(),
            weight: 1,
            api_key: Some("your-key".to_string()),
            model: Some("gpt-4".to_string()),
            max_concurrent: 100,
        }],
        ..Default::default()
    };

    let processor = Processor::new(config)?;
    let result = processor
        .process_file(
            "requests.jsonl".into(),
            Some("results.jsonl".into()),
            "errors.jsonl".into(),
            true,
        )
        .await?;

    result.print_summary();
    Ok(())
}
```

Project layout:

```text
src/
├── lib.rs        # Library entry point
├── main.rs       # CLI binary
├── config.rs     # Configuration management
├── client.rs     # HTTP client with retry logic
├── endpoint.rs   # Load balancer implementation
├── processor.rs  # Main processing orchestration
├── request.rs    # Request/response types
├── tracker.rs    # Statistics tracking
└── error.rs      # Error types
```

Troubleshooting tips:
| Problem | Solution |
|---|---|
| "Too many open files" | Increase ulimit: ulimit -n 65535 |
| Connection timeouts | Increase --timeout or reduce --workers |
| Rate limit errors (429) | Lower --rate or add more API keys |
| Memory usage high | Reduce --workers for large requests |
| Progress bar not showing | Don't pipe output, or use --no-progress --json-logs |
Build Issues:
| Problem | Solution |
|---|---|
| OpenSSL errors | Install the OpenSSL dev package (`apt install libssl-dev`) or build with `--features rustls` |
| Rust version error | Update Rust: `rustup update stable` (requires 1.75+) |
Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Fork the repo, then:
git clone https://github.com/YOUR_USERNAME/blaze-api.git
cd blaze-api
cargo test

# Make your changes
cargo fmt
cargo clippy
cargo test

# Submit PR
```

MIT © Yiğit Konur
Built with 🔥 because waiting for API responses is a soul-crushing waste of time.