@peterkc (Contributor) commented Dec 12, 2025

Summary

Add Prometheus gauge metrics to monitor active and queued requests per deployment when using `max_parallel_requests` limiting.

  • New: `TrackedSemaphore` wrapper that tracks queue depth without accessing private `asyncio.Semaphore` internals
  • New: `litellm_deployment_active_requests` gauge metric
  • New: `litellm_deployment_queued_requests` gauge metric
  • New: `Router.get_deployment_queue_stats()` method

Problem

LiteLLM Router uses semaphores for max_parallel_requests limiting but provides no visibility into queue depth. Operators cannot monitor:

  • How many requests are actively being processed per deployment
  • How many requests are waiting in queue per deployment
  • Whether deployments are saturated

Solution

TrackedSemaphore

A wrapper around asyncio.Semaphore that explicitly tracks active and queued counts:

```python
import asyncio

class TrackedSemaphore:
    def __init__(self, value: int):
        self._semaphore = asyncio.Semaphore(value)
        self._active = 0  # requests currently holding a permit
        self._queued = 0  # requests waiting for a permit

    async def acquire(self) -> None:
        self._queued += 1
        try:
            await self._semaphore.acquire()
        finally:
            self._queued -= 1  # decremented even if the waiting task is cancelled
        self._active += 1

    def release(self) -> None:
        self._active -= 1
        self._semaphore.release()
```
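A minimal usage sketch (the constructor argument and the direct reads of `_active`/`_queued` are illustrative; the PR surfaces these counts through `Router.get_deployment_queue_stats()`):

```python
async def call_deployment(sem: TrackedSemaphore):
    await sem.acquire()
    try:
        await asyncio.sleep(0.1)  # stand-in for the actual LLM API call
    finally:
        sem.release()

async def main():
    sem = TrackedSemaphore(2)  # e.g. max_parallel_requests = 2
    tasks = [asyncio.create_task(call_deployment(sem)) for _ in range(5)]
    await asyncio.sleep(0)  # let every task start and hit the semaphore
    print(sem._active, sem._queued)  # -> 2 3: two permits held, three waiting
    await asyncio.gather(*tasks)

asyncio.run(main())
```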

Prometheus Metrics

```promql
# Alert on queue buildup: any queued requests means the permit pool is full
litellm_deployment_queued_requests{model="gpt-4", model_group="production"} > 0

# Monitor active requests across a model group
litellm_deployment_active_requests{model_group="production"}
```
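A hedged sketch of how such gauges could be defined and refreshed with `prometheus_client` (the label set follows the queries above; the stats shape and the `export_queue_stats` helper are assumptions, not the PR's actual wiring):

```python
from prometheus_client import Gauge

ACTIVE_REQUESTS = Gauge(
    "litellm_deployment_active_requests",
    "Requests currently holding a max_parallel_requests permit",
    ["model", "model_group"],
)
QUEUED_REQUESTS = Gauge(
    "litellm_deployment_queued_requests",
    "Requests waiting for a max_parallel_requests permit",
    ["model", "model_group"],
)

def export_queue_stats(stats: dict) -> None:
    # Assumed shape: {("gpt-4", "production"): {"active": 2, "queued": 3}, ...}
    for (model, model_group), counts in stats.items():
        ACTIVE_REQUESTS.labels(model=model, model_group=model_group).set(counts["active"])
        QUEUED_REQUESTS.labels(model=model, model_group=model_group).set(counts["queued"])
```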

Test Plan

  • Unit tests for TrackedSemaphore (16 tests; one representative case is sketched after this list)
  • Unit tests for Prometheus metric definitions (9 tests)
  • Performance tests validating minimal overhead (7 tests)
  • E2E tests for metrics endpoint (5 tests, requires running proxy)
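
One representative TrackedSemaphore case might look like this (a sketch using pytest-asyncio; the actual tests in the PR may differ):

```python
import asyncio
import pytest

@pytest.mark.asyncio
async def test_tracked_semaphore_counts_queued_waiters():
    sem = TrackedSemaphore(1)

    await sem.acquire()                      # take the only permit
    waiter = asyncio.create_task(sem.acquire())
    await asyncio.sleep(0)                   # let the waiter block on the semaphore

    assert (sem._active, sem._queued) == (1, 1)

    sem.release()                            # permit passes to the waiter
    await waiter
    assert (sem._active, sem._queued) == (1, 0)
    sem.release()
```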

Performance

| Scenario | Overhead |
| --- | --- |
| Microbenchmark (no I/O) | ~200% |
| Real-world (with I/O) | 0.2% |

The microbenchmark overhead is negligible in practice: LLM API calls take 100 ms–10 s, while the added counter bookkeeping costs microseconds per acquire/release.
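
A minimal sketch of the kind of microbenchmark behind the ~200% figure (comparing a bare `asyncio.Semaphore` against `TrackedSemaphore` with no I/O in the critical section; names and numbers are illustrative):

```python
import asyncio
import time

async def bench(sem, n: int = 100_000) -> float:
    # Acquire/release in a tight loop with no I/O, so only the
    # semaphore (and counter) overhead is measured.
    start = time.perf_counter()
    for _ in range(n):
        await sem.acquire()
        sem.release()
    return time.perf_counter() - start

async def main():
    plain = await bench(asyncio.Semaphore(10))
    tracked = await bench(TrackedSemaphore(10))
    print(f"plain semaphore: {plain:.3f}s")
    print(f"tracked:         {tracked:.3f}s ({tracked / plain - 1:+.0%})")

asyncio.run(main())
```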

Closes #17764

vercel bot commented Dec 12, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| litellm | ✅ Ready | Preview | Comment | Dec 12, 2025 6:05am |
@krrishdholakia (Contributor) commented:
@AlexsanderHamir can you review this please? seems like there might be some perf impact
