In Part 1, we explored KEDA and how it scales your consumers based on queue depth.
But what if:
- You have N queues → M consumers
- Each queue has different thresholds, min, and max replicas
- Each consumer has a different workflow / endpoint
Plain KEDA gets unwieldy here: you would need a separate ScaledObject per queue, and DB-driven min/max or custom cross-metric logic is not supported natively. That's where a custom autoscaler comes in.
Problem Statement
Example scenario:
| Queue | Consumer Deployment | Threshold | Min Pods | Max Pods |
|---|---|---|---|---|
| x-queue-1 | consumer-x-1 | 100 | 1 | 5 |
| x-queue-2 | consumer-x-2 | 50 | 2 | 6 |
| y-queue | consumer-y | 200 | 1 | 10 |
- Each queue has its own processing logic.
- Scaling decisions must be independent.
- Min/max replicas can be defined dynamically in a database for flexibility (a sample schema is sketched below).
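As a minimal sketch of that last idea, here is an assumed schema for the `scaling_config` table (SQLite is used for illustration; swap in your real database), seeded with the example scenario above:

```python
import sqlite3

# Assumed schema for scaling_config; column names match the SELECT
# used by the autoscaler script later in this post.
conn = sqlite3.connect("consumer_scaling.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS scaling_config (
        consumer_name TEXT PRIMARY KEY,  -- Kubernetes Deployment name
        queue_name    TEXT NOT NULL,     -- queue this consumer drains
        threshold     INTEGER NOT NULL,  -- messages per pod before scaling up
        min_pods      INTEGER NOT NULL,
        max_pods      INTEGER NOT NULL
    )
    """
)
# Seed with the example scenario from the table above
conn.executemany(
    "INSERT OR REPLACE INTO scaling_config VALUES (?, ?, ?, ?, ?)",
    [
        ("consumer-x-1", "x-queue-1", 100, 1, 5),
        ("consumer-x-2", "x-queue-2", 50, 2, 6),
        ("consumer-y", "y-queue", 200, 1, 10),
    ],
)
conn.commit()
conn.close()
```

Because the autoscaler re-reads this table on every run, operators can change thresholds or replica bounds without redeploying anything.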
Architecture Overview
```
        ┌───────────────────────┐
        │       Producers       │
        │  (Apps push messages  │
        │      to queues)       │
        └───────────┬───────────┘
                    │
                    ▼
          ┌─────────────────┐
          │  Message Broker │
          │  (Kafka/Rabbit) │
          └────────┬────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
┌─────────────────────┐      ┌─────────────────────┐
│  Python Exporter    │      │ Prometheus Metrics  │
│ - Reads queue depth │◄────►│       storage       │
│ - Reads min/max     │      └─────────────────────┘
│   from DB           │
│ - Applies scaling   │
│   logic per queue   │
│ - Calls Kubernetes  │
│   API to scale      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────┐
│    Consumer Deployments     │
│  - pod-deployment-1         │
│  - pod-deployment-2         │
│  - pod-deployment-y         │
└─────────────────────────────┘
```
Python Example: Custom Scaling with DB Min/Max
```python
from kubernetes import client, config
import requests
import sqlite3  # Example DB; replace with your real DB

# Load in-cluster config (use config.load_kube_config() when running locally)
config.load_incluster_config()
apps_v1 = client.AppsV1Api()

# Connect to the database containing min/max per consumer
conn = sqlite3.connect("consumer_scaling.db")
cursor = conn.cursor()

# Read the scaling config for all consumers
cursor.execute(
    "SELECT consumer_name, queue_name, threshold, min_pods, max_pods"
    " FROM scaling_config"
)
scaling_rules = cursor.fetchall()
conn.close()

# Prometheus endpoint
prometheus_url = "http://prometheus:9090/api/v1/query"

for consumer_name, queue_name, threshold, min_pods, max_pods in scaling_rules:
    # Query the current queue depth
    query = f'sum(queue_depth{{queue="{queue_name}"}})'
    resp = requests.get(prometheus_url, params={"query": query}).json()
    result = resp["data"]["result"]
    queue_depth = float(result[0]["value"][1]) if result else 0.0

    # Calculate desired replicas, clamped to [min_pods, max_pods]
    desired_replicas = max(min_pods, min(max_pods, int(queue_depth / threshold)))

    # Scale the Deployment
    scale = client.V1Scale(
        metadata=client.V1ObjectMeta(name=consumer_name, namespace="default"),
        spec=client.V1ScaleSpec(replicas=desired_replicas),
    )
    apps_v1.replace_namespaced_deployment_scale(
        name=consumer_name, namespace="default", body=scale
    )
    print(
        f"{consumer_name}: queue={queue_depth}, threshold={threshold}, "
        f"scaled to {desired_replicas} pods"
    )
```
Notes
- `threshold` is per queue.
- `min_pods` and `max_pods` are read from a database, making them dynamic.
- You can extend the logic to include weighted scaling, multiple metrics, or cooldown periods (a cooldown sketch follows this list).
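As a minimal sketch of the cooldown idea, the guard below skips consumers that were scaled recently. The `COOLDOWN_SECONDS` value and the in-memory `last_scaled` map are assumptions, not part of the script above; a production version would persist this state:

```python
import time

COOLDOWN_SECONDS = 120  # assumed cooldown window; tune per workload
last_scaled = {}        # consumer_name -> timestamp of the last scale action

def can_scale(consumer_name: str) -> bool:
    """Return True (and record the attempt) if the consumer
    is outside its cooldown window."""
    now = time.time()
    if now - last_scaled.get(consumer_name, 0.0) < COOLDOWN_SECONDS:
        return False
    last_scaled[consumer_name] = now
    return True

# Inside the scaling loop above, guard the scale call:
#   if can_scale(consumer_name):
#       apps_v1.replace_namespaced_deployment_scale(...)
```

Since `last_scaled` lives in memory, this only works if the script runs as a long-lived loop rather than a one-shot CronJob.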
Advantages of Custom Autoscaler
- Independent scaling per queue
- Dynamic min/max replicas from DB — no hardcoding
- Multi-metric decisions (queue depth, CPU, DB lag, etc.; see the sketch after this list)
- Advanced logic (cooldowns, weighted scaling, prioritization)
- Can handle N queues → M consumers mapping flexibly
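For illustration, here is one way a multi-metric decision could look. The CPU query and the 0.8 ceiling are assumptions for the sketch, not part of the script above:

```python
import requests

PROMETHEUS_URL = "http://prometheus:9090/api/v1/query"

def prom_value(query: str, default: float = 0.0) -> float:
    """Run an instant PromQL query and return the first sample value."""
    resp = requests.get(PROMETHEUS_URL, params={"query": query}).json()
    result = resp["data"]["result"]
    return float(result[0]["value"][1]) if result else default

def desired_replicas(queue_name, consumer_name, threshold,
                     min_pods, max_pods, current_pods):
    # Metric 1: backlog-based target, same formula as the main script
    depth = prom_value(f'sum(queue_depth{{queue="{queue_name}"}})')
    target = int(depth / threshold)

    # Metric 2 (assumed query): average CPU of this consumer's pods,
    # in cores; the 0.8 ceiling assumes pods request roughly 1 CPU
    cpu = prom_value(
        f'avg(rate(container_cpu_usage_seconds_total'
        f'{{pod=~"{consumer_name}-.*"}}[5m]))'
    )
    if cpu > 0.8:
        # Pods are CPU-saturated: force at least one extra replica
        target = max(target, current_pods + 1)

    return max(min_pods, min(max_pods, target))
```

The same pattern extends to any signal you can query, such as DB replication lag or end-to-end processing latency.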
Trade-offs
| Feature | KEDA | Custom Autoscaler |
|---|---|---|
| Easy setup | ✅ | ❌ (Python + DB + K8s API) |
| Independent queue scaling | ❌ | ✅ |
| Multi-metric logic | Limited | ✅ |
| DB-driven min/max | ❌ | ✅ |
| Reliability | ✅ Battle-tested | ⚠️ Managed by you |
✅ Takeaways
- KEDA is great for simple queue scaling.
- For complex microservices with multiple queues, a custom autoscaler gives you full control.
- Using a database for min/max replicas allows dynamic, production-ready scaling policies.
- Your custom autoscaler can evolve into a custom HPA/KEDA-style controller tailored to your architecture.
💡 Pro Tip:
Start with KEDA for simple cases. Move to a custom autoscaler with DB-defined min/max for multi-queue microservices with complex workflows.