In Part 1, we explored KEDA and how it scales your consumers based on queue depth.
But what if:
- You have N queues → M consumers
- Each queue has different thresholds, min, and max replicas
- Each consumer has a different workflow / endpoint
Plain KEDA gets unwieldy here: you would need a separate ScaledObject per queue, and DB-driven min/max or custom cross-metric logic is not supported natively. That's where a custom autoscaler comes in.
Problem Statement
Example scenario:
| Queue | Consumer Deployment | Threshold | Min Pods | Max Pods |
|---|---|---|---|---|
| x-queue-1 | consumer-x-1 | 100 | 1 | 5 |
| x-queue-2 | consumer-x-2 | 50 | 2 | 6 |
| y-queue | consumer-y | 200 | 1 | 10 |
- Each queue has its own processing logic.
- Scaling decisions must be independent.
- Min/max replicas can be defined dynamically in a database for flexibility (a sample schema is sketched below).
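As a minimal sketch of that last idea, here is an assumed schema for the `scaling_config` table (SQLite is used for illustration; swap in your real database), seeded with the example scenario above:

```python
import sqlite3

# Assumed schema for scaling_config; column names match the SELECT
# used by the autoscaler script later in this post.
conn = sqlite3.connect("consumer_scaling.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS scaling_config (
        consumer_name TEXT PRIMARY KEY,  -- Kubernetes Deployment name
        queue_name    TEXT NOT NULL,     -- queue this consumer drains
        threshold     INTEGER NOT NULL,  -- messages per pod before scaling up
        min_pods      INTEGER NOT NULL,
        max_pods      INTEGER NOT NULL
    )
    """
)
# Seed with the example scenario from the table above
conn.executemany(
    "INSERT OR REPLACE INTO scaling_config VALUES (?, ?, ?, ?, ?)",
    [
        ("consumer-x-1", "x-queue-1", 100, 1, 5),
        ("consumer-x-2", "x-queue-2", 50, 2, 6),
        ("consumer-y", "y-queue", 200, 1, 10),
    ],
)
conn.commit()
conn.close()
```

Because the autoscaler re-reads this table on every run, operators can change thresholds or replica bounds without redeploying anything.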
Architecture Overview
```
        ┌───────────────────────┐
        │       Producers       │
        │  (Apps push messages  │
        │      to queues)       │
        └───────────┬───────────┘
                    │
                    ▼
          ┌─────────────────┐
          │  Message Broker │
          │  (Kafka/Rabbit) │
          └────────┬────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
┌─────────────────────┐      ┌─────────────────────┐
│  Python Exporter    │      │ Prometheus Metrics  │
│ - Reads queue depth │◄────►│       storage       │
│ - Reads min/max     │      └─────────────────────┘
│   from DB           │
│ - Applies scaling   │
│   logic per queue   │
│ - Calls Kubernetes  │
│   API to scale      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────┐
│    Consumer Deployments     │
│  - pod-deployment-1         │
│  - pod-deployment-2         │
│  - pod-deployment-y         │
└─────────────────────────────┘
```
Python Example: Custom Scaling with DB Min/Max
```python
from kubernetes import client, config
import requests
import sqlite3  # Example DB; replace with your real DB

# Load in-cluster config (use config.load_kube_config() when running locally)
config.load_incluster_config()
apps_v1 = client.AppsV1Api()

# Connect to the database containing min/max per consumer
conn = sqlite3.connect("consumer_scaling.db")
cursor = conn.cursor()

# Read the scaling config for all consumers
cursor.execute(
    "SELECT consumer_name, queue_name, threshold, min_pods, max_pods"
    " FROM scaling_config"
)
scaling_rules = cursor.fetchall()
conn.close()

# Prometheus endpoint
prometheus_url = "http://prometheus:9090/api/v1/query"

for consumer_name, queue_name, threshold, min_pods, max_pods in scaling_rules:
    # Query the current queue depth
    query = f'sum(queue_depth{{queue="{queue_name}"}})'
    resp = requests.get(prometheus_url, params={"query": query}).json()
    result = resp["data"]["result"]
    queue_depth = float(result[0]["value"][1]) if result else 0.0

    # Calculate desired replicas, clamped to [min_pods, max_pods]
    desired_replicas = max(min_pods, min(max_pods, int(queue_depth / threshold)))

    # Scale the Deployment
    scale = client.V1Scale(
        metadata=client.V1ObjectMeta(name=consumer_name, namespace="default"),
        spec=client.V1ScaleSpec(replicas=desired_replicas),
    )
    apps_v1.replace_namespaced_deployment_scale(
        name=consumer_name, namespace="default", body=scale
    )
    print(
        f"{consumer_name}: queue={queue_depth}, threshold={threshold}, "
        f"scaled to {desired_replicas} pods"
    )
```
Notes
- `threshold` is per queue.
- `min_pods` and `max_pods` are read from a database, making them dynamic.
- You can extend the logic to include weighted scaling, multiple metrics, or cooldown periods (a cooldown sketch follows this list).
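As a minimal sketch of the cooldown idea, the guard below skips consumers that were scaled recently. The `COOLDOWN_SECONDS` value and the in-memory `last_scaled` map are assumptions, not part of the script above; a production version would persist this state:

```python
import time

COOLDOWN_SECONDS = 120  # assumed cooldown window; tune per workload
last_scaled = {}        # consumer_name -> timestamp of the last scale action

def can_scale(consumer_name: str) -> bool:
    """Return True (and record the attempt) if the consumer
    is outside its cooldown window."""
    now = time.time()
    if now - last_scaled.get(consumer_name, 0.0) < COOLDOWN_SECONDS:
        return False
    last_scaled[consumer_name] = now
    return True

# Inside the scaling loop above, guard the scale call:
#   if can_scale(consumer_name):
#       apps_v1.replace_namespaced_deployment_scale(...)
```

Since `last_scaled` lives in memory, this only works if the script runs as a long-lived loop rather than a one-shot CronJob.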
Advantages of Custom Autoscaler
- Independent scaling per queue
- Dynamic min/max replicas from DB — no hardcoding
- Multi-metric decisions (queue depth, CPU, DB lag, etc.; see the sketch after this list)
- Advanced logic (cooldowns, weighted scaling, prioritization)
- Can handle N queues → M consumers mapping flexibly
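For illustration, here is one way a multi-metric decision could look. The CPU query and the 0.8 ceiling are assumptions for the sketch, not part of the script above:

```python
import requests

PROMETHEUS_URL = "http://prometheus:9090/api/v1/query"

def prom_value(query: str, default: float = 0.0) -> float:
    """Run an instant PromQL query and return the first sample value."""
    resp = requests.get(PROMETHEUS_URL, params={"query": query}).json()
    result = resp["data"]["result"]
    return float(result[0]["value"][1]) if result else default

def desired_replicas(queue_name, consumer_name, threshold,
                     min_pods, max_pods, current_pods):
    # Metric 1: backlog-based target, same formula as the main script
    depth = prom_value(f'sum(queue_depth{{queue="{queue_name}"}})')
    target = int(depth / threshold)

    # Metric 2 (assumed query): average CPU of this consumer's pods,
    # in cores; the 0.8 ceiling assumes pods request roughly 1 CPU
    cpu = prom_value(
        f'avg(rate(container_cpu_usage_seconds_total'
        f'{{pod=~"{consumer_name}-.*"}}[5m]))'
    )
    if cpu > 0.8:
        # Pods are CPU-saturated: force at least one extra replica
        target = max(target, current_pods + 1)

    return max(min_pods, min(max_pods, target))
```

The same pattern extends to any signal you can query, such as DB replication lag or end-to-end processing latency.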
Trade-offs
| Feature | KEDA | Custom Autoscaler |
|---|---|---|
| Easy setup | ✅ | ❌ (Python + DB + K8s API) |
| Independent queue scaling | ❌ | ✅ |
| Multi-metric logic | Limited | ✅ |
| DB-driven min/max | ❌ | ✅ |
| Reliability | ✅ Battle-tested | ⚠️ Managed by you |
✅ Takeaways
- KEDA is great for simple queue scaling.
- For complex microservices with multiple queues, a custom autoscaler gives you full control.
- Using a database for min/max replicas allows dynamic, production-ready scaling policies.
- Your custom autoscaler can evolve into a custom HPA/KEDA-style controller tailored to your architecture.
💡 Pro Tip:
Start with KEDA for simple cases. Move to a custom autoscaler with DB-defined min/max for multi-queue microservices with complex workflows.