Chandra Shettigar

Scaling Rails Background Jobs in Kubernetes: From Queue to HPA

Ever tried processing a million records in a Rails controller action? Yeah, that's not going to end well. Your users will be staring at a spinning wheel, your server will be gasping for resources, and your ops team will be giving you that "we need to talk" look.

The Problem: Long-Running Requests

Picture this: Your Rails app needs to:

  • Generate complex reports from millions of records
  • Process large file uploads
  • Send thousands of notifications
  • Sync data with external systems

*Figure: Long-running requests in Rails*

Doing any of these in a controller action means:

  • Timeout issues (Nginx, Rails, Load Balancer)
  • Blocked server resources
  • Poor user experience
  • Potential data inconsistency if the request fails

Step 1: Moving to Background Processing

First, let's move these long-running tasks to background jobs:

```ruby
# app/controllers/reports_controller.rb
class ReportsController < ApplicationController
  def create
    report_id = SecureRandom.uuid

    ReportGenerationJob.perform_later(
      user_id: current_user.id,
      report_id: report_id,
      parameters: report_params
    )

    render json: {
      report_id: report_id,
      status: 'processing',
      status_url: report_status_path(report_id)
    }
  end
end

# app/jobs/report_generation_job.rb
class ReportGenerationJob < ApplicationJob
  queue_as :reports

  def perform(user_id:, report_id:, parameters:)
    # Process report
    report_data = generate_report(parameters)

    # Store results
    store_report(report_id, report_data)

    # Notify user
    ReportMailer.completed(user_id, report_id).deliver_now
  end
end
```

*Figure: Long-running requests in Rails, offloaded to async jobs*

Great! Now our users get immediate feedback, and our server isn't blocked. But we've just moved the problem - now it's in our job queue.
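The create action above hands back a status_url, so the client needs something to poll. A minimal sketch of that endpoint, assuming the job writes its progress to Rails.cache under a `report:<id>:status` key (swap in a Report model or Redis as you prefer):

```ruby
# app/controllers/report_statuses_controller.rb (hypothetical)
class ReportStatusesController < ApplicationController
  def show
    # The job is assumed to write 'processing' / 'completed' / 'failed' here
    status = Rails.cache.read("report:#{params[:id]}:status") || 'processing'
    render json: { report_id: params[:id], status: status }
  end
end
```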

The Scaling Challenge

A single Rails worker instance with Sidekiq needs proper configuration for queues and concurrency. Here's a basic setup:

```ruby
# config/initializers/sidekiq.rb
# Note: This is a simplified example to demonstrate the concept.
# Actual syntax might vary based on your Sidekiq version and requirements;
# `config.options` was deprecated in Sidekiq 6.5 and removed in 7.x, where
# concurrency and queues come from sidekiq.yml, CLI flags, or capsules.
Sidekiq.configure_server do |config|
  # Configure Redis connection
  config.redis = { url: ENV.fetch('REDIS_URL', 'redis://localhost:6379/0') }

  # Configure concurrency based on environment
  config.options[:concurrency] =
    case Rails.env
    when 'production'
      ENV.fetch('SIDEKIQ_CONCURRENCY', 25).to_i
    else
      10
    end

  # Queue configuration with weights for priority
  # (this must live inside the configure_server block)
  config.options[:queues] = [
    ['critical', 5], # Higher weight = higher priority
    ['sequential', 3],
    ['default', 2],
    ['low', 1]
  ]
end
```

And in your config/sidekiq.yml:

```yaml
# Note: This is a simplified example. Adjust based on your needs.
:verbose: false
:concurrency: <%= ENV.fetch("SIDEKIQ_CONCURRENCY", 25) %>
:timeout: 25

# Environment-specific configurations
production:
  :concurrency: <%= ENV.fetch("SIDEKIQ_CONCURRENCY", 25) %>
  :queues:
    - [critical, 5]
    - [sequential, 3]
    - [default, 2]
    - [low, 1]
```
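Two details worth knowing here: Sidekiq picks up config/sidekiq.yml automatically (or whatever file you pass with -C), and it runs the file through ERB first, which is what makes the `<%= ENV.fetch(...) %>` interpolation work.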

This gives us 25 concurrent jobs per worker process in production, but what happens when:

  • We have 1000 reports queued up
  • Some jobs need to run sequentially (like financial transactions)
  • Different jobs need different resources
  • We have mixed workloads (quick jobs vs long-running jobs)

Queue Strategy: Not All Jobs Are Equal

Let's organize our jobs based on their processing requirements:

```ruby
class FinancialTransactionJob < ApplicationJob
  queue_as :sequential
  # Note: sidekiq_options on ActiveJob classes needs Sidekiq 6.4.2 or newer
  sidekiq_options retry: 3, backtrace: true

  def perform(transaction_id)
    # Must process one at a time
    process_transaction(transaction_id)
  end
end

class ReportGenerationJob < ApplicationJob
  queue_as :default
  sidekiq_options retry: 5, backtrace: true

  def perform(report_id)
    # Can process many simultaneously
    generate_report(report_id)
  end
end

class NotificationJob < ApplicationJob
  queue_as :low
  sidekiq_options retry: 3

  def perform(user_ids)
    # Quick jobs, high volume
    send_notifications(user_ids)
  end
end
```
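One gotcha with that sequential queue: routing jobs to it doesn't serialize them on its own; any worker pulling the queue with concurrency above 1 will still run several at once. You need a worker dedicated to the queue with concurrency 1. On Sidekiq 7+, capsules give you that inside a single process; a minimal sketch:

```ruby
# config/initializers/sidekiq.rb
# Sketch: a Sidekiq 7 capsule that drains the `sequential` queue with a
# single thread, so those jobs really do run one at a time, while the
# default capsule keeps processing the other queues concurrently.
Sidekiq.configure_server do |config|
  config.capsule('sequential') do |cap|
    cap.concurrency = 1
    cap.queues = %w[sequential]
  end
end
```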

Enter Kubernetes HPA: Dynamic Worker Scaling

*Figure: Long-running requests in Rails with jobs on Kubernetes HPA*

Now we can set up our worker deployment and HPA:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-workers
spec:
  # No replicas field: the HPA below manages the replica count
  selector:
    matchLabels:
      app: rails-workers
  template:
    metadata:
      labels:
        app: rails-workers
    spec:
      containers:
        - name: sidekiq
          image: myapp/rails:latest
          command: ["bundle", "exec", "sidekiq"]
          env:
            - name: RAILS_ENV
              value: "production"
            - name: SIDEKIQ_CONCURRENCY
              value: "25"
          resources:
            requests:
              memory: "1Gi"
              cpu: "1"
            limits:
              memory: "2Gi"
              cpu: "2"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rails-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-workers
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: sidekiq_queue_depth
        target:
          type: AverageValue
          averageValue: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
```
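One caveat: sidekiq_queue_depth is a custom metric, so the HPA can only see it if something bridges it into the Kubernetes custom metrics API, typically prometheus-adapter. A rough sketch of an adapter rule, assuming Prometheus already scrapes the metric from the worker pods (see the exporter sketch in the monitoring section below); the label names are assumptions to match to your setup:

```yaml
# prometheus-adapter configuration (sketch): expose sidekiq_queue_depth as a
# pod-scoped custom metric the HPA above can query.
rules:
  - seriesQuery: 'sidekiq_queue_depth'
    resources:
      overrides:
        namespace: { resource: "namespace" }
        pod: { resource: "pod" }
    name:
      matches: "sidekiq_queue_depth"
    metricsQuery: 'sum(sidekiq_queue_depth{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```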

This setup gives us:

  • Minimum 2 worker pods (50 concurrent jobs)
  • Maximum 10 worker pods (250 concurrent jobs)
  • Automatic scaling based on queue depth
  • Conservative scale-down to prevent thrashing
  • Resource limits to protect our cluster
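The scaling math itself is standard HPA behavior: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetValue). So if 2 pods report an average sidekiq_queue_depth of 300, the HPA wants ceil(2 * 300 / 100) = 6 pods; the scaleUp policy then limits growth to 2 pods per 60 seconds, so the deployment steps up gradually instead of jumping.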

Monitoring and Fine-Tuning

To make this work smoothly, monitor:

  1. Queue depths by queue type
  2. Job processing times
  3. Error rates
  4. Resource utilization

Add Prometheus metrics:

```ruby
# config/initializers/sidekiq.rb
# Uses the prometheus-client 2.x API.
require 'prometheus/client'

Sidekiq.configure_server do |config|
  config.on(:startup) do
    queue_depth_gauge = Prometheus::Client.registry.gauge(
      :sidekiq_queue_depth,
      docstring: 'Sidekiq queue depth',
      labels: [:queue]
    )

    # Refresh the gauge periodically on a background thread
    Thread.new do
      loop do
        Sidekiq::Queue.all.each do |queue|
          queue_depth_gauge.set(queue.size, labels: { queue: queue.name })
        end
        sleep 30
      end
    end
  end
end
```
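These gauges live inside the worker process, so Prometheus also needs an HTTP endpoint to scrape there. A sketch of one way to serve it, assuming the prometheus-client gem with Rack 2.x and the webrick gem (gems like yabeda-sidekiq package this whole pattern if you'd rather not hand-roll it):

```ruby
# config/initializers/sidekiq_metrics_exporter.rb (hypothetical file)
require 'rack'
require 'prometheus/middleware/exporter'

Sidekiq.configure_server do |config|
  config.on(:startup) do
    # Serve GET /metrics on port 9394 from a background thread so
    # Prometheus can scrape the worker pod directly
    Thread.new do
      app = Rack::Builder.new do
        use Prometheus::Middleware::Exporter
        run ->(_env) { [404, { 'Content-Type' => 'text/plain' }, ['Not Found']] }
      end
      Rack::Handler::WEBrick.run(app, Host: '0.0.0.0', Port: 9394, AccessLog: [])
    end
  end
end
```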

Best Practices and Gotchas

  1. Queue Isolation

    • Separate queues for different job types
    • Consider dedicated workers for critical queues
    • Use queue priorities effectively
  2. Resource Management

    • Set appropriate memory/CPU limits
    • Monitor job memory usage
    • Use batch processing for large datasets
  3. Error Handling

    • Implement retry strategies (see the sketch after this list)
    • Set up dead letter queues
    • Monitor failed jobs
  4. Scaling Behavior

    • Set appropriate scaling thresholds
    • Use stabilization windows
    • Consider time-of-day patterns
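For item 3, ActiveJob's built-in retry_on and discard_on cover the basics; a minimal sketch (the rescued error classes are placeholders, match them to what your jobs actually raise):

```ruby
class ReportGenerationJob < ApplicationJob
  queue_as :default

  # Back off exponentially on transient failures, up to 5 attempts
  retry_on Timeout::Error, wait: :exponentially_longer, attempts: 5

  # Don't retry jobs whose arguments can no longer be deserialized
  discard_on ActiveJob::DeserializationError

  def perform(report_id)
    generate_report(report_id)
  end
end
```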

Conclusion

By combining Rails' background job capabilities with Kubernetes' scaling features, we can build a robust, scalable system for processing long-running tasks. The key is to:

  1. Move long-running tasks to background jobs
  2. Organize queues based on job characteristics
  3. Configure worker processes appropriately
  4. Use HPA for dynamic scaling
  5. Monitor and adjust based on real-world usage

Remember: The goal isn't just to scale - it's to provide a reliable, responsive system that efficiently processes work while maintaining data consistency and user experience.
