Optimize uWSGI PSGI Configuration for K8s Deployment #146

@ranguard

Description

Performance: Optimize uWSGI PSGI Configuration for K8s Deployment

NOTE: This ticket was generated by giving Claude.ai our current config and asking it to create this ticket.

Summary

Analyze and optimize the current uWSGI configuration to reduce slow responses and improve overall performance in our Kubernetes containerized environment.

Current Configuration

[uwsgi]
master = true
workers = 20
die-on-term = true
need-app = true
vacuum = true
disable-logging = true
listen = 1024
post-buffering = 4096
buffer-size = 65535
early-psgi = true
perl-no-die-catch = true
max-worker-lifetime = 3600
max-requests = 1000
reload-on-rss = 300
harakiri = 60

Performance Issues

  • Slow response times observed
  • Need data-driven optimization approach
  • Current settings may not be optimal for our workload

Action Items (Priority Order)

1. Enable Performance Monitoring (CRITICAL - Do First)

Why: Need baseline metrics before making changes

Implementation:

  • Add stats socket to uWSGI config:
    stats = 127.0.0.1:9191
    stats-http = true
  • Deploy and verify stats endpoint accessibility
  • Install uwsgitop for monitoring: pip install uwsgitop

Success Criteria: Can access uWSGI stats from inside the container via curl http://localhost:9191 (the stats socket is bound to loopback)
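
If the stats port is only reachable inside the pod, a quick way to check it from a workstation is a port-forward; a minimal sketch, where the pod name is a placeholder:

    # forward the pod's stats port to the local machine
    kubectl port-forward <pod-name> 9191:9191
    # in a second shell: raw JSON stats served over HTTP (stats-http = true)
    curl http://localhost:9191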

2. Collect Baseline Performance Data (HIGH)

Why: Establish current performance patterns before optimization

Tasks:

  • Monitor for 24-48 hours and collect:
    • Worker CPU/memory usage: kubectl top pods
    • uWSGI worker stats: uwsgitop 127.0.0.1:9191
    • Container resource limits vs usage
    • Response time patterns
    • Worker restart frequency

Success Criteria: Have documented baseline metrics
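
A minimal collection loop for the baseline window, assuming kubectl access from a workstation and that curl exists in the container image (pod name, interval, and file names are placeholders):

    # sample resource usage and uWSGI stats every 5 minutes
    while true; do
        date >> baseline.log
        kubectl top pods <pod-name> >> baseline.log
        kubectl exec <pod-name> -- curl -s http://127.0.0.1:9191 >> uwsgi-stats.log
        sleep 300
    done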

3. Enable Targeted Logging (HIGH)

Why: Identify specific bottlenecks without overwhelming logs

Implementation:

  • Temporarily replace disable-logging = true with:
    log-slow = 1000
    log-4xx = true
    log-5xx = true
    logto = /tmp/uwsgi.log

Success Criteria: Can identify slow requests and error patterns
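
To watch the new log from outside the pod, something like the following works, assuming the pod name is a placeholder and the path matches the logto setting above:

    # stream the uWSGI log; slow requests and 4xx/5xx entries show up here
    kubectl exec <pod-name> -- tail -f /tmp/uwsgi.log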

4. Right-size Worker Count (MEDIUM-HIGH)

Why: Worker count is usually the single most impactful setting; too many workers cause contention and can hurt performance

Analysis needed:

  • Check container CPU allocation
  • Monitor CPU usage per worker
  • Test worker count = (2 × CPU cores) + 1 as a starting point

Implementation:

  • If 4 CPU cores allocated, try workers = 9
  • Monitor performance impact
  • Adjust based on CPU utilization patterns

Success Criteria: Workers fully utilize available CPU without thrashing
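
One way to confirm what CPU allocation the formula should be based on, assuming the pod name is a placeholder and coreutils is present in the image:

    # CPU requests/limits as Kubernetes sees them
    kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'
    # cores visible inside the container (cgroup quotas can make this misleading)
    kubectl exec <pod-name> -- nproc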

5. Optimize Memory Settings (MEDIUM)

Why: Frequent worker restarts hurt performance

Current issues:

  • reload-on-rss = 300 (restart a worker once it exceeds 300 MB RSS) may be too aggressive
  • The combination of max-worker-lifetime = 3600 and max-requests = 1000 forces frequent worker recycling

Tasks:

  • Monitor actual worker RSS usage
  • Test increased settings:
    reload-on-rss = 512
    max-requests = 5000
    max-worker-lifetime = 7200

Success Criteria: Reduced worker restart frequency
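
Per-worker RSS can be sampled before and after the change; a sketch assuming a procps-style ps in the image and that the workers' command name is uwsgi (RSS is reported in kilobytes):

    # resident memory per uWSGI process, largest first
    kubectl exec <pod-name> -- ps -o pid,rss,etime,cmd -C uwsgi --sort=-rss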

6. Tune Request Handling (MEDIUM)

Why: Better request processing and timeout handling

Implementation:

  • Analyze typical request duration
  • Adjust harakiri = 120 only if requests legitimately take >60s (harakiri kills the worker mid-request, so raising it should not be used to paper over slow endpoints)
  • Consider increasing listen = 2048 if connection drops or listen-queue overflows show up in the stats; a larger backlog only takes effect if the kernel's net.core.somaxconn is raised as well (see the sketch below)
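
Whether the backlog is actually overflowing can be read from the stats endpoint before touching listen; a sketch assuming curl in the image, jq on the workstation, and the listen_queue/listen_queue_errors fields as reported by our uWSGI version:

    # current listen queue depth and overflow counter
    kubectl exec <pod-name> -- curl -s http://127.0.0.1:9191 | jq '.listen_queue, .listen_queue_errors'
    # kernel backlog ceiling; listen values above this need net.core.somaxconn raised too
    kubectl exec <pod-name> -- cat /proc/sys/net/core/somaxconn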

7. Optimize Buffer Settings (LOW-MEDIUM)

Why: The current buffer-size = 65535 (the per-request buffer used to parse request headers) may be larger than our requests need

Tasks:

  • Monitor buffer utilization in stats
  • Test with buffer-size = 32768
  • Adjust based on actual usage patterns
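
If the buffer is shrunk too far, uWSGI starts rejecting oversized request headers; a sketch for spotting that after the change, assuming the log location from step 3 and that our uWSGI version logs the usual "invalid request block size" message:

    # count of requests rejected because their headers exceeded buffer-size
    kubectl exec <pod-name> -- grep -c "invalid request block size" /tmp/uwsgi.log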

8. Load Testing & Validation (LOW)

Why: Validate improvements under controlled conditions

Implementation:

  • Set up load testing environment
  • Compare before/after metrics
  • Document optimal configuration
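
A rough before/after comparison is enough here; a sketch using ApacheBench against a staging endpoint (tool choice, URL, and concurrency are placeholders, not a recommendation):

    # 2000 requests at concurrency 20; compare mean and 95th percentile latency across configs
    ab -n 2000 -c 20 https://staging.example.org/representative/endpoint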

Monitoring Commands

# uWSGI stats
curl http://localhost:9191

# Resource usage
kubectl top pods <pod-name>
kubectl describe pod <pod-name>

# Worker memory
kubectl exec <pod-name> -- ps aux | grep uwsgi

# Connection monitoring
kubectl exec <pod-name> -- netstat -an | grep :80 | sort | uniq -c

Red Flags to Watch For

  • High worker churn (frequent restarts)
  • Listen queue overflows in stats
  • Consistent high CPU with low throughput
  • Memory usage steadily climbing
  • Regular harakiri timeouts

Success Metrics

  • Reduced 95th percentile response time
  • Decreased worker restart frequency
  • Improved resource utilization efficiency
  • Fewer timeout errors

Notes

  • Make changes incrementally
  • Monitor each change for 24+ hours before next adjustment
  • Keep rollback plan ready
  • Document all changes and their impact

Priority: High
Estimate: 1-2 sprints
Dependencies: Monitoring tools, load testing capability
