Introduction
Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’ll want a repeatable, auditable process that lets you push new code without dropping connections, while keeping observability tight. This checklist walks you through a Docker‑centric workflow that leverages Nginx as a reverse‑proxy, blue‑green releases, and CI/CD automation. Follow each item, and you’ll have a robust pipeline that ships features safely and scales gracefully.
✅ Pre‑flight Checklist
1. Container Baseline
- Base image: Use an official, minimal image (e.g., `python:3.12-slim` or `node:20-alpine`).
- Immutable layers: Pin exact versions of OS packages and runtime dependencies.
- Health checks: Define `HEALTHCHECK` in the Dockerfile so the orchestrator knows when a container is ready.
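```dockerfile
FROM node:20-alpine
WORKDIR /app

# Install production dependencies first so this layer is cached between builds
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# node:20-alpine ships without curl, so probe with BusyBox wget instead
HEALTHCHECK --interval=30s --timeout=5s \
  CMD wget -q --spider http://localhost:3000/health || exit 1

EXPOSE 3000
CMD ["node", "server.js"]
```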
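Once an image defines a `HEALTHCHECK`, a deploy script can wait on the status Docker reports instead of sleeping for a fixed time. The following is a minimal sketch, not a definitive implementation: the function name `wait_healthy` and the container name `myapp-green` are assumptions made for illustration, while `docker inspect --format '{{.State.Health.Status}}'` is the standard way to read the health state.

```bash
# Poll Docker's health status until the container reports "healthy".
# Usage: wait_healthy <container> [max_attempts] [delay_seconds]
wait_healthy() {
  container=$1
  attempts=${2:-20}
  delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # .State.Health.Status is populated only when the image defines HEALTHCHECK.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=unknown
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy after $i check(s)"
      return 0
    fi
    echo "attempt $i/$attempts: $container is ${status:-unknown}"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$container did not become healthy" >&2
  return 1
}
```

A pipeline step could call `wait_healthy myapp-green 20 3` and abort the rollout when it returns non-zero.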
2. Nginx Configuration
- Upstream block: Define two upstream groups, `blue` and `green`.
- Zero‑downtime switch: Use `proxy_pass` with a variable, and change which group it resolves to with `nginx -s reload`, which applies the new configuration without dropping open connections.
- TLS termination: Offload SSL at Nginx to keep containers simple.
```nginx
upstream blue  { server 127.0.0.1:8001; }
upstream green { server 127.0.0.1:8002; }

# Route on the X-Deployment request header; anything else goes to blue
map $http_x_deployment $backend {
    default blue;
    "green" green;
}

server {
    listen 80;
    listen 443 ssl;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location / {
        proxy_pass http://$backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
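One way to move all traffic, rather than only requests that carry the header, is to rewrite the map's `default` entry and reload. The sketch below assumes the config contains a `default blue;` or `default green;` line inside the map block; the helper names `set_default_backend` and `reload_nginx` are hypothetical. It writes through a temporary file so a failed edit never truncates the live config.

```bash
# Rewrite the `default <color>;` entry of the map block in an Nginx config.
# Usage: set_default_backend <config_file> <blue|green>
set_default_backend() {
  conf=$1
  color=$2
  case "$color" in
    blue|green) ;;
    *) echo "color must be blue or green" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Write to a temp file first so a failed sed never truncates the live config.
  if sed -E "s/(default[[:space:]]+)(blue|green);/\1${color};/" "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # keeps the original file's owner and permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Validate the new config before asking the running master to reload it.
reload_nginx() {
  nginx -t && nginx -s reload
}
```

With the config at a path of your choosing, `set_default_backend /etc/nginx/conf.d/app.conf green && reload_nginx` performs the cut-over; that path is an assumption, so substitute the file that actually holds the map block.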
3. CI/CD Pipeline Foundations
- Branch strategy: `main` = production, `release/*` = candidate.
- Artifact versioning: Tag Docker images with the semantic version and Git SHA (e.g., `myapp:1.4.2-a1b2c3`).
- Pipeline stages: Lint → Test → Build → Push → Deploy.
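The tagging scheme can be captured in a small helper so every pipeline stage derives the same image reference. This is a sketch under assumptions: the function name `image_tag` and the default of six SHA characters (chosen to match the example tag) are illustrative, not a fixed convention.

```bash
# Compose an image reference such as myapp:1.4.2-a1b2c3 from its parts.
# Usage: image_tag <name> <semver> <full_git_sha> [sha_length]
image_tag() {
  name=$1
  version=$2
  sha=$3
  len=${4:-6}
  # Keep only the leading characters of the commit SHA.
  short=$(printf '%s' "$sha" | cut -c "1-$len")
  printf '%s:%s-%s\n' "$name" "$version" "$short"
}
```

In the Build stage this might be used as `tag=$(image_tag myapp 1.4.2 "$(git rev-parse HEAD)")`, followed by `docker build -t "$tag" .` and `docker push "$tag"`.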
🛠️ Deployment Checklist
4. Blue‑Green Infrastructure
- Spin up the green stack using Docker Compose or your orchestrator.
- Run smoke tests against the green endpoint (`http://localhost:8002/health`).
- Flip the header (`X-Deployment: green`) or update the Nginx map variable.
- Monitor for 5‑10 minutes; verify error rates, latency, and logs.
- Retire the blue stack once confidence is high.
```bash
# Example: bring up green stack
docker compose -f docker-compose.green.yml up -d

# Run health check
curl -f http://localhost:8002/health && echo "✅ Green is healthy"

# After changing the map default to green, validate the config and reload Nginx
nginx -t && nginx -s reload
```
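A freshly started stack often needs a few seconds before its health endpoint answers, so a single `curl` can fail spuriously. A small retry wrapper, sketched here with the hypothetical name `retry`, makes the smoke test tolerant of that warm-up window.

```bash
# Run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempt(s): $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For the green endpoint this could read `retry 10 3 curl -fsS http://localhost:8002/health`; the attempt count and delay are arbitrary starting values.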
5. Zero‑Downtime Rollback
- Keep the previous version running until the new version passes all metrics.
- If a failure is detected, simply switch the Nginx map back to `blue` and scale the faulty green containers to `0`.
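"A failure is detected" is easier to automate when it is an explicit threshold check. The sketch below assumes the trigger is the 5xx percentage and uses a default threshold of 0.1 percent; the function name `should_rollback` is hypothetical, and how the observed rate is obtained is left to your monitoring stack.

```bash
# Succeed (exit 0) when the observed 5xx percentage exceeds the threshold.
# Usage: should_rollback <observed_percent> [threshold_percent]
should_rollback() {
  observed=$1
  threshold=${2:-0.1}
  # awk handles the floating-point comparison that plain sh cannot;
  # adding 0 forces both values to be compared as numbers.
  awk -v o="$observed" -v t="$threshold" 'BEGIN { exit !(o + 0 > t + 0) }'
}
```

A rollback step could then be `if should_rollback "$rate"; then ...; fi`, where the body points the map back at `blue`, reloads Nginx, and runs something like `docker compose -f docker-compose.green.yml up -d --scale app=0`. The service name `app` is an assumption about the compose file.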
6. Observability Hooks
- Metrics: Export Prometheus metrics from both Nginx (`nginx_exporter`) and the app.
- Logs: Ship container stdout/stderr to a central log aggregator (e.g., Loki, Elasticsearch).
- Tracing: Enable OpenTelemetry in the app and forward spans to Jaeger.
- Alerting: Set alerts on `container_restart_total` and `http_5xx_rate`.
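To make the 5xx-rate signal concrete, here is a rough sketch that derives it from an access log rather than from Prometheus. It is a stand-in for the metrics pipeline, useful for a quick manual check, and it assumes Nginx's default `combined` log format, where the status code is the ninth whitespace-separated field. The function name `five_xx_percent` is hypothetical.

```bash
# Print the percentage of 5xx responses in an access log read from stdin.
# Assumes the "combined" log format: the status code is field 9.
five_xx_percent() {
  awk '
    $9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 >= 500) errors++ }
    END {
      if (total == 0) { print "0.00"; exit }
      printf "%.2f\n", errors * 100 / total
    }
  '
}
```

For example, `tail -n 1000 /var/log/nginx/access.log | five_xx_percent` reports the rate over the last thousand requests, assuming the log lives at that commonly used path.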
7. Database Migration Safety
- Prefer online schema changes (e.g., `pt-online-schema-change` for MySQL, `pg_repack` for Postgres).
- Run migrations in a separate CI step before traffic cut‑over.
- Keep migrations idempotent; use feature flags to guard new queries.
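Idempotency at the runner level means a re-run of the migration step applies nothing twice. The sketch below illustrates the idea with a plain ledger file; that file is an assumption standing in for the migrations table a real tool keeps in the database, and `run_migrations` is a hypothetical name.

```bash
# Apply each *.sql file in a directory at most once, in lexical order.
# Usage: run_migrations <dir> <ledger_file> <apply_command> [args...]
# The ledger file stands in for the migrations table a real tool keeps.
run_migrations() {
  dir=$1
  ledger=$2
  shift 2
  touch "$ledger"
  for file in "$dir"/*.sql; do
    [ -e "$file" ] || continue            # directory has no migrations
    name=$(basename "$file")
    if grep -qxF -- "$name" "$ledger"; then
      echo "skip   $name"
      continue
    fi
    echo "apply  $name"
    "$@" "$file" || return 1              # stop at the first failure
    echo "$name" >> "$ledger"             # record only after success
  done
}
```

With Postgres, a CI step might call `run_migrations ./migrations .applied psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f`, where the directory, ledger name, and `DATABASE_URL` variable are all placeholders.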
8. Security Hardening
- Store secrets in a vault (AWS Secrets Manager, HashiCorp Vault) and inject them at container start via environment variables.
- Enforce least‑privilege IAM roles for CI runners.
- Use Content‑Security‑Policy headers in Nginx to mitigate XSS.
```nginx
add_header Content-Security-Policy "default-src 'self'; script-src 'self' https://cdn.jsdelivr.net";
```
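Injecting a secret at container start can look roughly like the following, assuming AWS Secrets Manager as the vault and a plain-string secret. The secret ID you pass, the variable name `DB_PASSWORD`, and both function names are hypothetical; `aws secretsmanager get-secret-value` and `docker run -e` are the standard CLI calls.

```bash
# Fetch one secret value from AWS Secrets Manager and print it on stdout.
# Usage: fetch_secret <secret_id>
fetch_secret() {
  aws secretsmanager get-secret-value \
    --secret-id "$1" \
    --query SecretString \
    --output text
}

# Start a container with the secret injected as an environment variable.
# `-e DB_PASSWORD` without a value forwards the variable from this shell, so
# the secret never appears on the command line or in the image.
# Usage: run_with_secret <secret_id> <image>
run_with_secret() {
  DB_PASSWORD=$(fetch_secret "$1") || return 1
  export DB_PASSWORD
  docker run -d -e DB_PASSWORD "$2"
}
```

Note that the value is still visible through `docker inspect` on the host, so access to the Docker socket should be restricted accordingly.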
9. Documentation & Runbooks
- Document the exact `docker compose` files for both environments.
- Keep a runbook that lists:
  - How to trigger a blue‑green switch manually.
  - How to roll back.
  - Where to find logs and metrics.
- Version‑control the runbook alongside the code.
📦 Post‑Deploy Validation
| Metric | Target | Tool |
|---|---|---|
| 5xx error rate | < 0.1% | Prometheus alert |
| Avg latency | ≤ 200 ms | Grafana dashboard |
| Container health | healthy for 5 min | Docker health check |
| Log error count | ≤ 5 per hour | Loki query |
Run these checks automatically in the pipeline using a lightweight script:
```bash
#!/usr/bin/env bash
set -e

# Verify Nginx health endpoint
if curl -sf http://localhost/healthz; then
  echo "✅ Nginx healthy"
else
  echo "❌ Nginx unhealthy"
  exit 1
fi

# Verify app metrics
if curl -sf http://localhost:9090/metrics | grep -q "http_requests_total"; then
  echo "✅ Metrics exposed"
else
  echo "❌ Metrics missing"
  exit 1
fi
```
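The latency target can be spot-checked from the pipeline as well. This sketch samples an endpoint with curl's `%{time_total}` write-out variable and averages the results; the function names `sample_latency` and `avg_latency_ms` are hypothetical, and a handful of sequential requests is only a rough proxy for what a Grafana dashboard shows under real traffic.

```bash
# Average a list of request durations (seconds, one per line) and print ms.
avg_latency_ms() {
  awk '{ sum += $1; n++ } END { if (n == 0) exit 1; printf "%.0f\n", sum * 1000 / n }'
}

# Sample an endpoint N times; curl's %{time_total} reports seconds per request.
# Usage: sample_latency <url> [count]
sample_latency() {
  url=$1
  count=${2:-10}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' "$url" || return 1
    i=$((i + 1))
  done
}
```

A gate for the 200 ms target could then be written as `[ "$(sample_latency http://localhost/healthz 20 | avg_latency_ms)" -le 200 ]`.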
🎉 Wrap‑Up
Zero‑downtime deployments aren’t magic; they’re the result of disciplined automation, clear observability, and a solid rollback plan. By ticking off each item in this checklist, you’ll reduce risk, keep users happy, and free your team to focus on building, not firefighting.
If you need help shipping this, the team at https://ramerlabs.com can help.