Posted on Sep 25

The Ultimate Checklist for Zero‑Downtime Deploys with Docker & Nginx

Introduction

Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’ll want a repeatable, auditable process that lets you push new code without dropping connections, while keeping observability tight. This checklist walks you through a Docker‑centric workflow that uses Nginx as a reverse proxy, blue‑green releases, and CI/CD automation. Work through each item and you’ll have a pipeline that can switch traffic to a new version, verify it, and roll back without interrupting users.

✅ Pre‑flight Checklist

1. Container Baseline

Base image: Use an official, minimal image (e.g., python:3.12-slim or node:20-alpine).
Immutable layers: Pin exact versions of OS packages and runtime dependencies.
Health checks: Define HEALTHCHECK in the Dockerfile so Docker and Docker Compose can tell when a container is ready. Kubernetes ignores this instruction and relies on its own probes.

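    FROM node:20-alpine
    WORKDIR /app
    # Install only production dependencies, exactly as pinned in the lockfile
    COPY package*.json ./
    RUN npm ci --omit=dev
    COPY . .
    # node:20-alpine ships BusyBox wget but not curl
    HEALTHCHECK --interval=30s --timeout=5s \
      CMD wget -q --spider http://localhost:3000/health || exit 1
    EXPOSE 3000
    CMD ["node", "server.js"]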

2. Nginx Configuration

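Upstream block: Define two upstream groups, blue and green.
Zero‑downtime switch: Use proxy_pass with a variable set by a map block. To switch, change the map's default and apply it with nginx -s reload, which starts new workers and lets the old ones finish their in‑flight requests.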
TLS termination: Offload SSL at Nginx to keep containers simple.

    upstream blue {
        server 127.0.0.1:8001;
    }

    upstream green {
        server 127.0.0.1:8002;
    }

    # Requests sent with "X-Deployment: green" go to green;
    # everything else goes to the default group.
    map $http_x_deployment $backend {
        default blue;
        "green" green;
    }

    server {
        listen 80;
        listen 443 ssl;
        ssl_certificate     /etc/nginx/certs/fullchain.pem;
        ssl_certificate_key /etc/nginx/certs/privkey.pem;

        location / {
            # $backend resolves to one of the upstream groups above
            proxy_pass http://$backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
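The switch itself can be scripted. The following is a minimal sketch, not a definitive implementation: it assumes the map block's `default <color>;` entry sits on its own line in the config file, and the function name `set_active_color` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: point the map's default at blue or green by rewriting one line.
# Assumption: the config file has a "default <color>;" entry on its own line.

# set_active_color <conf-file> <blue|green>
set_active_color() {
  local conf="$1" color="$2" tmp
  case "$color" in
    blue|green) ;;
    *) echo "usage: set_active_color <conf-file> blue|green" >&2; return 2 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Replace only the color on the "default" line; other lines are untouched.
  if ! sed -E "s/^([[:space:]]*default[[:space:]]+)(blue|green);/\1${color};/" "$conf" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Write back through the existing file so owner and permissions are kept.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

A typical call would be `set_active_color /etc/nginx/conf.d/app.conf green && nginx -t && nginx -s reload`, where the config path is a placeholder. Running `nginx -t` first means a broken config is rejected before the reload is attempted.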

3. CI/CD Pipeline Foundations

Branch strategy: main = production, release/* = candidate.
Artifact versioning: Tag Docker images with the semantic version and the short Git SHA (e.g., myapp:1.4.2-a1b2c3).
Pipeline stages: Lint → Test → Build → Push → Deploy.
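The tagging rule can be enforced in the Build stage with a small helper. This is a sketch under assumptions: the function name `image_ref`, the strict MAJOR.MINOR.PATCH check, and the 7-character SHA prefix are choices made for this example.

```shell
#!/usr/bin/env bash
# Sketch: build an image reference of the form name:version-shortsha.

# image_ref <name> <version> <git-sha>
image_ref() {
  local name="$1" version="$2" sha="$3"
  # Accept only MAJOR.MINOR.PATCH; pre-release suffixes are out of scope here.
  if ! [[ "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "invalid version: $version" >&2
    return 1
  fi
  # Accept a full or abbreviated lowercase hex SHA.
  if ! [[ "$sha" =~ ^[0-9a-f]{6,40}$ ]]; then
    echo "invalid git sha: $sha" >&2
    return 1
  fi
  # Keep at most the first 7 characters of the SHA in the tag.
  printf '%s:%s-%s\n' "$name" "$version" "${sha:0:7}"
}
```

In a pipeline this might be used as `docker build -t "$(image_ref myapp 1.4.2 "$(git rev-parse HEAD)")" .`, so that every image can be traced back to one commit.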

🛠️ Deployment Checklist

4. Blue‑Green Infrastructure

Spin up the green stack using Docker Compose or your orchestrator.
Run smoke tests against the green endpoint (http://localhost:8002/health).
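Switch traffic: change the default in the Nginx map from blue to green and reload. Requests sent with the X-Deployment: green header already reach green, which is useful for a final check through the proxy before the switch.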
Monitor for 5‑10 minutes; verify error rates, latency, and logs.
Retire the blue stack once confidence is high.

    # Example: bring up green stack
    docker compose -f docker-compose.green.yml up -d

    # Run health check
    curl -f http://localhost:8002/health && echo "✅ Green is healthy"

    # After changing the map default to green, validate the config and reload
    nginx -t && nginx -s reload
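A container that has just started may need a few seconds before its health endpoint answers, so a single check can fail spuriously. One possible approach is to retry the probe; the helper name `wait_until` and its argument order are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: retry a readiness probe instead of checking once.

# wait_until <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts run out.
wait_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For the green stack this could be called as `wait_until 30 2 curl -fsS -o /dev/null http://localhost:8002/health`, which gives the container up to about a minute to become healthy before the deploy is aborted.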

5. Zero‑Downtime Rollback

Keep the previous version running until the new version passes all metrics.
If a failure is detected, switch the Nginx map default back to blue, reload Nginx, and scale the faulty green containers to 0.
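"A failure is detected" is easier to automate when it is a number. The sketch below assumes the pipeline can obtain two counters for the green stack (5xx responses and total requests) and uses the 0.1% limit from the validation targets as its default; the function name `should_roll_back` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether to roll back from two request counters.

# should_roll_back <5xx-count> <total-requests> [max-error-percent]
# Exits 0 (roll back) when the 5xx share is at or above the limit.
should_roll_back() {
  local errors="$1" total="$2" limit="${3:-0.1}"
  # With no traffic there is no evidence either way; do not roll back.
  if [ "$total" -eq 0 ]; then
    return 1
  fi
  # awk handles the fractional arithmetic that bash integers cannot.
  awk -v e="$errors" -v t="$total" -v l="$limit" \
    'BEGIN { exit !((e / t) * 100 >= l) }'
}
```

A deploy script could then branch on it, for example `if should_roll_back "$errors" "$total"; then ...switch back to blue...; fi`, where `$errors` and `$total` come from whatever metrics source the team already uses.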

6. Observability Hooks

Metrics: Export Prometheus metrics from both Nginx (nginx_exporter) and the app.
Logs: Ship container stdout/stderr to a central log aggregator (e.g., Loki, Elasticsearch).
Tracing: Enable OpenTelemetry in the app and forward spans to Jaeger.
Alerting: Set alerts on container_restart_total and http_5xx_rate.

7. Database Migration Safety

Prefer online schema changes (e.g., pt-online-schema-change for MySQL; on Postgres, pg_repack for online table rewrites and CREATE INDEX CONCURRENTLY for indexes).
Run migrations in a separate CI step before traffic cut‑over.
Keep migrations idempotent and backward compatible, since blue keeps reading and writing the same database until cut‑over; use feature flags to guard new queries.
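Idempotency at the pipeline level means a re-run of the migration step must not apply the same change twice. The following toy sketch illustrates the idea only: a plain-text ledger file stands in for the schema-version table a real migration tool keeps in the database, and `apply_once` is a name invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: run each migration at most once by recording applied IDs.
# Assumption: a ledger file replaces the tracking table of a real tool.

# apply_once <ledger-file> <migration-id> <command...>
apply_once() {
  local ledger="$1" id="$2"
  shift 2
  touch "$ledger"
  # -x matches whole lines, -F treats the ID as a literal string.
  if grep -qxF -- "$id" "$ledger"; then
    echo "skip $id (already applied)"
    return 0
  fi
  # Record the ID only after the migration command succeeds.
  "$@" || return 1
  echo "$id" >> "$ledger"
  echo "applied $id"
}
```

A call might look like `apply_once /var/lib/myapp/migrations.log 001_add_column psql "$DATABASE_URL" -f migrations/001_add_column.sql`, with the ledger path, migration ID, and SQL file all being placeholders. Because a failed command is never recorded, the next pipeline run retries it.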

8. Security Hardening

Store secrets in a vault (AWS Secrets Manager, HashiCorp Vault) and inject them at container start via environment variables.
Enforce least‑privilege IAM roles for CI runners.
Use Content‑Security‑Policy headers in Nginx to mitigate XSS.

add_header Content-Security-Policy "default-src 'self'; script-src 'self' https://cdn.jsdelivr.net";
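For the secrets item, injected environment variables are only useful if the container refuses to start without them. Here is a minimal sketch of an entrypoint guard; the function name `require_env` and the variable names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail fast at container start when an injected secret is missing.

# require_env <NAME...>
# Returns non-zero and names every variable that is unset or empty.
# Only names are printed, never values.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An entrypoint script could end with `require_env DB_PASSWORD API_KEY && exec node server.js`, so a misconfigured green container exits immediately and never passes its health check.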

9. Documentation & Runbooks

Document the exact docker compose files for both environments.
Keep a runbook that lists:
- How to trigger a blue‑green switch manually.
- How to roll back.
- Where to find logs and metrics.
Version‑control the runbook alongside the code.

📦 Post‑Deploy Validation

Metric	Target	Tool
5xx error rate	< 0.1%	Prometheus alert
Avg latency	≤ 200 ms	Grafana dashboard
Container health	`healthy` for 5 min	Docker health check
Log error count	≤ 5 per hour	Loki query

Run these checks automatically in the pipeline using a lightweight script:

    #!/usr/bin/env bash
    set -e

    # Verify Nginx health endpoint
    if curl -sf http://localhost/healthz > /dev/null; then
      echo "✅ Nginx healthy"
    else
      echo "❌ Nginx unhealthy"
      exit 1
    fi

    # Verify app metrics
    if curl -sf http://localhost:9090/metrics | grep -q "http_requests_total"; then
      echo "✅ Metrics exposed"
    else
      echo "❌ Metrics missing"
      exit 1
    fi
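The numeric targets from the table can be checked the same way once the measurements are available. This sketch assumes the pipeline has already fetched the three values (for example from Prometheus and Loki) and passes them in as arguments; `meets_target` and `validate_release` are names made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: compare measured values against the post-deploy targets.

# meets_target <measured> <lt|le> <target>
meets_target() {
  local measured="$1" op="$2" target="$3"
  case "$op" in
    lt) awk -v m="$measured" -v t="$target" 'BEGIN { exit !(m <  t) }' ;;
    le) awk -v m="$measured" -v t="$target" 'BEGIN { exit !(m <= t) }' ;;
    *)  echo "unknown operator: $op" >&2; return 2 ;;
  esac
}

# validate_release <5xx-percent> <avg-latency-ms> <log-errors-per-hour>
# Reports every missed target and returns non-zero if any was missed.
validate_release() {
  local failed=0
  meets_target "$1" lt 0.1 || { echo "❌ 5xx error rate $1% is not below 0.1%"; failed=1; }
  meets_target "$2" le 200 || { echo "❌ Avg latency $2 ms exceeds 200 ms"; failed=1; }
  meets_target "$3" le 5   || { echo "❌ $3 log errors per hour exceeds 5"; failed=1; }
  [ "$failed" -eq 0 ] && echo "✅ All targets met"
  return "$failed"
}
```

Calling `validate_release 0.05 180 3` passes, while `validate_release 0.2 180 3` reports the error-rate miss and returns a non-zero status that fails the pipeline stage.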

🎉 Wrap‑Up

Zero‑downtime deployments aren’t magic; they’re the result of disciplined automation, clear observability, and a solid rollback plan. By ticking off each item in this checklist, you’ll reduce risk, keep users happy, and free your team to focus on building, not firefighting.

If you need help shipping this, the team at https://ramerlabs.com can help.

DEV Community