Introduction
As a DevOps lead, you know that every second of downtime can translate into lost revenue, frustrated users, and tarnished brand reputation. Modern micro‑service stacks make it possible to push updates without taking the whole system offline, but the process still requires a disciplined approach. This checklist walks you through a practical, end‑to‑end workflow for zero‑downtime deployments using Docker containers behind an Nginx reverse proxy. It’s written for teams that already have a CI/CD pipeline in place and want to tighten the safety net around production releases.
1. Prepare a Reproducible Docker Image
- Pin base images – Use a specific tag (e.g., `python:3.11-slim`) instead of `latest`.
- Multi‑stage builds – Strip out build‑time dependencies to keep the runtime image lean.
- Health checks – Declare a `HEALTHCHECK` instruction so Docker can report container health to the orchestrator.
```dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
# Copy the source before building, or there is nothing to compile
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --production
# Alpine images do not ship curl; install it so the health check can run
RUN apk add --no-cache curl
HEALTHCHECK --interval=30s --timeout=5s \
  CMD curl -f http://localhost:3000/health || exit 1
EXPOSE 3000
# Adjust the entrypoint to your build output
CMD ["node", "dist/server.js"]
```
Why it matters: A deterministic image eliminates “it works on my machine” bugs, and health checks give Nginx a reliable way to route traffic only to healthy containers.
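Deploy scripts can mirror that health signal from the host side before trusting a new container. Here is a minimal retry-helper sketch; the attempt counts, the example URL, and the `wait_until` name are illustrative, not part of any standard tooling:

```shell
#!/usr/bin/env sh
# wait_until: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # command succeeded: target is healthy
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1              # gave up after all attempts
}

# Example: wait up to ~60s for the container's health endpoint
# (hostname and port are assumptions):
# wait_until 12 5 curl -fsS http://localhost:3000/health
```

Gating the traffic switch on a helper like this keeps a slow-starting container from receiving requests too early.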
2. Version Your Deployments
- Semantic version tags – Tag Docker images with `vMAJOR.MINOR.PATCH` (e.g., `myapp:1.4.2`).
- Immutable releases – Never overwrite an existing tag; push a new image for every change.
- Registry promotion – Promote images from a `staging` repository to `production` only after automated tests pass.
```bash
# Build and push a versioned image
docker build -t registry.example.com/myapp:1.4.2 .
docker push registry.example.com/myapp:1.4.2
```
Versioning gives you a clear rollback path and makes audit trails easier for compliance.
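Tag bumps themselves can be scripted in plain shell. A sketch, assuming tags follow the `vMAJOR.MINOR.PATCH` shape above (the `next_patch` name is illustrative):

```shell
#!/usr/bin/env sh
# next_patch: given a tag like v1.4.2, print the next patch tag (v1.4.3).
next_patch() {
  tag=${1#v}                 # strip the leading "v"
  major=${tag%%.*}           # text before the first dot
  rest=${tag#*.}             # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  printf 'v%s.%s.%s\n' "$major" "$minor" $((patch + 1))
}

# Typical use with git (assumes annotated release tags exist):
# next_patch "$(git describe --tags --abbrev=0)"
```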
3. Blue‑Green Architecture with Nginx
The classic blue‑green pattern runs two identical environments (blue = current, green = next). Nginx acts as the traffic switcher.
3.1 Nginx Upstream Configuration
```nginx
upstream myapp {
    # Blue (current) pool
    server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;

    # Green (new) pool – comment out until ready
    # server 10.0.2.10:3000 max_fails=3 fail_timeout=30s;
    # server 10.0.2.11:3000 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
3.2 Switching Traffic
1. Deploy the new Docker image to the green hosts.
2. Verify the health endpoints (`/health`) return `200`.
3. Uncomment the green servers in the upstream block, validate the config, and reload Nginx:

   ```bash
   sudo nginx -t && sudo nginx -s reload
   ```

4. Once traffic flows smoothly, decommission the blue hosts or keep them as a fallback.
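The "uncomment the green servers" edit is easy to fat-finger by hand, so it is worth scripting. A sketch using `sed`, assuming the conf file layout shown earlier (the `enable_green` name and the file path are illustrative):

```shell
#!/usr/bin/env sh
# enable_green: uncomment the green pool in an Nginx upstream file.
# $1 = path to the conf file (e.g. /etc/nginx/conf.d/myapp.conf)
enable_green() {
  # Turn "# server 10.0.2.x ..." lines into "server 10.0.2.x ..."
  sed -i 's/^\( *\)# \(server 10\.0\.2\.\)/\1\2/' "$1"
}

# After editing, always validate before the graceful reload:
# enable_green /etc/nginx/conf.d/myapp.conf
# sudo nginx -t && sudo nginx -s reload
```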
4. CI/CD Pipeline Integration
A reliable pipeline automates the steps above and prevents human error.
```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    tags:
      - 'v*.*.*'

env:
  # Defined once so every step (including ssh commands) sees the same tag
  IMAGE: registry.example.com/myapp:${{ github.ref_name }}

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: registry.example.com
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASS }}
      - name: Build & push image
        run: |
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
      - name: Deploy green fleet
        run: |
          # Double quotes so $IMAGE expands locally before ssh runs it remotely
          ssh devops@green-host \
            "docker pull $IMAGE && docker rm -f myapp 2>/dev/null; docker run -d --name myapp -p 3000:3000 $IMAGE"
      - name: Run health checks
        run: |
          curl -f http://green-host:3000/health
      - name: Switch Nginx upstream
        run: |
          ssh devops@nginx-host \
            'sudo sed -i "s/# server 10.0.2.10/server 10.0.2.10/" /etc/nginx/conf.d/myapp.conf && sudo nginx -s reload'
```
The workflow enforces:
- Tag‑driven releases (no manual version bumps).
- Automated health validation before traffic cut‑over.
- Atomic Nginx reload, which is a zero‑downtime operation.
5. Observability & Logging
Even with health checks, you need real‑time insight.
- Structured logs – Output JSON to `stdout`; Docker captures them automatically.
- Metrics – Export Prometheus metrics from your app (`/metrics`).
- Tracing – Use OpenTelemetry to propagate request IDs through Nginx (`proxy_set_header X-Trace-ID $request_id;`).
- Alerting – Set up alerts on:
  - Container unhealthy status.
  - Nginx 5xx spikes.
  - Latency > 200 ms for the `/health` endpoint.
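Even deployment scripts benefit from the structured-logs rule: if the script itself logs JSON, its output lands in the same pipeline as the app's. A minimal sketch (the field names are illustrative, not a standard schema):

```shell
#!/usr/bin/env sh
# log_json: emit one structured log line to stdout so the Docker
# log driver can ship it as-is.
# Usage: log_json <level> <message>
log_json() {
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}

# log_json info "green pool enabled"
```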
6. Rollback Strategy
Never assume a deployment will succeed.
- Keep the blue pool running until the green pool has processed at least one successful request.
- If any health check fails after the switch, comment out the green servers, reload Nginx, and investigate.
- Optionally, use `docker service update --rollback` with Swarm, or a `helm rollback` for Kubernetes.
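The second bullet above (comment the green servers back out and reload) can also be scripted so a rollback is one command, not a hand edit under pressure. A sketch, assuming the conf layout shown earlier (the `disable_green` name and path are illustrative):

```shell
#!/usr/bin/env sh
# disable_green: re-comment the green pool so traffic drains back to blue.
# $1 = path to the Nginx conf file
disable_green() {
  sed -i 's/^\( *\)\(server 10\.0\.2\.\)/\1# \2/' "$1"
}

# disable_green /etc/nginx/conf.d/myapp.conf
# sudo nginx -t && sudo nginx -s reload
```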
7. Security Hardening
- Least‑privilege containers – Run as non‑root (`USER appuser`).
- TLS termination – Offload TLS to Nginx and enforce strong ciphers.
- Secret injection – Use Docker secrets or a vault; never bake keys into images.
- CSP headers – Add a `Content-Security-Policy` header in Nginx to mitigate XSS.
```nginx
add_header Content-Security-Policy "default-src 'self'; script-src 'self' https://cdn.jsdelivr.net";
```
8. Post‑Deployment Checklist
- [ ] Verify `/health` returns `200` on all green nodes.
- [ ] Confirm Nginx logs show zero 5xx responses.
- [ ] Check Prometheus dashboards for error rate and latency.
- [ ] Ensure secrets are still encrypted at rest.
- [ ] Document the new image tag in the release notes.
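The mechanical items on this checklist can be rolled into one pass/fail sweep. A sketch where each check is an arbitrary shell command (the `run_checks` name and the example commands are assumptions):

```shell
#!/usr/bin/env sh
# run_checks: run each "name=command" pair, report pass/fail per check,
# and exit non-zero if any check fails.
run_checks() {
  failed=0
  for pair in "$@"; do
    name=${pair%%=*}
    cmd=${pair#*=}
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}

# Example (hosts and log paths are assumptions):
# run_checks \
#   'health=curl -fsS http://10.0.2.10:3000/health' \
#   'no-5xx=! grep -q " 5[0-9][0-9] " /var/log/nginx/access.log'
```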
Conclusion
Zero‑downtime deployments are less about magic and more about disciplined, repeatable steps. By versioning Docker images, leveraging Nginx’s upstream switching, and wiring health checks into your CI/CD pipeline, you can ship features several times a day without ever hurting your users. Keep observability front and center, and always have a rollback plan ready.
If you need help shipping this, the team at https://ramerlabs.com can help.