Posted on Sep 25

The Ultimate Checklist for Zero‑Downtime Deploys with Docker & Nginx

Introduction

Zero‑downtime deployments are no longer a nice‑to‑have; they’re a baseline expectation for modern services. As a DevOps lead, you’re probably juggling Docker containers, Nginx reverse‑proxy configurations, and a CI/CD pipeline that must stay green even when you push new code. This checklist walks you through the practical steps to achieve seamless rollouts without sacrificing observability or security.

1. Prepare Your Docker Images

a. Immutable Base Images

Use a minimal, version‑pinned base (e.g., python:3.11-slim or node:20-alpine).
Run docker history <image> to verify no stray layers.

b. Multi‑Stage Builds

    # Stage 1 – Build
    FROM node:20-alpine AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    # Copy the source after installing dependencies so the npm ci layer stays cached
    COPY . .
    RUN npm run build

    # Stage 2 – Runtime
    FROM nginx:alpine
    COPY --from=builder /app/dist /usr/share/nginx/html
    EXPOSE 80

Keeps the final image lean, reduces attack surface, and speeds up pull times.

c. Tagging Strategy

Semantic version tags: myapp:1.4.2.
latest points to the most recent stable release only.
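Store a git SHA label for traceability. A Dockerfile LABEL does not expand $(...), so set the label at build time instead: docker build --label commit="$(git rev-parse --short HEAD)" -t myapp:1.4.2 .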
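The tagging rules above can be checked in a build script before anything is pushed. The following is a minimal sketch, not a definitive implementation: the function names is_semver and image_refs are hypothetical, and the "yes"/"no" stable flag is an assumption about how your pipeline marks a release as stable.

```shell
#!/bin/sh
# Sketch: validate a semantic version tag and list the image references to push.
# is_semver and image_refs are illustrative names, not part of any tool.

# Succeeds only for tags of the form MAJOR.MINOR.PATCH (digits only).
is_semver() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# image_refs REPO VERSION STABLE
# Prints one image reference per line: the version tag, plus "latest"
# only when STABLE is "yes".
image_refs() {
  repo=$1
  version=$2
  stable=$3
  is_semver "$version" || { echo "invalid version: $version" >&2; return 1; }
  echo "$repo:$version"
  if [ "$stable" = "yes" ]; then
    echo "$repo:latest"
  fi
}
```

A pipeline step could then loop over the output, for example: image_refs myrepo/myapp 1.4.2 yes | while read -r ref; do docker push "$ref"; done. Rejecting non-semver tags early keeps ad hoc tags out of the registry.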

2. Nginx as a Smart Router

a. Upstream Blocks for Blue‑Green

    upstream myapp_blue {
        server 10.0.1.10:80;
        server 10.0.1.11:80;
    }

    upstream myapp_green {
        server 10.0.2.10:80;
        server 10.0.2.11:80;
    }

    map $http_x_deploy_stage $upstream {
        default myapp_blue;
        "green" myapp_green;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://$upstream;
            proxy_set_header Host $host;
        }
    }

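The $http_x_deploy_stage variable holds the X-Deploy-Stage request header, so any request that carries X-Deploy-Stage: green goes to green while everything else stays on blue. That lets you exercise green with a single curl command before real traffic moves. Because clients can send the header too, strip or restrict it at the edge if green must stay private.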

b. Health Checks

    location /health {
        proxy_pass http://myapp_blue/health;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2s;
        proxy_read_timeout 2s;
    }

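Open source Nginx does this passively: when a request to an upstream server fails, it retries the next server and, per the max_fails and fail_timeout server parameters (defaults: 1 failure, 10 seconds), temporarily stops sending traffic to the failing one. Active health checks (the health_check directive) require NGINX Plus.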

3. CI/CD Pipeline Guardrails

Stage	Tool	Key Settings
Build	GitHub Actions / GitLab CI	Cache `node_modules` or `pip` wheels, fail on lint errors
Test	Jest / PyTest	Run in parallel containers, enforce ≥80% coverage
Publish	Docker Hub / ECR	Use `docker push $IMAGE:$TAG`, sign images with Notary
Deploy	Argo CD / Spinnaker	Deploy to the idle color (green) first, run smoke tests, then switch traffic

a. Automated Smoke Tests

    - name: Smoke test green
      run: |
        # Hit a green server directly; the public hostname still routes to blue
        curl -sSf --max-time 5 http://10.0.2.10/health || exit 1

If the smoke test fails, abort the traffic switch.
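A freshly started container can take a few seconds to answer, so a single probe may fail for the wrong reason. One option is to wrap the probe in a small retry helper. This is a sketch under assumptions: retry is a hypothetical function name, and the attempt count and delay are placeholders to tune for your service.

```shell
#!/bin/sh
# Sketch: retry a probe command a few times before declaring the smoke test failed.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

In the pipeline step this could be used as: retry 5 2 curl -sSf --max-time 2 http://10.0.2.10/health || exit 1. The non-zero exit is what tells the pipeline to abort the traffic switch.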

b. Rollback Automation

Store the previous image tag in a KV store (e.g., Consul).
A simple rollback script:

    #!/usr/bin/env bash
    set -euo pipefail

    # Look up the last known-good tag and re-point "current" at it
    PREV=$(consul kv get myapp/prev_tag)
    docker pull "myrepo/myapp:$PREV"
    docker tag "myrepo/myapp:$PREV" myrepo/myapp:current

    # Trigger deployment to blue again
    curl -fsS -X POST -H "X-Deploy-Stage: blue" https://ci.example.com/deploy

4. Blue‑Green Switch Procedure

Deploy to Green – Push the new image, update the green upstream, and run smoke tests.
Validate – Verify logs, metrics, and end‑to‑end flows in a staging sub‑domain.
Flip Traffic – Add the header X-Deploy-Stage: green to all inbound requests (or change the Nginx map default).
Monitor – Keep an eye on error rates, latency, and resource usage for at least 5 minutes.
Decommission Blue – Drain connections, stop containers, and optionally delete the old image.

Quick CLI Switch

    # Switch all traffic to green
    curl -X POST -H "X-Deploy-Stage: green" https://myapp.example.com/__internal__/toggle

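Stock Nginx has no such endpoint, so /__internal__/toggle stands for a small internal service of your own that updates the map default. In open source Nginx the change takes effect on nginx -s reload, which is graceful: new workers start with the new configuration while old workers finish their in-flight requests.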
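One way to implement the switch itself is to rewrite the map default in the configuration file and then reload. The sketch below assumes the map block is formatted one directive per line with a line of the form "default myapp_<color>;"; the function name set_default_upstream is hypothetical.

```shell
#!/bin/sh
# Sketch: point the nginx map default at a different upstream by editing the config.
# Assumes a line of the form "default myapp_blue;" or "default myapp_green;".

# set_default_upstream CONF_FILE COLOR
set_default_upstream() {
  conf=$1
  color=$2
  case "$color" in
    blue|green) ;;
    *) echo "unknown color: $color" >&2; return 1 ;;
  esac
  tmp="$conf.tmp.$$"
  # Rewrite only the map's default line; every other line is copied unchanged.
  sed -E "s/^([[:space:]]*default[[:space:]]+)myapp_(blue|green);/\1myapp_${color};/" "$conf" > "$tmp" \
    && mv "$tmp" "$conf"
}
```

A possible invocation, with the config path as an assumption: set_default_upstream /etc/nginx/conf.d/myapp.conf green && nginx -t && nginx -s reload. Running nginx -t first means a broken edit is caught before the reload is attempted.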

5. Observability & Logging

a. Centralized Logs

Ship Docker stdout/stderr to Loki or Elastic via Fluent Bit.
Include the commit label in every log line for easy correlation.

b. Metrics

Export Prometheus metrics from Nginx (nginx-prometheus-exporter).
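Track nginx_http_requests_total and the nginx_connections_* gauges from the exporter. With open source Nginx the exporter reads stub_status, which carries no latency data, so request duration (http_request_duration_seconds) has to come from application metrics, and upstream timing from the $request_time and $upstream_response_time access log variables.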

c. Alerts

Alert when 5xx responses exceed 5 % of requests in the minutes after a traffic switch.
Use PagerDuty or Opsgenie for on‑call escalation.
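Until a proper alert rule is in place, the same threshold can be checked by hand against the access log right after a switch. This is a rough sketch that assumes the default "combined" log format, where the status code is the 9th whitespace-separated field; error_rate_pct and exceeds_threshold are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compute the share of 5xx responses in an nginx access log.
# Assumes the default "combined" format (status code in field 9).

# error_rate_pct LOG_FILE
# Prints the 5xx percentage with two decimals (0.00 for an empty log).
error_rate_pct() {
  awk '
    { total++ }
    $9 ~ /^5[0-9][0-9]$/ { errors++ }
    END {
      if (total == 0) { print "0.00"; exit }
      printf "%.2f\n", (errors / total) * 100
    }
  ' "$1"
}

# exceeds_threshold LOG_FILE THRESHOLD_PCT
# Exit status 0 when the 5xx rate is strictly above the threshold.
exceeds_threshold() {
  rate=$(error_rate_pct "$1")
  awk -v r="$rate" -v t="$2" 'BEGIN { exit !(r > t) }'
}
```

For example: exceeds_threshold /var/log/nginx/access.log 5 && echo "5xx rate above 5 %, consider rolling back". The log path is an assumption, and a custom log_format would move the status field.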

6. Security Checklist

Image Scanning – Run Trivy or Clair on every build; fail the build on any CVE with a CVSS score of 7.0 or higher (High and Critical severity).
Least‑Privilege Containers – Drop CAP_NET_RAW, run as non‑root (USER 1001).
TLS Termination – Let Nginx handle TLS; enforce TLSv1.3 and strong ciphers.
Header Hardening – Add Content‑Security‑Policy, X‑Content‑Type‑Options, Strict‑Transport‑Security.
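The header hardening item is easy to verify from the outside after each deploy. The sketch below reads raw response headers on stdin and reports which of the three headers named above are absent; missing_headers is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: report which hardening headers are missing from an HTTP response.

# missing_headers reads raw response headers on stdin and prints each
# required header that is absent, one per line. Matching is case-insensitive,
# since HTTP/2 responses use lowercase header names.
missing_headers() {
  headers=$(cat)
  for name in Content-Security-Policy X-Content-Type-Options Strict-Transport-Security; do
    if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
      echo "$name"
    fi
  done
}
```

Typical usage would be: curl -sSI https://myapp.example.com/ | missing_headers. Empty output means all three headers are present; it does not say anything about whether their values are strict enough.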

7. Post‑Deployment Hygiene

Database Migrations – Run them before the green rollout, using a zero‑downtime strategy (add columns, backfill, then switch queries).
Cache Invalidation – If you use Redis, version keys (v2:user:123) to avoid stale reads.
Documentation – Keep a deploy.md in the repo that records the exact steps and rollback plan.
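The key versioning idea from the cache invalidation item can be kept in one place so that a release only has to bump a single value. A minimal sketch, assuming a CACHE_VERSION variable and a hypothetical cache_key helper:

```shell
#!/bin/sh
# Sketch: build versioned cache keys such as v2:user:123.
# CACHE_VERSION is an assumed setting; bump it when the cached shape changes.
CACHE_VERSION=${CACHE_VERSION:-v2}

# cache_key NAMESPACE ID
# Prints the versioned key for the given namespace and id.
cache_key() {
  printf '%s:%s:%s\n' "$CACHE_VERSION" "$1" "$2"
}
```

For example: redis-cli GET "$(cache_key user 123)". Old and new releases then read and write disjoint keys during the switch, and the old keys can simply be left to expire.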

Conclusion

Achieving zero‑downtime deployments with Docker and Nginx is a disciplined process: immutable images, smart Nginx routing, guarded CI/CD pipelines, thorough observability, and a solid rollback plan. Follow this checklist for each release, and you’ll reduce risk while keeping your users blissfully unaware of any underlying changes.

If you need help shipping this, the team at https://ramerlabs.com can help.

DEV Community