Docker Best Practices Workshop

How to work effectively with Docker
Ahmed AbouZaid, DevOps Engineer, Camunda
21.09.2021

2 About
Ahmed AbouZaid
A passionate DevOps engineer, Cloud/Kubernetes specialist, Free/Open source geek, and an author.
• I believe in self CI/CD (Continuous Improvements/Development) and that “The whole is greater than the sum of its parts”
• DevOps transformation, automation, data, and metrics are my preferred areas
• And I like to help both businesses and people to grow
Find me at: tech.aabouzaid.com | linkedin.com/in/aabouzaid
September 2021, Kayaking in the Spree: ✅ Do Kayaking 🚫 Don’t sit like that!

3 Content
Quick Introduction
Essential Practices
• Use Dockerfile linter
• Check Docker language-specific best practices
• Create a single application per Docker image
• Create configurable ephemeral containers
Image Practices
• Understanding Docker image
• Use optimal base image
• Pin versions everywhere
• Create image with the optimal size
• Use multi-stage whenever possible
• Avoid any unnecessary files
Security Practices
• Always use trusted images
• Never use untrusted resources
• Never store sensitive data in the image
• Use a non-root user
• Scan image vulnerabilities
Misc Practices
• Leverage Docker build cache
• Avoid system cache
• Create a unified image across envs
• Use ENTRYPOINT with CMD
Next steps

5 Overview
In this hands-on workshop, we will cover 18 best practices in 4 categories, or in other words, ✅ Dos & 🚫 Don'ts. After a general introduction, we will look at the essential practices (aka must-dos), then move on to the image practices, then the security practices, and finally some general practices.
Please note that this workshop assumes you have basic knowledge of Docker.
Timeline
• 30 min: Review the best practices
• 10 min: Questions
• 10 min: Break
• 20 min: Apply the best practices
• 20 min: Discussion

7 Containers, Docker, and Kubernetes
Containers: A technology for packaging an application along with its runtime dependencies
Docker: The de facto standard for building and sharing containerized apps
Kubernetes: A cloud-native platform to manage and orchestrate container workloads
Image: o_m/Shutterstock

8 Dockerfile, Docker Image, and Docker Container
Dockerfile: A text file containing a set of instructions used to build a Docker image
Docker Image: A combination of layered filesystems stacked on top of each other to create a customizable, usable image
Docker Container: A runtime instance of a Docker image

10 1.1 Use Dockerfile linter
• First things first, use a Dockerfile linter! Use hadolint!
• It will help you apply best practices by default
• By using hadolint, you will avoid at least 50% of the Docker issues
• Use it via the CLI or integrate it with your IDE, e.g. the VS Code hadolint extension

11 1.2 Check Docker language-specific best practices
• There are general Docker best practices that work for all languages
• Usually each language group (e.g., interpreted, native, JVM) has common best practices
• Some languages have their own best practices
• Check whether the language you use has language-specific best practices

12 1.3 Create a single application per Docker image
• A Docker image with a single application is more:
  • Maintainable
  • Scalable
  • Secure
  • Reusable
  • Portable
• Multiple processes within a container are usually a nightmare in development as well as in operations
Image: Docker.com - What is a Container?

13 1.4 Create configurable ephemeral containers
• “An ephemeral container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum set up and configuration”
• Avoid dynamic configuration at runtime whenever possible
• Set configuration defaults, but don’t store env-related configuration in the image
• Follow “The Twelve-Factor App” methodology as much as possible
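
One way to read the last two points is that the application ships sensible defaults and takes anything env-specific from environment variables at start-up. A minimal sketch, assuming a Python app like the `myapp.py` used later in this workshop; the `PORT` setting and the default values are hypothetical, chosen only for illustration:

```python
import os


def load_config(environ=None):
    """Build the app configuration from environment variables.

    Defaults live in the code (and therefore in the image); anything
    env-specific is injected when the container starts.
    """
    env = os.environ if environ is None else environ
    return {
        # LOG_LEVEL mirrors the ENV example used elsewhere in this workshop.
        "log_level": env.get("LOG_LEVEL", "info"),
        # PORT is a hypothetical setting used only for this sketch.
        "port": int(env.get("PORT", "8080")),
    }


if __name__ == "__main__":
    print(load_config())
```

The same image can then run unchanged in every environment, e.g. `docker run -e LOG_LEVEL=debug ...` for a debugging session.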

15 Understanding Docker image
• A Docker image is made of layers
• Docker image layers are immutable (read-only)
• Each instruction in the Dockerfile adds a layer to the Docker image
• Previous layers cannot be changed by later instructions
• Removing files that were added in a previous layer only hides them; they are still stored in the image
ℹ Note: Only “ADD”, “COPY”, and “RUN” can create filesystem layers (which increase image size)
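
The "hidden but still there" behaviour can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each layer is a plain dict, a deletion is a marker entry, and the file names are hypothetical. Real images rely on a union filesystem, which this does not reproduce.

```python
# Simplified model: each layer maps a path to its content (bytes).
# WHITEOUT marks a path that a later layer deleted.
WHITEOUT = None


def merged_view(layers):
    """Return the files a container would see after stacking all layers."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)  # hidden from the merged view
            else:
                view[path] = content
    return view


def stored_bytes(layers):
    """Return the bytes kept in the image, including hidden files."""
    return sum(
        len(content)
        for layer in layers
        for content in layer.values()
        if content is not WHITEOUT
    )


layers = [
    {"/opt/myapp.py": b"print('hi')"},       # e.g. COPY myapp.py /opt
    {"/tmp/big-download.tar": b"x" * 1000},  # e.g. a RUN that downloads a file
    {"/tmp/big-download.tar": WHITEOUT},     # e.g. a later RUN rm of that file
]

print(sorted(merged_view(layers)))  # the download is no longer visible
print(stored_bytes(layers))         # but its bytes still count
```

In this model the only way to avoid storing the download is to create and remove it within the same layer, which is why clean-up commands are usually chained into the same RUN instruction.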

16 2.1 Use optimal base image
• Use official images or images from well-known identities
• Use the smallest base image that fits your use case
• Avoid using generic images when good language-specific images are available
✅ Do
    FROM python:3.8.10-alpine3.14
🚫 Don’t
    FROM alpine:3.14
    RUN apk add 'python3=3.8.10-r0'

17 2.2 Pin versions everywhere
• Never use a base image without a tag or with the ‘latest’ tag
• Avoid pinning to the major version only
• In most cases, pinning the minor version should be fine
• Pin up to the patch version for critical components
• Also pin the versions of the dependencies
✅ Do
    FROM python:3.8
    RUN pip install Flask==2.0.0
🚫 Don’t
    FROM python
    RUN pip install Flask
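
The first rule is easy to check mechanically. The sketch below is a hypothetical helper (it is not hadolint and not part of any Docker tooling) that lists base images with no tag or with the `latest` tag. It assumes simple `FROM` lines and skips references to earlier build stages, `scratch`, and digest-pinned images.

```python
def unpinned_base_images(dockerfile_text):
    """Return base images that have no tag or use the 'latest' tag."""
    findings = []
    stages = set()
    for line in dockerfile_text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64.
        args = [w for w in words[1:] if not w.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM base AS prod" refers to an earlier stage, not a registry image.
        is_stage = image.lower() in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        if is_stage or image == "scratch" or "@" in image:
            continue
        # Look for the tag in the last path component only, so a registry
        # port (e.g. localhost:5000/app) is not mistaken for a tag.
        last = image.rsplit("/", 1)[-1]
        tag = last.split(":", 1)[1] if ":" in last else None
        if tag is None or tag == "latest":
            findings.append(image)
    return findings


print(unpinned_base_images("FROM python\nRUN pip install Flask"))
print(unpinned_base_images("FROM python:3.8\nRUN pip install Flask==2.0.0"))
```

A real linter covers far more cases; the point here is only that this rule is simple enough to enforce automatically in CI.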

18 2.3 Create image with the optimal size
• As a rule of thumb, smaller Docker images are better
• However, be aware that:
  • A base image that is too small means a longer build time (CI)
  • A base image that is too big means a longer deploy time (CD)
• Try to find the sweet spot that balances build time and deploy time according to your needs and use cases
✅ Do (or not)
    FROM node:14.17.6-alpine3.14
    RUN apk add --no-cache curl
Build time: 2s (3 builds avg, no layers cache)
Image size: 120MB
🚫 Don’t
    FROM alpine:3.14
    RUN apk add --no-cache 'nodejs=14.17.6-r0' curl
Build time: 6s (3 builds avg, no layers cache)
Image size: 46.3MB

19 2.4 Use multi-stage whenever possible
• The multi-stage feature allows you to build smaller and cleaner images by separating the build image from the runtime image
• It’s super useful for languages that produce build artifacts, like Golang, Java, etc.
• It’s also helpful for running various tests during development
• Additionally, it’s better for security because it reduces the attack surface
✅ Do
    # Build stage.
    FROM maven:3.6-openjdk-17 AS builder
    [...]
    RUN mvn clean package

    # Runtime stage.
    FROM openjdk:17-jdk-alpine3.14
    COPY --from=builder /myapp.jar /opt/
    ENTRYPOINT ["java", "-jar", "/opt/myapp.jar"]

20 2.5 Avoid any unnecessary files
• Every extra file could increase build time, image size, or even both!
• Specify the files and paths that need to be part of the image
• Use “.dockerignore” to filter out any unnecessary files
• If necessary, restructure your repo/code to keep only the needed files in separate folders
✅ Do
    FROM python
    # Only needed files are added to the image
    COPY myapp.py /opt
    ENTRYPOINT ["python", "/opt/myapp.py"]
🚫 Don’t
    FROM python
    # The whole repo/context is added to the image
    COPY . /opt
    ENTRYPOINT ["python", "/opt/myapp.py"]

22 3.1 Always use trusted images
• Use images from trusted repositories
• Use official images whenever possible
• If there is no official image, use only images from well-known identities
• For critical components, don’t use public Docker repositories
• Sign your images with Docker Content Trust (DCT)
✅ Do
    FROM openjdk:12
🚫 Don’t
    FROM coolestGuyInTheTown/openjdk:12

23 3.2 Never use untrusted resources
• Using a trusted image doesn’t help if untrusted resources are used in the image itself
• Always use resources from trusted sources
• When a Git resource is used, always reference it by Git commit hash because Git tags are mutable
• In general, try to minimize the number of external resources used in the image
✅ Do
    FROM alpine
    # You know what you get exactly
    ARG HELPER_SCRIPT_URL=https://raw.githubusercontent.com/trusted-user/awesome-scripts/5330224/some-helper-script.sh
    # Or better:
    COPY scripts/some-helper-script.sh /tmp
🚫 Don’t
    FROM alpine
    # The resource could be changed anytime!
    ARG HELPER_SCRIPT_URL=https://raw.githubusercontent.com/random-user/awesome-scripts/master/some-helper-script.sh

24 3.3 Never store sensitive data in the image
• Any data saved in one of the layers cannot be removed in a later layer! It will only be hidden and can easily be retrieved
• For runtime secrets, use env vars to access the sensitive data
• For build-time secrets, use Docker BuildKit, which allows you to access sensitive data securely during the build (never use ARG for build-time secrets)
✅ Do
    RUN --mount=type=secret,id=GITHUB_NPM_TOKEN npm set //npm.pkg.github.com/:_authToken $GITHUB_NPM_TOKEN && npm install

    $ export GITHUB_NPM_TOKEN=top_secret
    $ export DOCKER_BUILDKIT=1
    $ docker build --secret id=GITHUB_NPM_TOKEN .
🚫 Don’t
    # This file will be stored in the image
    COPY .npmrc .
    RUN npm install && rm .npmrc

    # Also build args will be stored in the image
    ARG GITHUB_NPM_TOKEN
    RUN npm set //npm.pkg.github.com/:_authToken $GITHUB_NPM_TOKEN && npm install

25 3.4 Use a non-root user
• By default, Docker will use “root” to execute the container commands
• Using the root user is a bad practice and is considered a security risk
• Always (or whenever possible) set the “USER” instruction to a non-root user
• Remember that the user must already exist in the Docker image system to be used with the “USER” instruction
✅ Do
    FROM alpine
    USER nobody
    CMD ["whoami"]
Output: nobody
🚫 Don’t
    FROM alpine
    # The root user will be used to execute commands
    CMD ["whoami"]
Output: root

26 3.5 Scan image vulnerabilities
• Docker image vulnerability scanning tools mainly aim to detect known vulnerabilities in the libraries included in the image
• There are many solutions and tools, like Trivy and Snyk, and some are even integrated with cloud services like GCR (Google Container Registry)
• Scan your images during development as well as in production
• Depending on your use case, scan your images with every build or at least daily

28 4.1 Leverage Docker build cache
• As mentioned before, a Docker image consists of a stack of immutable layers
• Each instruction of the Dockerfile is an independent layer
• When a layer is generated, it’s cached locally so it can be reused
• However, if there is a change in one layer, its cache is invalidated together with the cache of all following layers

29 4.1 Leverage Docker build cache (continued)
• In the Dockerfile, put the less frequently changing instructions at the top of the file and the more frequently changing instructions at the end
• The Docker build cache is super helpful in local development as well as in CI/CD (when the build is done on a single machine or with a distributed caching layer)
✅ Do
    FROM alpine
    # The ENV and RUN layers will be reused
    # even when the source code changed
    ENV LOG_LEVEL=info
    RUN apk add python3
    COPY myapp.py /opt
🚫 Don’t
    FROM alpine
    # Any change in the source code will invalidate
    # the cache of all next layers
    COPY myapp.py /opt
    RUN apk add python3
    ENV LOG_LEVEL=info
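
The effect of instruction order can be simulated with a toy cache. This is a sketch of the idea only, not Docker's actual cache implementation. It assumes each layer's cache key depends on the previous layer's key, on the instruction text, and on a checksum of any copied files; the checksum is represented here by a hypothetical version string such as "v1".

```python
import hashlib


def build(steps, cache):
    """Simulate a build; return 'cached' or 'built' for each step.

    steps: list of (instruction, inputs) pairs, where inputs stands in
    for the checksum of the files an instruction copies.
    cache: set of cache keys kept between builds.
    """
    statuses = []
    parent = ""
    for instruction, inputs in steps:
        raw = f"{parent}|{instruction}|{inputs}".encode()
        key = hashlib.sha256(raw).hexdigest()
        if key in cache:
            statuses.append("cached")
        else:
            cache.add(key)
            statuses.append("built")
        parent = key  # a changed layer changes every key after it
    return statuses


def good_order(source_version):
    return [
        ("FROM alpine", ""),
        ("ENV LOG_LEVEL=info", ""),
        ("RUN apk add python3", ""),
        ("COPY myapp.py /opt", source_version),
    ]


def bad_order(source_version):
    return [
        ("FROM alpine", ""),
        ("COPY myapp.py /opt", source_version),
        ("RUN apk add python3", ""),
        ("ENV LOG_LEVEL=info", ""),
    ]


for name, order in (("good", good_order), ("bad", bad_order)):
    cache = set()
    build(order("v1"), cache)           # first build fills the cache
    print(name, build(order("v2"), cache))  # rebuild after a source change
```

With the source copied last, only the COPY step is rebuilt after a code change; with the source copied early, the package installation is repeated on every change.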

30 4.2 Avoid system cache
• Systems use caching to speed up things that are used frequently
• Each system caches different things, for example package manager metadata
• When building Docker images, system caches usually don’t add any value since image layers are immutable and each command runs in its own layer
• As a rule of thumb, avoid system caches because they increase image size
• Remember that each system has different options to disable caches
✅ Do
    FROM alpine
    RUN apk add --no-cache curl
🚫 Don’t
    FROM alpine
    RUN apk add curl

31 4.3 Create a unified image across envs
• In general, try to build your image the same way for all envs (e.g., dev, stage, and prod)
• Try to make your image env-agnostic so it works seamlessly across envs
• Utilize multi-stage whenever possible and use “prod” as a base for other envs
• For advanced/complex use cases, use Docker BuildKit, which gives you more control over builds
✅ Do
    FROM alpine AS base
    RUN apk add curl

    FROM base AS prod
    RUN apk add python3

    FROM prod AS dev
    RUN apk add python3-dev

    # Build dev image (build the whole file)
    $ docker build -t myapp:dev .
    # Build prod image (stop at the prod stage)
    $ docker build --target prod -t myapp:v1 .
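
To see why "prod" works as the base for the other envs, it can help to trace which stages a given target needs. The helper below is a hypothetical sketch, not how Docker resolves stages internally. It only follows each stage's FROM parent and ignores other dependencies such as `COPY --from`.

```python
def stages_to_build(stages, target=None):
    """Return the stages needed for a target, in build order.

    stages: ordered list of (stage_name, base) pairs, as declared by
    "FROM <base> AS <stage_name>". With no target, the last stage is used.
    """
    parents = dict(stages)
    current = target if target is not None else stages[-1][0]
    chain = []
    # Walk back through parent stages until an external base image is reached.
    while current in parents:
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


# The stages from the Dockerfile in this slide.
stages = [("base", "alpine"), ("prod", "base"), ("dev", "prod")]

print(stages_to_build(stages))          # the whole file, for the dev image
print(stages_to_build(stages, "prod"))  # stops at prod, for the prod image
```

Because dev is built on top of prod, everything that ships to production is also exercised in development, and the dev-only packages never reach the prod image.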

32 4.4 Use ENTRYPOINT with CMD
• Both “ENTRYPOINT” and “CMD” are Dockerfile instructions that are used to control the default command within the Docker image
• Either “ENTRYPOINT” or “CMD” can be used independently
• However, using both of them at the same time makes it easier to customize the container’s behaviour, especially in Kubernetes
• As a rule of thumb, if your application is customizable via arguments, use “ENTRYPOINT” for the main command and “CMD” for the default arguments
✅ Do
    FROM alpine
    ENTRYPOINT ["echo"]
    CMD ["-e", "Hello\nWorld"]
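
How the two instructions combine can be sketched in a few lines. The function below is a hypothetical model of the exec (JSON array) form only: arguments passed at run time, such as those after the image name in `docker run`, replace CMD, while ENTRYPOINT stays in place.

```python
def container_argv(entrypoint, cmd, run_args=None):
    """Return the command line a container would start with.

    entrypoint: list from the ENTRYPOINT instruction (exec form).
    cmd: list from the CMD instruction, used as default arguments.
    run_args: arguments given at run time; when present they replace cmd.
    """
    args = run_args if run_args else cmd
    return list(entrypoint) + list(args)


# Defaults from the image: ENTRYPOINT ["echo"] with CMD as default arguments.
print(container_argv(["echo"], ["-e", "Hello"]))
# Arguments given at run time replace CMD but keep the ENTRYPOINT.
print(container_argv(["echo"], ["-e", "Hello"], ["Bye"]))
```

This split is what makes the image easy to customize: callers change only the arguments and never need to know the main command.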

34 Next steps
• Find the last Docker image you created and refactor it according to the best practices in this workshop
• Integrate hadolint (Dockerfile linter) with your local IDE and your team’s CI pipeline
• Find some interesting Docker scenarios on Katacoda and get hands-on
• Advanced topics:
  • Sign your Docker images with Docker Content Trust (DCT)
  • Take a look at BuildKit, which is a Dockerfile-agnostic builder toolkit. More details: Faster Builds and Smaller Images Using BuildKit
• Did you know that Docker is not the only container management system? Read more about Docker Alternative Container Tools

36 References
• Intro Guide to Dockerfile Best Practices - Docker Blog
• Best practices for writing Dockerfiles - Docker Documentation
• Image-building best practices - Docker Documentation
• Best practices for building containers - Google Cloud Architecture Center
• Top 20 Dockerfile best practices for security - Sysdig
• On Docker Articles - vsupalov.com

37 What is your best practice? Questions? :-)
