
Docker for Beginners Part 3 — Writing a Dockerfile

· 8 min read
Docker Series (3/13)
  1. Docker for Beginners Part 1 — What Is Docker
  2. Docker for Beginners Part 2 — Images and Layers
  3. Docker for Beginners Part 3 — Writing a Dockerfile
  4. Docker for Beginners Part 4 — Container Lifecycle
  5. Docker for Beginners Part 5 — Volumes and Data Persistence
  6. Docker for Beginners Part 6 — Networking
  7. Docker Part 7 — Multi-Container Orchestration with Docker Compose
  8. Docker Part 8 — Slimming Images with Multi-Stage Builds
  9. Docker Part 9 — Registry: Where Do Images Live?
  10. Docker Part 10 — Container Security: Blocking Issues Before They Blow Up
  11. Docker Part 11 — BuildKit and Advanced Builds
  12. Docker Part 12 — Production Best Practices
  13. Docker Part 13 — Troubleshooting and Alternatives

A Dockerfile Is Not a Recipe — It Is a History

Now that you understand images and layers, it is time to build an image yourself. The tool for that is the Dockerfile. On the surface it is just a plain text file, but in reality it is a document that records the history of commands: “stack this layer on top of that layer, and then stack another layer on top.”

flowchart TB
    DF["Dockerfile<br/>text file"] --> BUILD["docker build"]
    BUILD --> CTX["Build Context<br/>(current directory + .dockerignore)"]
    CTX --> ENGINE["BuildKit / legacy builder"]

    subgraph LAYERS["Generated layers"]
        L1["Base from FROM"]
        L2["RUN ... result"]
        L3["COPY ... result"]
        L4["COPY src . result"]
    end

    ENGINE --> LAYERS
    LAYERS --> IMG["Finished image"]

In this part, we will examine the commonly used instructions one by one. Not all of them appear in every Dockerfile, but the moment you understand what each does and how they differ, you start writing better Dockerfiles.

The Simplest Dockerfile

Before the explanation, let’s start with the end result we want. Here is a realistic example of containerizing a Node.js app:

# syntax=docker/dockerfile:1.7
FROM node:22-slim AS runtime

WORKDIR /app

# Dependency layer — cached if package.json is unchanged
COPY package*.json ./
RUN npm ci --omit=dev

# App code layer — changes frequently
COPY . .

ENV NODE_ENV=production \
    PORT=3000

EXPOSE 3000

USER node
CMD ["node", "server.js"]

In roughly 12–15 lines, nearly all the instructions we will discuss in this part are present. Let’s break them down one by one.

FROM — Everything Starts Here

FROM specifies the base image. Apart from the # syntax parser directive, comments, and global ARG instructions, the first instruction in a Dockerfile must always be FROM. Our layers are stacked on top of the image chosen here.

FROM node:22-slim

Base image selection has the greatest impact on the final image size, security, and stability. As discussed in Part 2, you should be deliberate about whether to choose slim, alpine, or distroless.

In multi-stage builds, you can use multiple FROM statements, naming each stage with AS:

FROM golang:1.23 AS builder
# ... build steps ...

FROM gcr.io/distroless/static-debian12 AS runtime
COPY --from=builder /out/app /app

--from=builder is used to copy files from a previous stage. Build tools do not remain in the runtime image.

RUN — A New Layer Is Created

RUN executes commands during the image build. Each RUN creates a new layer. Therefore, “how many RUN statements you use” directly impacts image size and build time.

There are two forms:

# shell form — executed via /bin/sh -c
RUN apt-get update && apt-get install -y curl

# exec form — executed directly without a shell
RUN ["apt-get", "install", "-y", "curl"]

shell form is convenient because you can use shell features (&&, |, environment variable expansion, etc.). exec form does not go through a shell, so signal handling is more precise and it works even in distroless images that lack a shell.

As shown in Part 2, it is important to combine apt-get install and rm -rf /var/lib/apt/lists/* in the same RUN. If file creation and cleanup end up in different layers, deleted files remain in the lower layer and unnecessarily bloat the image.

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      curl \
      ca-certificates \
 && rm -rf /var/lib/apt/lists/*

COPY vs ADD — Just Use COPY

Both are instructions for copying files into the image, but ADD does more: besides copying from the build context, it can download files from URLs, and it automatically extracts local tar archives into the destination.

ADD’s extra features come with “sneaky side effects.” URL downloads have poor reproducibility and are hard to security-audit, and automatic tar extraction can unpack files unintentionally.

Rule: Use COPY by default, and only use ADD when you specifically need to extract a tar file. If you need to download from a URL, it is better to use RUN curl or RUN wget explicitly.

# Typical usage
COPY package.json /app/

# Specifying owner/permissions is also possible
COPY --chown=node:node . /app/

--chown sets the owner of copied files. This is useful when a specific user inside the container needs to read those files later.

WORKDIR — Declare the Working Directory

WORKDIR /app makes all subsequent commands execute in /app. Without it, the default is /, and files tend to scatter everywhere.

WORKDIR /app
COPY . .            # Copies to /app
RUN npm install     # Runs in /app
CMD ["node", "server.js"]  # Runs in /app

WORKDIR can be used multiple times and accepts relative paths. If the directory does not exist, it is automatically created. This is cleaner than manually writing mkdir && cd.
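The relative-path behavior can be sketched like this (the directory names are illustrative):

```dockerfile
WORKDIR /app        # created automatically if it does not exist
WORKDIR api         # relative path — now /app/api
WORKDIR logs        # now /app/api/logs
RUN pwd             # the build runs this in /app/api/logs
```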

CMD and ENTRYPOINT — The Confusing Duo

These instructions define what to run when the container starts. They look similar but have different roles: ENTRYPOINT fixes the executable that always runs, while CMD supplies a default command or default arguments that docker run can easily override.

Behavior changes depending on the combination:

# Pattern 1: ENTRYPOINT + CMD — most recommended
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run myapp       -> node server.js
# docker run myapp app.js -> node app.js  (CMD is replaced)
# Pattern 2: CMD only
CMD ["node", "server.js"]
# docker run myapp        -> node server.js
# docker run myapp bash   -> bash (entire CMD is replaced)
# Pattern 3: ENTRYPOINT only — arguments are hardcoded
ENTRYPOINT ["node", "server.js"]
# docker run myapp some-arg -> node server.js some-arg (argument is appended)

Practical recommendation: ENTRYPOINT + CMD. Put the binary to run in ENTRYPOINT and the default arguments in CMD. This way, users can override just the arguments with docker run.

The difference between exec form (JSON array) and shell form (string) is also important. Always prefer exec form.

# Good — exec form
CMD ["node", "server.js"]

# Bad — shell form
CMD node server.js

Shell form is internally executed as /bin/sh -c "node server.js". This means PID 1 becomes sh, and sh does not properly forward signals to its children. Even if you run docker stop, the app does not receive SIGTERM. This issue is covered in detail in Part 4.
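When you genuinely need a shell — say, to render a config file from environment variables before starting — a wrapper script can preserve correct signal handling by ending with exec, which replaces the shell with the app so the app becomes PID 1. A minimal sketch (entrypoint.sh is a hypothetical file name):

```shell
#!/bin/sh
# entrypoint.sh — hypothetical wrapper: one-time setup, then hand off.
set -e
echo "setup: rendering config from environment..."  # placeholder setup step
# exec replaces this shell with the command below, so that command becomes
# PID 1 and receives SIGTERM from `docker stop` directly.
exec "$@"
```

You would wire it up with `ENTRYPOINT ["./entrypoint.sh"]` and `CMD ["node", "server.js"]`, keeping the override behavior of Pattern 1 above.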

ENV — Embedding Environment Variables

ENV embeds environment variables into the image. The important point is that they are valid not only at build time but also at container runtime.

ENV NODE_ENV=production \
    PORT=3000 \
    LOG_LEVEL=info

Using \ for line continuation lets you bundle multiple variables into a single layer. Inside the container, these values are visible as defaults.

docker run --rm myapp printenv NODE_ENV
# production

docker run --rm -e NODE_ENV=development myapp printenv NODE_ENV
# development  <- Can be overridden at runtime

Do not embed sensitive values (passwords, API keys) with ENV. They are included as-is in the image layers and anyone can see them with docker history.

ARG vs ENV — They Look the Same but Are Different

Both are “variables,” but their scopes differ:

| Property | ARG | ENV |
| --- | --- | --- |
| When valid | During build only | During build + container runtime |
| Accessible at docker run? | No | Yes |
| Inject value at docker build? | Yes, via --build-arg | No |
| Stored in image? | No | Yes |

Use ARG for values that should only be used at build time and not persist in the container. Use ENV for values you want as default environment variables in the container.

# Version info used only at build time
ARG APP_VERSION=dev
RUN echo "Building version ${APP_VERSION}"

# Environment variable that persists in the container
ENV NODE_ENV=production

Then inject the value at build time:

docker build --build-arg APP_VERSION=1.2.3 -t myapp .
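A related scoping detail: an ARG declared before FROM is visible only to the FROM line itself. To use it inside a stage, re-declare it there (a sketch; NODE_VERSION is a hypothetical argument name):

```dockerfile
ARG NODE_VERSION=22
FROM node:${NODE_VERSION}-slim
# Re-declare to bring the ARG into this stage's scope
ARG NODE_VERSION
RUN echo "built with Node ${NODE_VERSION}"
```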

One caveat: if you assign an ARG to an ENV, that value stays in the image.

ARG DB_PASSWORD
ENV DB_PASSWORD=${DB_PASSWORD}  # Embedded in image — risk of password leak

To use a sensitive value only during build time and keep it out of the image, you need BuildKit’s --secret:

# syntax=docker/dockerfile:1.7
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

Then pass the secret at build time:

docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .

This way, .npmrc is mounted only during the build and does not remain in the final image.

EXPOSE — More of a Documentation Declaration

EXPOSE 3000 merely declares “this container will use port 3000.” It does not automatically open the port. To access it from outside, you need to explicitly forward it with something like docker run -p 3000:3000.

So why use it? Two purposes:

  1. Documentation so that someone looking at the Dockerfile alone can tell “this image uses this port”
  2. When you use docker run -P (capital P), EXPOSEd ports are automatically mapped to random host ports

Networking is covered in depth in Part 6. For now, just remember that EXPOSE does not actually open a port.

USER — The User Inside the Container

By default, containers run as root. While convenient, it is bad for security. If a container escape vulnerability occurs, the attack can continue with root privileges on the host.

Use USER to specify a non-root user:

FROM node:22-slim
WORKDIR /app
COPY --chown=node:node . .
RUN npm ci --omit=dev
USER node
CMD ["node", "server.js"]

The node image comes with a default node user at UID 1000, so you can use it directly. For other images, you need to create one manually:

RUN groupadd -r app && useradd -r -g app -u 1001 app
USER app

.dockerignore — What to Exclude from the Build Context

When you run docker build ., the entire current directory is sent to the Docker Daemon as the build context. If files like node_modules, .git, and .env are all included, the transfer is slow and there is a risk of them accidentally ending up in the image.

List what you want to exclude in .dockerignore, and they will be excluded from the build context. The syntax is the same as .gitignore.

# Version control
.git
.gitignore

# Dependencies (installed fresh inside the container)
node_modules
npm-debug.log

# Local environment
.env
.env.*
!.env.example

# Editor/OS
.vscode
.idea
.DS_Store

# Build artifacts
dist
build
coverage

!.env.example means “do not exclude this.” Use it when you want to include an example file in the image.

Setting up .dockerignore properly makes build speeds noticeably faster and improves cache hit rates. If the .git directory is included in the build context, every commit changes the context and can invalidate the cache.

HEALTHCHECK — Declaring Container Health Status

Just because a container is running does not guarantee the app inside is working properly. The process might be up, but it could be failing to respond because the DB connection dropped. HEALTHCHECK periodically runs a check command to determine the container’s health status.

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

If the check fails, the container status shows as unhealthy. Docker Compose can use this information to gate dependent services (condition: service_healthy) or restart containers. Note that Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness/readiness probes instead.
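The curl example above assumes curl exists in the image; slim images such as node:22-slim do not ship it. A hedged alternative is to probe with the runtime itself (the /health endpoint and port 3000 are carried over from the earlier example):

```dockerfile
# Uses Node's built-in fetch (Node 18+) instead of curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
```

The --start-period option gives the app a grace window at startup before failed checks count against the retry limit.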

Layer Optimization at a Glance

Finally, let’s review the entire Part 3 from a layer perspective at once:

flowchart TB
    A["1. FROM lightweight base<br/>(slim / alpine / distroless)"] --> B
    B["2. Declare WORKDIR"] --> C
    C["3. COPY dependency files first<br/>(package.json, go.mod, etc.)"] --> D
    D["4. RUN dependency install<br/>(cacheable)"] --> E
    E["5. COPY remaining source<br/>(changes frequently)"] --> F
    F["6. ENV / EXPOSE / HEALTHCHECK"] --> G
    G["7. USER non-root"] --> H
    H["8. ENTRYPOINT + CMD (exec form)"]

If you write your Dockerfile in this order, caching works well, the image stays lightweight, and security improves. There is no such thing as a perfect Dockerfile, but as long as you do not deviate far from this flow, you will avoid major pitfalls.

Quick-Reference Checklist

Items to verify before submitting your Dockerfile in a PR:

  - Lightweight base image chosen deliberately (slim / alpine / distroless) with an explicit tag
  - Dependency files (package*.json, go.mod, etc.) COPYed and installed before the rest of the source, so the cache works
  - COPY used instead of ADD (ADD only for intentional tar extraction)
  - Install and cleanup combined in a single RUN layer
  - CMD / ENTRYPOINT written in exec form
  - Non-root USER set
  - No secrets in ENV or ARG-to-ENV assignments; BuildKit --secret used for build-time credentials
  - .dockerignore excludes .git, node_modules, .env, and build artifacts
  - EXPOSE and HEALTHCHECK declared where relevant
In the next part, we move on to the story of running, stopping, and restarting a completed image. Key options for docker run, the --restart policy, building containers that shut down cleanly with SIGTERM, and how to check their status.

Part 4: Container Lifecycle

