
Docker for Beginners Part 3 — Writing a Dockerfile

· 8 min read
Docker Series (3/13)
  1. Docker for Beginners Part 1 — What Is Docker
  2. Docker for Beginners Part 2 — Images and Layers
  3. Docker for Beginners Part 3 — Writing a Dockerfile
  4. Docker for Beginners Part 4 — Container Lifecycle
  5. Docker for Beginners Part 5 — Volumes and Data Persistence
  6. Docker for Beginners Part 6 — Networking
  7. Docker Part 7 — Multi-Container Orchestration with Docker Compose
  8. Docker Part 8 — Slimming Images with Multi-Stage Builds
  9. Docker Part 9 — Registry: Where Do Images Live?
  10. Docker Part 10 — Container Security: Blocking Issues Before They Blow Up
  11. Docker Part 11 — BuildKit and Advanced Builds
  12. Docker Part 12 — Production Best Practices
  13. Docker Part 13 — Troubleshooting and Alternatives

A Dockerfile Is Not a Recipe — It Is a History

Now that you understand images and layers, it is time to build an image yourself. The tool for that is the Dockerfile. On the surface it is just a plain text file, but in reality it is a document that records the history of commands: “stack this layer on top of that layer, and then stack another layer on top.”

flowchart TB
    DF["Dockerfile<br/>text file"] --> BUILD["docker build"]
    BUILD --> CTX["Build Context<br/>(current directory + .dockerignore)"]
    CTX --> ENGINE["BuildKit / legacy builder"]

    subgraph LAYERS["Generated layers"]
        L1["Base from FROM"]
        L2["RUN ... result"]
        L3["COPY ... result"]
        L4["COPY src . result"]
    end

    ENGINE --> LAYERS
    LAYERS --> IMG["Finished image"]

In this part, we will examine the commonly used instructions one by one. Not all of them appear in every Dockerfile, but the moment you understand what each does and how they differ, you start writing better Dockerfiles.

The Simplest Dockerfile

Before the explanation, let’s start with the end result we want. Here is a realistic example of containerizing a Node.js app:

# syntax=docker/dockerfile:1.7
FROM node:22-slim AS runtime

WORKDIR /app

# Dependency layer — cached if package.json is unchanged
COPY package*.json ./
RUN npm ci --omit=dev

# App code layer — changes frequently
COPY . .

ENV NODE_ENV=production \
    PORT=3000

EXPOSE 3000

USER node
CMD ["node", "server.js"]

In roughly 12–15 lines, nearly all the instructions we will discuss in this part are present. Let’s break them down one by one.

FROM — Everything Starts Here

FROM specifies the base image. Apart from the # syntax parser directive, comments, and global ARG instructions, the first instruction in a Dockerfile must always be FROM. Our layers are stacked on top of the image chosen here.

FROM node:22-slim

Base image selection has the greatest impact on the final image size, security, and stability. As discussed in Part 2, you should be deliberate about whether to choose slim, alpine, or distroless.

In multi-stage builds, you can use multiple FROM statements, naming each stage with AS:

FROM golang:1.23 AS builder
# ... build steps ...

FROM gcr.io/distroless/static-debian12 AS runtime
COPY --from=builder /out/app /app

--from=builder is used to copy files from a previous stage. Build tools do not remain in the runtime image.

RUN — A New Layer Is Created

RUN executes commands during the image build. Each RUN creates a new layer. Therefore, “how many RUN statements you use” directly impacts image size and build time.

There are two forms:

# shell form — executed via /bin/sh -c
RUN apt-get update && apt-get install -y curl

# exec form — executed directly without a shell
RUN ["apt-get", "install", "-y", "curl"]

shell form is convenient because you can use shell features (&&, |, environment variable expansion, etc.). exec form does not go through a shell, so signal handling is more precise and it works even in distroless images that lack a shell.

As shown in Part 2, it is important to combine apt-get install and rm -rf /var/lib/apt/lists/* in the same RUN. If file creation and cleanup end up in different layers, deleted files remain in the lower layer and unnecessarily bloat the image.

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      curl \
      ca-certificates \
 && rm -rf /var/lib/apt/lists/*

COPY vs ADD — Just Use COPY

Both are instructions for copying files into the image, but ADD does more: besides copying from the build context, it can download files from URLs, and it automatically extracts local tar archives into the destination.

ADD’s extra features come with “sneaky side effects.” URL downloads have poor reproducibility and are hard to security-audit, and automatic tar extraction can unpack files unintentionally.

Rule: Use COPY by default, and only use ADD when you specifically need to extract a tar file. If you need to download from a URL, it is better to use RUN curl or RUN wget explicitly.

# Typical usage
COPY package.json /app/

# Specifying owner/permissions is also possible
COPY --chown=node:node . /app/

--chown sets the owner of copied files. This is useful when a specific user inside the container needs to read those files later.

WORKDIR — Declare the Working Directory

WORKDIR /app makes all subsequent commands execute in /app. Without it, the default is /, and files tend to scatter everywhere.

WORKDIR /app
COPY . .            # Copies to /app
RUN npm install     # Runs in /app
CMD ["node", "server.js"]  # Runs in /app

WORKDIR can be used multiple times and accepts relative paths. If the directory does not exist, it is automatically created. This is cleaner than manually writing mkdir && cd.
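The relative-path behavior can be sketched like this (the directory names are illustrative):

```dockerfile
WORKDIR /app        # created automatically if it does not exist
WORKDIR api         # relative path — now /app/api
WORKDIR logs        # now /app/api/logs
RUN pwd             # the build runs this in /app/api/logs
```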

CMD and ENTRYPOINT — The Confusing Duo

These instructions define what to run when the container starts. They look similar but have different roles: ENTRYPOINT fixes the executable that always runs, while CMD supplies a default command or default arguments that docker run can easily override.

Behavior changes depending on the combination:

# Pattern 1: ENTRYPOINT + CMD — most recommended
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run myapp       -> node server.js
# docker run myapp app.js -> node app.js  (CMD is replaced)
# Pattern 2: CMD only
CMD ["node", "server.js"]
# docker run myapp        -> node server.js
# docker run myapp bash   -> bash (entire CMD is replaced)
# Pattern 3: ENTRYPOINT only — arguments are hardcoded
ENTRYPOINT ["node", "server.js"]
# docker run myapp some-arg -> node server.js some-arg (argument is appended)

Practical recommendation: ENTRYPOINT + CMD. Put the binary to run in ENTRYPOINT and the default arguments in CMD. This way, users can override just the arguments with docker run.

The difference between exec form (JSON array) and shell form (string) is also important. Always prefer exec form.

# Good — exec form
CMD ["node", "server.js"]

# Bad — shell form
CMD node server.js

Shell form is internally executed as /bin/sh -c "node server.js". This means PID 1 becomes sh, and sh does not properly forward signals to its children. Even if you run docker stop, the app does not receive SIGTERM. This issue is covered in detail in Part 4.
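When you genuinely need a shell — say, to render a config file from environment variables before starting — a wrapper script can preserve correct signal handling by ending with exec, which replaces the shell with the app so the app becomes PID 1. A minimal sketch (entrypoint.sh is a hypothetical file name):

```shell
#!/bin/sh
# entrypoint.sh — hypothetical wrapper: one-time setup, then hand off.
set -e
echo "setup: rendering config from environment..."  # placeholder setup step
# exec replaces this shell with the command below, so that command becomes
# PID 1 and receives SIGTERM from `docker stop` directly.
exec "$@"
```

You would wire it up with `ENTRYPOINT ["./entrypoint.sh"]` and `CMD ["node", "server.js"]`, keeping the override behavior of Pattern 1 above.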

ENV — Embedding Environment Variables

ENV embeds environment variables into the image. The important point is that they are valid not only at build time but also at container runtime.

ENV NODE_ENV=production \
    PORT=3000 \
    LOG_LEVEL=info

Using \ for line continuation lets you bundle multiple variables into a single layer. Inside the container, these values are visible as defaults.

docker run --rm myapp printenv NODE_ENV
# production

docker run --rm -e NODE_ENV=development myapp printenv NODE_ENV
# development  <- Can be overridden at runtime

Do not embed sensitive values (passwords, API keys) with ENV. They are included as-is in the image layers and anyone can see them with docker history.

ARG vs ENV — They Look the Same but Are Different

Both are “variables,” but their scopes differ:

| Property | ARG | ENV |
| --- | --- | --- |
| When valid | During build only | During build + container runtime |
| Accessible at docker run? | No | Yes |
| Inject value at docker build? | Yes, via --build-arg | No |
| Stored in image? | No | Yes |

Use ARG for values that should only be used at build time and not persist in the container. Use ENV for values you want as default environment variables in the container.

# Version info used only at build time
ARG APP_VERSION=dev
RUN echo "Building version ${APP_VERSION}"

# Environment variable that persists in the container
ENV NODE_ENV=production

Then inject the value at build time:

docker build --build-arg APP_VERSION=1.2.3 -t myapp .
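A related scoping detail: an ARG declared before FROM is visible only to the FROM line itself. To use it inside a stage, re-declare it there (a sketch; NODE_VERSION is a hypothetical argument name):

```dockerfile
ARG NODE_VERSION=22
FROM node:${NODE_VERSION}-slim
# Re-declare to bring the ARG into this stage's scope
ARG NODE_VERSION
RUN echo "built with Node ${NODE_VERSION}"
```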

One caveat: if you assign an ARG to an ENV, that value stays in the image.

ARG DB_PASSWORD
ENV DB_PASSWORD=${DB_PASSWORD}  # Embedded in image — risk of password leak

To use a sensitive value only during build time and keep it out of the image, you need BuildKit’s --secret:

# syntax=docker/dockerfile:1.7
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

Then pass the secret at build time:

docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .

This way, .npmrc is mounted only during the build and does not remain in the final image.

EXPOSE — More of a Documentation Declaration

EXPOSE 3000 merely declares “this container will use port 3000.” It does not automatically open the port. To access it from outside, you need to explicitly forward it with something like docker run -p 3000:3000.

So why use it? Two purposes:

  1. Documentation so that someone looking at the Dockerfile alone can tell “this image uses this port”
  2. When you use docker run -P (capital P), EXPOSEd ports are automatically mapped to random host ports

Networking is covered in depth in Part 6. For now, just remember that EXPOSE does not actually open a port.

USER — The User Inside the Container

By default, containers run as root. While convenient, it is bad for security. If a container escape vulnerability occurs, the attack can continue with root privileges on the host.

Use USER to specify a non-root user:

FROM node:22-slim
WORKDIR /app
COPY --chown=node:node . .
RUN npm ci --omit=dev
USER node
CMD ["node", "server.js"]

The node image comes with a default node user at UID 1000, so you can use it directly. For other images, you need to create one manually:

RUN groupadd -r app && useradd -r -g app -u 1001 app
USER app

.dockerignore — What to Exclude from the Build Context

When you run docker build ., the entire current directory is sent to the Docker Daemon as the build context. If files like node_modules, .git, and .env are all included, the transfer is slow and there is a risk of them accidentally ending up in the image.

List what you want to exclude in .dockerignore, and they will be excluded from the build context. The syntax is the same as .gitignore.

# Version control
.git
.gitignore

# Dependencies (installed fresh inside the container)
node_modules
npm-debug.log

# Local environment
.env
.env.*
!.env.example

# Editor/OS
.vscode
.idea
.DS_Store

# Build artifacts
dist
build
coverage

!.env.example means “do not exclude this.” Use it when you want to include an example file in the image.

Setting up .dockerignore properly makes build speeds noticeably faster and improves cache hit rates. If the .git directory is included in the build context, every commit changes the context and can invalidate the cache.

HEALTHCHECK — Declaring Container Health Status

Just because a container is running does not guarantee the app inside is working properly. The process might be up, but it could be failing to respond because the DB connection dropped. HEALTHCHECK periodically runs a check command to determine the container’s health status.

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

If the check fails, the container status shows as unhealthy. Docker Compose can use this information to gate dependent services (condition: service_healthy) or restart containers. Note that Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness/readiness probes instead.
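The curl example above assumes curl exists in the image; slim images such as node:22-slim do not ship it. A hedged alternative is to probe with the runtime itself (the /health endpoint and port 3000 are carried over from the earlier example):

```dockerfile
# Uses Node's built-in fetch (Node 18+) instead of curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
```

The --start-period option gives the app a grace window at startup before failed checks count against the retry limit.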

Layer Optimization at a Glance

Finally, let’s review the entire Part 3 from a layer perspective at once:

flowchart TB
    A["1. FROM lightweight base<br/>(slim / alpine / distroless)"] --> B
    B["2. Declare WORKDIR"] --> C
    C["3. COPY dependency files first<br/>(package.json, go.mod, etc.)"] --> D
    D["4. RUN dependency install<br/>(cacheable)"] --> E
    E["5. COPY remaining source<br/>(changes frequently)"] --> F
    F["6. ENV / EXPOSE / HEALTHCHECK"] --> G
    G["7. USER non-root"] --> H
    H["8. ENTRYPOINT + CMD (exec form)"]

If you write your Dockerfile in this order, caching works well, the image stays lightweight, and security improves. There is no such thing as a perfect Dockerfile, but as long as you do not deviate far from this flow, you will avoid major pitfalls.

Quick-Reference Checklist

Items to verify before submitting your Dockerfile in a PR:

  - Lightweight base image chosen deliberately (slim / alpine / distroless) with an explicit tag
  - Dependency files (package*.json, go.mod, etc.) COPYed and installed before the rest of the source, so the cache works
  - COPY used instead of ADD (ADD only for intentional tar extraction)
  - Install and cleanup combined in a single RUN layer
  - CMD / ENTRYPOINT written in exec form
  - Non-root USER set
  - No secrets in ENV or ARG-to-ENV assignments; BuildKit --secret used for build-time credentials
  - .dockerignore excludes .git, node_modules, .env, and build artifacts
  - EXPOSE and HEALTHCHECK declared where relevant
In the next part, we move on to the story of running, stopping, and restarting a completed image. Key options for docker run, the --restart policy, building containers that shut down cleanly with SIGTERM, and how to check their status.

Part 4: Container Lifecycle

