Table of contents
- The Problem with Single-Stage
- The Basic Multi-Stage Pattern
- What Stages Do During a Build
- Language-Specific Patterns
- Caching Layers — Order Equals Speed
- BuildKit Cache Mounts
- distroless and scratch — Minimal Bases
- Build Stage Targeting
- Practical Checklist
- Where We Stand
The Problem with Single-Stage
Let’s look at a Go example. A simple single-stage Dockerfile looks like this:
FROM golang:1.22-alpine
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /app/bin/server ./cmd/server
CMD ["/app/bin/server"]
Building this means the final image includes all of golang:1.22-alpine — the Go compiler, standard library source, all dependency source code, and the go command. At runtime, only the compiled binary needs to run.
docker build -t app:single .
docker images app:single
# REPOSITORY TAG SIZE
# app single ~350MB
Roughly 350MB, even though the compiled binary itself is only 10-20MB. Everything else is build-time baggage.
The Basic Multi-Stage Pattern
Use FROM multiple times; each FROM starts a new stage. Only the final stage becomes the resulting image. Earlier stages exist only as sources to copy from and are discarded at the end of the build.
# Stage 1 — Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server
# Stage 2 — Runtime
FROM alpine:3.20
RUN adduser -D -u 10001 app
COPY --from=builder /out/server /usr/local/bin/server
USER app
ENTRYPOINT ["/usr/local/bin/server"]
The first stage is named AS builder, and the second stage uses COPY --from=builder to grab only the binary. The runtime image has no Go compiler at all.
docker build -t app:multi .
docker images app:multi
# REPOSITORY TAG SIZE
# app multi ~15MB
350MB to 15MB. Splitting the stages alone reduced the size by over 95%.
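As a side note, COPY --from is not limited to named stages: it also accepts any image reference, which is handy for grabbing a single prebuilt tool without defining a full stage. A minimal sketch (the busybox image and tag here are only illustrative):

```dockerfile
FROM alpine:3.20
# --from can name an external image, not only an earlier stage;
# this copies one static binary out of busybox:1.36.
COPY --from=busybox:1.36 /bin/busybox /usr/local/bin/busybox
```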
What Stages Do During a Build
Visualized as a diagram, the flow is straightforward:
flowchart LR
SRC["Source code"] --> B1["Stage: builder<br/>(golang:1.22)"]
B1 --> ART["/out/server<br/>(compiled binary)"]
ART -->|COPY --from=builder| B2["Stage: runtime<br/>(alpine:3.20)"]
B2 --> IMG["Final image"]
B1 -.->|Discarded| X["Builder layers"]
The builder stage is full of compilers and build tools, but they do not remain in the final image. Only the copied artifacts and the runtime base remain.
Language-Specific Patterns
The core of multi-stage builds is separating build tools from runtime, but each language has its own nuances.
Java (Gradle)
FROM gradle:8.7-jdk21 AS builder
WORKDIR /src
COPY build.gradle.kts settings.gradle.kts ./
COPY src ./src
RUN gradle clean bootJar --no-daemon
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /src/build/libs/*.jar app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
gradle:8.7-jdk21 includes JDK + Gradle + dependency cache, weighing several hundred MB. The runtime only needs jre-alpine, so the image is much lighter. With Spring Boot, you could further split layers with layertools, but multi-stage alone already cuts things down by more than half.
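For reference, the layertools split mentioned above looks roughly like this. Treat it as a sketch rather than a drop-in file: the layer names follow Spring Boot's default layered-jar layout, and the launcher class path depends on the Boot version (shown here for Boot 3.2+).

```dockerfile
FROM gradle:8.7-jdk21 AS builder
WORKDIR /src
COPY . .
RUN gradle bootJar --no-daemon

# Unpack the fat jar into Boot's predefined layers
FROM eclipse-temurin:21-jre-alpine AS extractor
WORKDIR /app
COPY --from=builder /src/build/libs/*.jar app.jar
RUN java -Djarmode=layertools -jar app.jar extract

FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
# Least-frequently-changing layers first, so dependency layers stay cached
COPY --from=extractor /app/dependencies/ ./
COPY --from=extractor /app/spring-boot-loader/ ./
COPY --from=extractor /app/snapshot-dependencies/ ./
COPY --from=extractor /app/application/ ./
ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]
```

The payoff is that changing application code only invalidates the final, small application layer instead of re-pushing the full dependency set.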
Node.js
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build && npm prune --production
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER node
CMD ["node", "dist/server.js"]
Node’s node_modules is particularly bloated, so separating the install into a deps stage lets the layer cache hit whenever package.json has not changed. Running npm prune --production (on npm 9+, the equivalent is npm prune --omit=dev) after the build strips devDependencies, further reducing the final size.
Python
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "-m", "myapp"]
Python ships no compiled binary, so the gains are smaller than in compiled languages. Still, keeping build toolchains and dev headers (gcc, libpq-dev, etc.) out of the runtime image is a meaningful win.
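A common alternative to pip install --user is to build into a virtualenv and copy the whole environment across. A sketch under the same requirements.txt layout (the /opt/venv path is conventional, not required):

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
# Build all dependencies into an isolated virtualenv
RUN python -m venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
# The venv is self-contained, so copying the directory is enough
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH=/opt/venv/bin:$PATH
CMD ["python", "-m", "myapp"]
```

Unlike the --user approach, this works unchanged if you later add a non-root USER, since nothing lives under /root.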
Caching Layers — Order Equals Speed
Each Dockerfile instruction creates a layer. If the input to that layer has not changed, the cache is hit and it is reused without rebuilding. Understanding this caching rule alone can cut CI time in half.
flowchart TB
L1["COPY go.mod go.sum"] --> L2["RUN go mod download"]
L2 --> L3["COPY . ."]
L3 --> L4["RUN go build"]
subgraph KEY["Cache hit criteria"]
L1
L2
L3
L4
end
NOTE1["When go.mod / go.sum change\n→ L1–L4 all re-execute"]
NOTE2["When a single source line changes\n→ L1, L2 stay cached; only L3, L4 re-execute"]
The key order is: copy dependency manifests first, install them, then copy the source. Since editing a line of source is by far the most frequent change, the source copy should sit as late in the Dockerfile as possible.
Here is what happens with the reverse order:
# Bad order — even a single source line change reinstalls dependencies
COPY . .
RUN go mod download
RUN go build -o /out/server ./cmd/server
With this order, go mod download re-runs every time source changes. This pattern is the usual culprit when team-wide CI build times are inflated by several minutes.
BuildKit Cache Mounts
With BuildKit now the default build engine, a more powerful caching mechanism is available: cache directories for build tools can be mounted into RUN instructions and persist across builds.
# syntax=docker/dockerfile:1.7
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
CGO_ENABLED=0 go build -o /out/server ./cmd/server
--mount=type=cache persists build tool cache directories across builds, independent of layer cache hits. Even on a clean build, Go module caches can be reused. This feature is covered in greater depth in Part 11.
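The same pattern applies to other toolchains. A sketch for the earlier Node example, assuming npm's default cache location of /root/.npm:

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# Persist npm's download cache across builds, even when this layer rebuilds
RUN --mount=type=cache,target=/root/.npm \
    npm ci
```

Note the cache mount exists only during the RUN step; it never becomes part of the image, so it adds nothing to the final size.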
distroless and scratch — Minimal Bases
Alpine is small, but there is room to go even smaller. For static binaries, you can target scratch or gcr.io/distroless/*.
scratch — Truly Nothing
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -ldflags='-s -w' -o /out/server ./cmd/server
FROM scratch
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
scratch is an empty image. No shell, no libc. A fully static binary built with CGO_ENABLED=0 in Go works on it. However, docker exec ... sh is impossible, making debugging difficult.
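A practical gotcha with scratch: it also contains no CA certificates or timezone data, so outbound TLS calls fail out of the box. The usual fix is to install them in the builder and copy them over (a sketch; the apk package names follow Alpine's repositories):

```dockerfile
FROM golang:1.22-alpine AS builder
# ca-certificates and tzdata will be copied into the scratch image below
RUN apk add --no-cache ca-certificates tzdata
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -ldflags='-s -w' -o /out/server ./cmd/server

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```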
distroless — Bare Minimum Runtime
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /out/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
distroless is a family of images from Google containing “only the bare minimum needed to run apps.” With no shell, the attack surface is small, and the nonroot tag starts the container as a non-root user. static-debian12 suits static binaries like Go, base-debian12 adds glibc for dynamically linked binaries, and the Java variants (e.g., java21-debian12) bundle a JRE.
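distroless also publishes :debug tag variants that include a busybox shell, which softens the no-shell debugging problem. One way to use them is a separate debug stage selected at build time (a sketch; the --target mechanism is covered in the Build Stage Targeting section):

```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Debug variant: identical layout, but the base ships a busybox shell
FROM gcr.io/distroless/static-debian12:debug-nonroot AS debug
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]

# Production variant: no shell at all
FROM gcr.io/distroless/static-debian12:nonroot AS runtime
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```

Build the production image normally, and rebuild with --target debug only when you need to shell in.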
Comparison
| Base | Size | Shell | libc | Use Case |
|---|---|---|---|---|
| ubuntu:22.04 | ~77MB | yes | glibc | General purpose |
| debian:12-slim | ~75MB | yes | glibc | General purpose |
| alpine:3.20 | ~7MB | yes | musl | Lightweight |
| distroless/base | ~20MB | no | glibc | Runtime |
| distroless/static | ~2MB | no | none | Static binaries |
| scratch | 0MB | no | none | Static binaries |
The size numbers are approximations and drift by a few MB between releases. The point stands regardless: ship only what the app needs at runtime.
Build Stage Targeting
In multi-stage builds, you can build only a specific stage. This is useful for patterns where you have a separate test stage:
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server
FROM builder AS tester
RUN go test ./...
FROM alpine:3.20 AS runtime
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
In CI, use --target tester to run tests only, and build the deployment image with --target runtime:
docker build --target tester -t app:test .
docker build --target runtime -t app:prod .
Practical Checklist
- Build tools/SDK go in the builder stage; runtime uses slim/alpine/distroless
- COPY dependency files before source to preserve cache layers
- Add node_modules, .git, test resources, etc. to .dockerignore to reduce build context
- Separate build args/secrets to prevent sensitive information from entering the image (covered in Parts 10 and 11)
- Specify a non-root USER for production (covered in Part 10)
- Periodically clean up dangling build cache with docker builder prune
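For the .dockerignore item above, a typical starting point might look like this (the entries are illustrative; tailor them to your project):

```
.git
node_modules
dist/
build/
coverage/
*.log
.env*
docker-compose*.yml
```

Anything listed here never enters the build context, which speeds up the context upload and keeps stray files out of COPY . . layers.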
Where We Stand
Now you can choose whether to wrap the same app as a massive 350MB image or a lean 15MB image. Smaller images mean faster deployments, lower network costs, and a reduced attack surface.
In the next part, we cover where to put these images — the registry. From Docker Hub to ECR/GCR/ACR and self-hosted Harbor — authentication and tagging strategies from a practical perspective.
