Table of contents
- An Image Is Not a Blob — It Is a Stack
- How Layers Stack Up
- Union Filesystem — Making the Stack Look Like One
- The Same Layer Is Never Downloaded Twice
- Layer Hashes and Image Identification
- The Flow of Pull and Push
- Developing a Sense for Reducing Image Size
- Dangling Images and Cleanup
- Three Key Intuitions for Images and Layers
An Image Is Not a Blob — It Is a Stack
When you first work with Docker images, they feel like a single blob of files. You run docker pull nginx, something gets downloaded, and docker run starts it. Seems simple.
But if you watch the download process closely, something interesting happens:
docker pull node:22
# 22: Pulling from library/node
# 4f4fb700ef54: Pull complete
# 6b9d8d31e2b4: Pull complete
# a75df4d1b5b5: Pull complete
# ...
Multiple IDs appear. Pull complete is printed separately for each one. These are layers. An image is not a single blob but rather multiple layers stacked on top of each other. And this structure enables many of the conveniences in the Docker ecosystem — fast pulls, fast builds via caching, and space savings.
How Layers Stack Up
A single image is a stack of read-only layers. Each layer represents a diff from the previous layer.
flowchart TB
L0["Layer 0: Debian base<br/>(base filesystem)"] --> L1
L1["Layer 1: apt package installation"] --> L2
L2["Layer 2: Node.js installation"] --> L3
L3["Layer 3: /app directory copy"] --> L4
L4["Layer 4: npm install result"] --> RW
RW["Writable layer added<br/>at container runtime (RW Layer)"]
When an image is actually run as a container, one additional writable layer (RW Layer) is placed on top. Any files created or modified inside the container go into this RW layer. When the container is deleted, the RW layer disappears with it.
All actual content of the image (the read-only layers) is shared across containers. Even if you spin up 100 containers from the same image, only one copy of each read-only layer exists on disk. Each container only has its own separate RW layer. That is why containers are lightweight.
Union Filesystem — Making the Stack Look Like One
Even though there are multiple layers, inside the container they appear as a single filesystem. What makes this possible is the Union Filesystem. The most prominent implementation is OverlayFS, which has been part of the mainline Linux kernel since 3.18 and backs Docker's default overlay2 storage driver.
OverlayFS stacks multiple “lowerdirs” at the bottom and places a single “upperdir” on top. When a container process reads a file, it searches from top to bottom and returns the first version found. When a file is modified, the original is copied to the upperdir, and the modification is made there (Copy-on-Write, CoW).
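The read path and copy-up behavior can be sketched with a toy model. This is a hypothetical simulation of the semantics, not real OverlayFS code: layers are dicts searched top-down, and writes always land in the upper (writable) dict.

```python
# Toy model of union-filesystem semantics (illustrative sketch, not OverlayFS itself).
class UnionFS:
    def __init__(self, lower_layers):
        self.lowers = lower_layers  # list of read-only dicts, index 0 = bottom layer
        self.upper = {}             # the single writable layer (container RW layer)

    def read(self, path):
        # Search from top to bottom; the first version found wins.
        for layer in [self.upper] + list(reversed(self.lowers)):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Copy-on-Write: modifications only ever go into the upper layer.
        self.upper[path] = data


base = {"/etc/os-release": "debian"}
runtime = {"/usr/bin/node": "v22 binary"}
fs = UnionFS([base, runtime])

print(fs.read("/etc/os-release"))  # "debian" — found in the bottom layer
fs.write("/etc/os-release", "patched")
print(fs.read("/etc/os-release"))  # "patched" — the upper layer now shadows the original
print(base["/etc/os-release"])     # "debian" — the read-only layer is untouched
```

This is why deleting a container (dropping `upper`) costs nothing and leaves the shared read-only layers intact.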
If you look inside /var/lib/docker/overlay2/ on the host, you can see the actual layer directories:
sudo ls /var/lib/docker/overlay2/ | head
# 3b4c8f2a1d.../
# 7a9f3b2e5c.../
# ...
# Each directory is one layer
This structure matters for two reasons:
- Storage savings: Different images that use the same base image share their lower layers
- Network savings: Layers already downloaded are not downloaded again. Pulls become dramatically faster
The Same Layer Is Never Downloaded Twice
Let’s see this in action. We’ll pull a Python image and a Python-based app image:
docker pull python:3.12-slim
# 3.12-slim: Pulling from library/python
# 4f4fb700ef54: Pull complete <- debian:slim base layer
# 1e1a1a1b1c1d: Pull complete <- python runtime layer
# ...
docker pull my-registry/my-app:latest
# latest: Pulling from my-app
# 4f4fb700ef54: Already exists <- same debian:slim layer reused!
# 1e1a1a1b1c1d: Already exists
# 3c3d3e3f...: Pull complete <- only app-specific layer is newly downloaded
The key is the Already exists message during the second pull. If the layer hashes match, Docker determines “this already exists, no need to download.” If dozens of services on a team share the same base image, disk and network savings per server are enormous.
Layer Hashes and Image Identification
Each layer gets a SHA256 hash based on its content. Same content means same hash. A single byte difference means a different hash.
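The principle is easy to reproduce in a few lines of Python. This is a sketch of content addressing in general, not Docker's exact digest computation (Docker hashes the layer's tar archive):

```python
import hashlib

def digest(data: bytes) -> str:
    """Content-addressed identifier: the name IS the hash of the bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

layer_a = b"FROM debian + apt-get install curl"
layer_b = b"FROM debian + apt-get install curl"
layer_c = b"FROM debian + apt-get install curL"  # a single byte differs

print(digest(layer_a) == digest(layer_b))  # True: same content, same hash
print(digest(layer_a) == digest(layer_c))  # False: one byte changes the whole digest
```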
You can verify this with your own eyes:
docker image inspect nginx:1.27 --format '{{json .RootFS.Layers}}' | jq
# [
# "sha256:3b43a6502abb5...",
# "sha256:f4df7a8b2c...",
# ...
# ]
This list of hashes is the image’s identity. A tag (nginx:1.27) is merely an alias pointing to this bundle of hashes. Even if you change the tag, if the content is the same, the data on disk stays the same. Conversely, a tag like nginx:latest can point to a different image hash over time.
This is why you are told not to use the :latest tag in production. The same tag can point to different content from one day to the next. If you want to pin a version, use a specific tag (1.27) or a digest (@sha256:...).
# Pin an image by digest — never changes
docker pull nginx@sha256:3b43a6502abb5...
The Flow of Pull and Push
Let’s see exactly what happens when images are sent and received:
sequenceDiagram
participant C as docker CLI
participant D as Docker Daemon
participant R as Registry
Note over C,R: docker pull nginx:1.27
C->>D: /images/create
D->>R: GET /v2/library/nginx/manifests/1.27
R-->>D: manifest (list of layer hashes)
D->>D: Check which layers exist locally
par Parallel download
D->>R: GET /v2/library/nginx/blobs/sha256:aaa...
D->>R: GET /v2/library/nginx/blobs/sha256:bbb...
D->>R: GET /v2/library/nginx/blobs/sha256:ccc...
end
R-->>D: Each layer as tar.gz
D->>D: Store locally keyed by hash
D-->>C: Pull complete
The sequence in summary:
- The Daemon fetches a manifest from the registry. A manifest is metadata saying “this image consists of these layers”
- Each layer hash is looked up in local storage. If it already exists, it is skipped
- Only missing layers are downloaded in parallel. Each layer is independent, so order does not matter
- Downloaded layers are stored keyed by hash, so that the next request for the same hash can be served immediately
Push works in the reverse direction but follows the same logic. The daemon asks the registry "do you already have a layer with this hash?" — existing layers are skipped, and only new ones are uploaded. So even if you change a single line of your app and deploy, most layers are reused, and only the changed upper layers are transferred.
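The skip logic in the sequence above boils down to a set difference. A simplified sketch (hypothetical helper name, not the daemon's actual code):

```python
def layers_to_download(manifest_layers, local_store):
    """Return only the layer digests missing locally, preserving manifest order."""
    return [d for d in manifest_layers if d not in local_store]

manifest = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
local = {"sha256:aaa", "sha256:bbb"}  # already on disk from a previous pull

print(layers_to_download(manifest, local))  # ['sha256:ccc'] — only this blob is fetched
```

Push runs the same check in the other direction: for each digest, ask the registry whether the blob exists and upload only the misses.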
Developing a Sense for Reducing Image Size
When first using Docker, images easily balloon into the GB range. node:22 is 1GB, and adding an app on top often hits 2GB. This number can be reduced.
1. Choose a smaller base image
Even for the same Node.js runtime, the size varies depending on which base you use:
| Image | Approximate Size |
|---|---|
| node:22 | 1.1GB |
| node:22-slim | 240MB |
| node:22-alpine | 180MB |
| gcr.io/distroless/nodejs22 | 170MB |
- slim is a Debian-based image with unnecessary documentation and development tools removed
- alpine is an ultra-lightweight Linux based on musl libc. The downside is subtle differences from glibc
- distroless is a minimal environment without even a shell. Advantageous for security but harder to debug
In most cases, slim is the practical default. As you gain experience, you can graduate to alpine or distroless.
2. Separate “frequently changing” and “rarely changing” layers
In a Dockerfile, order the instructions so that things that rarely change come first and things that change often come last. Layers produced by the earlier instructions can then be cached and reused across builds.
Bad example:
FROM node:22-slim
COPY . /app # Cache invalidated every time code changes
WORKDIR /app
RUN npm install # Re-runs every time
CMD ["node", "index.js"]
Good example:
FROM node:22-slim
WORKDIR /app
COPY package*.json ./ # Copy dependency files first
RUN npm ci # Dependency layer is cached if package.json hasn't changed
COPY . . # Code changes only affect the last layer
CMD ["node", "index.js"]
If package.json has not changed, npm ci is not re-executed. A build that used to take a minute finishes in seconds even if a single line of code changes. Dockerfile instruction order directly impacts performance. This is covered more thoroughly in Part 3.
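Why does an early COPY . . poison everything after it? Build cache keys chain: each step's key folds in the previous step's key, the instruction text, and (for COPY) a checksum of the copied files, so one miss invalidates every later step. A toy sketch of that chaining (illustrative, not BuildKit's real algorithm):

```python
import hashlib

def cache_keys(steps):
    """steps: list of (instruction, input_checksum) pairs.
    Each key depends on the previous key, so one miss cascades downward."""
    keys, prev = [], ""
    for instruction, checksum in steps:
        prev = hashlib.sha256(f"{prev}|{instruction}|{checksum}".encode()).hexdigest()
        keys.append(prev)
    return keys

# Good ordering: when only source code changes, just the final COPY's checksum changes.
before = cache_keys([("COPY package*.json ./", "deps-v1"),
                     ("RUN npm ci", ""),
                     ("COPY . .", "code-v1")])
after  = cache_keys([("COPY package*.json ./", "deps-v1"),
                     ("RUN npm ci", ""),
                     ("COPY . .", "code-v2")])  # only the app code changed

print(before[0] == after[0], before[1] == after[1])  # True True: dependency layers still cached
print(before[2] == after[2])                         # False: only the last layer rebuilds
```

Swap the order (COPY . . first) and the very first key changes on every code edit, dragging the npm ci step with it.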
3. Combine multiple commands in one layer and clean up in the same step
Each RUN creates a new layer. So if you write it like this, the “file creation” layer and the “file deletion” layer are separate, meaning files that have already been deleted are still included in the final image:
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
# The second RUN deletes files, but they still exist in the first layer
You need to combine them into a single RUN so that installation and cleanup happen in the same layer:
RUN apt-get update \
&& apt-get install -y --no-install-recommends curl \
&& rm -rf /var/lib/apt/lists/*
--no-install-recommends prevents recommended packages from being installed. rm -rf /var/lib/apt/lists/* clears the apt cache. This cleanup must happen within the same RUN for the final image to stay lean.
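The arithmetic behind this rule is simple: an image's size is roughly the sum of its layers, and a delete in a later layer only adds a whiteout marker — it cannot shrink an earlier layer. A simplified sketch with made-up byte counts:

```python
# Simplified model: each layer records only the bytes it adds.
# A delete in a later layer adds a whiteout entry; it cannot shrink earlier layers.
def image_size(layers):
    return sum(layer["added_bytes"] for layer in layers)

split_run = [
    {"step": "apt-get install curl (lists cached)", "added_bytes": 60_000_000},
    {"step": "rm -rf /var/lib/apt/lists/*",         "added_bytes": 0},  # whiteout only
]
single_run = [
    {"step": "install + cleanup in one RUN",        "added_bytes": 40_000_000},
]

print(image_size(split_run))   # 60000000 — the deleted lists still ship inside layer 1
print(image_size(single_run))  # 40000000 — cleanup ran before the layer was sealed
```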
4. Use multi-stage builds to strip out build tools
Languages like Go and Java need a compiler at build time, but only the build output is needed at runtime. Multi-stage builds cleanly separate the two:
# Stage 1: Build
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/server
# Stage 2: Run
FROM gcr.io/distroless/static-debian12
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
FROM ... AS builder creates a build-only stage, and the final stage copies only the output. The final image has no Go compiler. It is common for images to shrink from several hundred MB to just a few MB.
Dangling Images and Cleanup
When you rebuild an image, the same tag gets reassigned to the new image and the previous image is left with no tag (<none>:<none>). This is called a dangling image. If they pile up, they can consume significant disk space.
docker images --filter "dangling=true"
# REPOSITORY TAG IMAGE ID CREATED
# <none> <none> 1a2b3c4d... 10 minutes ago
docker image prune # Remove only dangling images
docker image prune -a # Remove all unused images
docker system prune -a --volumes # Clean up images/containers/networks/volumes entirely
docker system prune is powerful, so use it carefully. In CI environments, running it periodically helps prevent disk exhaustion.
Three Key Intuitions for Images and Layers
To tie together images and layers in one go, here are three key intuitions:
- An image is a stacked diff. Each layer is a change set from the previous state, identified by a hash, and shared across images
- Layer caching is determined by Dockerfile instruction order. Put things that rarely change at the top and things that change often at the bottom
- The final image size is determined by base image choice, layer bundling, and multi-stage builds. These three techniques solve most size problems
In the next part, we dig into the place where layers are actually created — the Dockerfile. From the differences between FROM, RUN, COPY, CMD, and ENTRYPOINT, to how to use ARG and ENV, and how to reduce the build context with .dockerignore.
