
Docker for Beginners Part 5 — Volumes and Data Persistence

· 6 min read
Docker Series (5/13)
  1. Docker for Beginners Part 1 — What Is Docker
  2. Docker for Beginners Part 2 — Images and Layers
  3. Docker for Beginners Part 3 — Writing a Dockerfile
  4. Docker for Beginners Part 4 — Container Lifecycle
  5. Docker for Beginners Part 5 — Volumes and Data Persistence
  6. Docker for Beginners Part 6 — Networking
  7. Docker Part 7 — Multi-Container Orchestration with Docker Compose
  8. Docker Part 8 — Slimming Images with Multi-Stage Builds
  9. Docker Part 9 — Registry: Where Do Images Live?
  10. Docker Part 10 — Container Security: Blocking Issues Before They Blow Up
  11. Docker Part 11 — BuildKit and Advanced Builds
  12. Docker Part 12 — Production Best Practices
  13. Docker Part 13 — Troubleshooting and Alternatives

Containers Die, but Data Must Survive

Recall the container lifecycle from Part 4. Containers are recreated at any time — during deployments, server reboots, and even auto scale-outs. Throughout this process, the container’s internal filesystem is entirely destroyed.

So what about DB data? Uploaded files? Caches? All gone. That is why Docker’s fundamental pattern is to keep data “outside the container.” This is called a volume.

flowchart LR
    subgraph C1["Container A (v1)"]
        APP1["App process"]
        FS1["Container filesystem<br/>(ephemeral)"]
    end

    subgraph C2["Container B (v2, replaced)"]
        APP2["App process"]
        FS2["Container filesystem<br/>(ephemeral)"]
    end

    VOL[("Volume<br/>(persistent)")]

    APP1 -.mount.-> VOL
    APP2 -.mount.-> VOL

Even when a container is replaced from v1 to v2, the volume stays in the same place. The new container mounts the same volume and picks up where the previous data left off. Volumes outlive containers.

Three Mount Types

Docker provides three ways to attach data to a container:

| Type | Description | Primary Use |
|---|---|---|
| Named volume | Storage managed by Docker | DBs, app data, operational data |
| Bind mount | Mounts an arbitrary host directory directly | Code sharing during development, log collection |
| tmpfs | Temporary storage that exists only in memory | Sensitive temporary data |

Each has a clearly different use case. Let’s examine them one by one.

Named Volume — Docker Manages It for You

This is the most recommended approach. Docker stores data in its own directory (/var/lib/docker/volumes/...) and references it by name.

# Create a volume
docker volume create pgdata

# Run a container with the volume mounted
docker run -d \
  --name postgres \
  -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16

# Check volumes
docker volume ls
# DRIVER    VOLUME NAME
# local     pgdata

docker volume inspect pgdata
# [{ "Mountpoint": "/var/lib/docker/volumes/pgdata/_data", ... }]

-v pgdata:/var/lib/postgresql/data means “attach the volume named pgdata to /var/lib/postgresql/data inside the container.” The first part is the volume name, the second part is the path inside the container.
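The colon-separated anatomy can be sketched with plain shell string splitting. This toy parser is my own illustration, not Docker's actual code; it pulls apart a spec like `pgdata:/var/lib/postgresql/data:ro`:

```shell
# Toy parser for the anatomy of a -v spec (illustration only):
# <volume-name-or-host-path>:<container-path>[:options]
spec="pgdata:/var/lib/postgresql/data:ro"

src=${spec%%:*}      # "pgdata" (volume name or host path)
rest=${spec#*:}      # "/var/lib/postgresql/data:ro"
dst=${rest%%:*}      # "/var/lib/postgresql/data" (path inside the container)
opts=${rest#"$dst"}  # ":ro" or empty if no options were given
opts=${opts#:}       # "ro"

echo "$src -> $dst (${opts:-rw})"
```

The same three-part structure applies to bind mounts; only the meaning of the first field changes.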

If you delete the container and create a new one that mounts the same volume, the data is still there:

docker rm -f postgres
# Reconnect with a new container
docker run -d --name postgres -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
# Previous DB contents are still intact
Why are named volumes the recommended default?

  1. Not tied to a host path. Unaffected even if the host directory layout changes
  2. Simpler permission management. On first mount, automatically initialized to match the image’s UID/GID
  3. Easy to back up and migrate at the Docker level. Consistently managed via docker volume commands
  4. Swappable drivers. Not just local storage — NFS, AWS EBS, and other plugins can back a volume
  5. Less OS filesystem dependency. Much faster than bind mounts on macOS/Windows Docker Desktop

For data owned by a container — such as DBs, message queues, and caches — use named volumes. This is the fundamental principle.

Bind Mount — Directly Plugging In a Host Directory

This mounts a specific path from the host filesystem into the container. The container and host see the same files.

# Mount the host's current directory to /app in the container
docker run -d --name dev \
  -v $(pwd):/app \
  -p 3000:3000 \
  node:22-slim node /app/server.js

Because $(pwd) is mounted, editing code on the host is immediately reflected in the container. This is incredibly convenient during development; in fact, it is the primary use case for bind mounts.

Bind mounts should be used with caution in production. They couple the container to a specific host path, sit outside Docker's volume management (no docker volume commands, no pluggable drivers), and give the container direct access to host files, which invites both permission and security problems.

Cases where bind mounts are justified in production are usually read-only ones: injecting configuration files, or exposing log directories to a host-side collector.

# Read-only bind mount example
docker run -d --name nginx \
  -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro \
  -p 80:80 \
  nginx:1.27

:ro means read-only. It prevents the container from accidentally overwriting the config file.

--mount — The More Explicit Sibling of -v

There is another syntax that does the same thing. --mount is more verbose but less ambiguous than -v.

# -v style
docker run -v pgdata:/var/lib/postgresql/data postgres:16

# --mount style
docker run --mount type=volume,source=pgdata,target=/var/lib/postgresql/data postgres:16

# Bind mount as well
docker run --mount type=bind,source=$(pwd),target=/app,readonly node:22-slim

-v has concise syntax but the type distinction is not intuitive (it infers volume vs bind from the path format). --mount explicitly specifies type=..., reducing typos and mistakes. The trend for scripts and CI is to prefer --mount.
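That inference can be sketched as a toy shell function. This is my own approximation, not Docker's actual logic: an absolute path on the source side is treated as a bind mount, anything else as a volume name.

```shell
# Toy approximation of how -v decides bind vs volume (not Docker's real code)
mount_type() {
  case "$1" in
    /*) echo bind ;;    # absolute host path -> bind mount
    *)  echo volume ;;  # anything else is treated as a volume name
  esac
}

mount_type pgdata          # volume
mount_type /srv/app/data   # bind
```

With --mount there is nothing to infer, which is exactly why it is the safer choice in scripts.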

tmpfs — Never Hits Disk

Temporary storage that exists only in memory. It disappears when the container exits.

docker run -d --name app \
  --tmpfs /tmp:size=64m \
  myapp

This option is used in two scenarios:

  1. Performance: Fast temporary storage. Useful for caches at /tmp
  2. Security: Sensitive data (sessions, tokens, etc.) that must not persist on disk

Most applications do not need to worry about tmpfs. It is an option you reach for only when needed.

UID/GID Permission Issues — The Most Common Beginner Pitfall

When using volumes, you will inevitably encounter mysterious Permission denied errors. This is usually caused by a UID/GID mismatch.

Let’s reproduce the situation. Create a directory owned by the current user (UID 1000) on the host, and attempt to write to that directory as a different UID inside the container:

mkdir -p ./data
sudo chown 1000:1000 ./data

# PostgreSQL in the Debian-based postgres:16 image runs as UID 999 (70 on alpine)
docker run --rm \
  -v $(pwd)/data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16
# initdb: error: could not change permissions of directory ...

The process inside the container runs as the postgres user (e.g., UID 999), but the host directory is owned by UID 1000. Write permission is denied.
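A quick way to diagnose this on the host is to compare the directory's numeric owner against your own UID (GNU stat shown here; on macOS the equivalent is `stat -f '%u:%g'`):

```shell
mkdir -p ./data

# Numeric owner of the host directory, and your own UID for comparison
stat -c '%u:%g' ./data
id -u
```

If the first number does not match the UID the container process runs as, writes from inside the container will fail.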

There are several solutions:

1. Use Named Volumes (simplest)

When a named volume is first mounted, Docker copies the contents and ownership of the mount path from the image into the empty volume, so permissions match what the image expects. This is one of the key reasons why named volumes are recommended.

docker run -d --name db \
  -v pgdata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16
# Starts without issues

2. Match host permissions to the image’s UID/GID when using bind mounts

# Check the UID of the postgres image
docker run --rm postgres:16 id postgres
# uid=999(postgres) gid=999(postgres)

sudo chown -R 999:999 ./data

3. Match the UID at container runtime to the host

docker run --rm \
  --user $(id -u):$(id -g) \
  -v $(pwd)/data:/app/data \
  myapp

--user overrides the Dockerfile’s USER directive. The UID does not need to correspond to a user registered inside the image, but note that some programs insist on a registered user and will error out if the UID has no /etc/passwd entry.

4. Fix permissions during the init phase (image author’s choice)

A common pattern is to include an entrypoint script that runs chown just once when the data volume is first mounted:

#!/bin/sh
# entrypoint.sh
chown -R app:app /data
exec gosu app "$@"

gosu is a lightweight alternative to su that drops privileges and then execs the target command directly, without leaving an intermediate process in the way of signals. Official images commonly use this approach.

Core principle: Use named volumes for production data. If you must use bind mounts, match the host directory’s UID/GID to the image.

Volume Backup and Restore

Named volumes are managed by Docker, but they ultimately live somewhere on the host filesystem. Backing them up is not difficult.

Dumping a Volume to tar

docker run --rm \
  -v pgdata:/source:ro \
  -v $(pwd):/backup \
  alpine \
  tar czf /backup/pgdata-$(date +%Y%m%d).tar.gz -C /source .

Breaking down what happens:

  1. Spin up a temporary alpine container
  2. Mount the volume to back up (pgdata) at /source as read-only
  3. Mount the host’s current directory at /backup
  4. Tar up the contents of /source and save it to /backup

The container is automatically deleted with --rm when finished. A pgdata-YYYYMMDD.tar.gz file remains on the host.
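The tar flags themselves can be sanity-checked without Docker. This sketch round-trips a directory through the same backup and restore commands, using plain temp directories in place of the volumes:

```shell
set -e
src=$(mktemp -d)      # stands in for the pgdata volume (/source)
target=$(mktemp -d)   # stands in for the restore volume (/target)
echo "hello volume" > "$src/file.txt"

# Backup: same flags as the alpine container (-C /source .)
tar czf "$src.tar.gz" -C "$src" .

# Restore: same as "cd /target && tar xzf ..."
(cd "$target" && tar xzf "$src.tar.gz")

diff -r "$src" "$target"   # exits 0 when the round trip preserved everything
```

The -C flag matters: it makes tar archive the directory's contents with relative paths, so the restore lands cleanly wherever you unpack it.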

Restore

# Prepare a new volume
docker volume create pgdata-restore

docker run --rm \
  -v pgdata-restore:/target \
  -v $(pwd):/backup \
  alpine \
  sh -c "cd /target && tar xzf /backup/pgdata-20260420.tar.gz"

# Run DB with the new volume
docker run -d --name pg-restored \
  -v pgdata-restore:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16

For stateful services like databases, each DB’s native backup tool (pg_dump, mongodump, etc.) is safer than a file-level tar dump: copying files out from under a running instance can capture a half-finished transaction. If you do take a tar dump, stop the container first so the files are in a consistent state.

Volume Cleanup

Unused volumes silently consume disk. Check periodically:

# Find volumes not connected to any container
docker volume ls --filter dangling=true

# Clean up all at once
docker volume prune

# Delete a specific volume
docker volume rm old-cache

docker volume prune asks for confirmation, but once it runs the data is gone for good (and note that since Docker Engine 23.0 it removes only anonymous volumes unless you pass --all). To avoid accidentally destroying important volumes, make a habit of reviewing docker volume ls --filter dangling=true before cleanup.

Volumes in Compose

Managing volumes with individual docker run commands becomes increasingly cumbersome. docker compose organizes this declaratively:

# docker-compose.yml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro

volumes:
  pgdata:

Named volumes are defined in the volumes section and referenced by services. ./init.sql is a bind mount. Compose internally creates volumes prefixed with the project name, like myproject_pgdata.

docker compose down only removes containers. Volumes are preserved. To delete volumes as well, you must explicitly use docker compose down -v. This behavior serves as a safety net for data protection.
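As a sketch of the swappable-driver point from the named-volume section, Compose can back a named volume with NFS through local driver options. The server address and export path below are placeholders:

```yaml
volumes:
  pgdata:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.1.10,rw,nfsvers=4"
      device: ":/exports/pgdata"
```

To the db service, nothing changes: it still mounts pgdata by name, and the storage backend is a deployment detail.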

Summary: Criteria for Choosing a Volume Type

Finally, let’s summarize “which mount to use when” in a single diagram:

flowchart TD
    START["Need to put data in a container"] --> Q1{"Production data?<br/>(DB, file uploads, etc.)"}
    Q1 -->|Yes| NV["Named volume"]
    Q1 -->|No| Q2{"Directly share<br/>host files?"}
    Q2 -->|Yes| BM["Bind mount<br/>(dev code, config, logs)"]
    Q2 -->|No| Q3{"Must not<br/>persist on disk?"}
    Q3 -->|Yes| TM["tmpfs"]
    Q3 -->|No| NV2["Named volume<br/>(default)"]

When in doubt, use a named volume. Use bind mounts only when the intent is clear. tmpfs is limited to special performance/security purposes.


In the next part, we move on to how containers communicate with each other and with the outside world: the differences between the bridge, host, overlay, and none drivers, how DNS resolves container names, the internals of port forwarding, and the advantages of user-defined networks.

Part 6: Networking



