ioob.dev

Docker Part 13 — Troubleshooting and Alternatives

Docker Series (13/13)
  1. Docker for Beginners Part 1 — What Is Docker
  2. Docker for Beginners Part 2 — Images and Layers
  3. Docker for Beginners Part 3 — Writing a Dockerfile
  4. Docker for Beginners Part 4 — Container Lifecycle
  5. Docker for Beginners Part 5 — Volumes and Data Persistence
  6. Docker for Beginners Part 6 — Networking
  7. Docker Part 7 — Multi-Container Orchestration with Docker Compose
  8. Docker Part 8 — Slimming Images with Multi-Stage Builds
  9. Docker Part 9 — Registry: Where Do Images Live?
  10. Docker Part 10 — Container Security: Blocking Issues Before They Blow Up
  11. Docker Part 11 — BuildKit and Advanced Builds
  12. Docker Part 12 — Production Best Practices
  13. Docker Part 13 — Troubleshooting and Alternatives
The Big Picture of Diagnosis

Here is a flow for deciding where to start when an incident occurs:

flowchart TB
    START["Container anomaly"] --> Q1{"Visible in<br/>docker ps?"}
    Q1 -->|No| NOEXIST["docker ps -a<br/>Check exit code"]
    Q1 -->|Yes| Q2{"State is<br/>Running?"}
    Q2 -->|Restarting| LOGS["docker logs<br/>+ events"]
    Q2 -->|Running| Q3{"Are requests<br/>getting through?"}
    Q3 -->|No| NET["inspect network<br/>+ check ports"]
    Q3 -->|Slow/errors| STATS["stats / top<br/>Check resources"]
    NOEXIST --> EXITC{"Exit Code"}
    EXITC -->|0| NORMAL["Normal exit"]
    EXITC -->|1| APP["App error"]
    EXITC -->|125| CLI["docker command error"]
    EXITC -->|126| PERM["Not executable (permission)"]
    EXITC -->|127| NOT["Command not found"]
    EXITC -->|137| OOM["SIGKILL (likely OOM)"]
    EXITC -->|143| TERM["SIGTERM (normal shutdown path)"]

The exit code alone rules out roughly half of the possible causes. Let’s go through them.

Common Exit Code Interpretation

| Exit code | Meaning | Common cause |
| --- | --- | --- |
| 0 | Normal exit | App finished as intended |
| 1 | Application-level error | Check stack trace |
| 125 | Docker command itself failed | Option/image name typo |
| 126 | Command inside container not executable | Permission denied, missing execute bit |
| 127 | Command not found | PATH issue, binary not present |
| 137 | Killed by SIGKILL | OOM killer or liveness failure |
| 139 | SIGSEGV | Native crash |
| 143 | Killed by SIGTERM | Orchestrator’s normal shutdown path |

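The most misunderstood code is 137. It is tempting to conclude “it’s OOM” right away, but any SIGKILL produces 137. A Kubernetes liveness probe failure looks exactly the same when the container ignores SIGTERM past its termination grace period and the kubelet follows up with SIGKILL, and so does docker kill or a docker stop that times out. If there is no OOM message in the logs and the liveness configuration is strict, suspect the probe first.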

Verification commands:

docker ps -a --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'
docker inspect <container> --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.Error}}'

If OOMKilled is true, the container is confirmed to have been killed for running out of memory.
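The signal-related rows of the table follow the usual shell convention of 128 plus the signal number (137 = 128 + 9 for SIGKILL, 143 = 128 + 15 for SIGTERM, 139 = 128 + 11 for SIGSEGV). As a minimal sketch of that triage, the function below takes the first two fields printed by the docker inspect command above. The function name classify_exit and its messages are hypothetical, not part of Docker.

```shell
# classify_exit EXIT_CODE [OOM_KILLED]
# EXIT_CODE and OOM_KILLED are assumed to be the first two fields of
#   docker inspect <container> --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.Error}}'
classify_exit() {
  code=$1
  oom=${2:-false}

  # The OOMKilled flag is the only direct proof of an OOM kill.
  if [ "$oom" = "true" ]; then
    echo "OOM kill: memory limit exceeded"
    return 0
  fi

  case "$code" in
    0)   echo "normal exit" ;;
    125) echo "docker command error" ;;
    126) echo "command not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # Codes above 128 encode the terminating signal: code - 128.
        echo "terminated by signal $((code - 128))"
      else
        echo "application error (exit $code)"
      fi
      ;;
  esac
}
```

For example, classify_exit 137 false reports signal 9 without claiming OOM, which is the case where a probe or a manual kill should be suspected.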

Reading Logs

docker logs shows the container’s stdout/stderr as-is. Use -f to follow in real time.

docker logs <container>
docker logs -f <container>
docker logs --tail 200 <container>
docker logs --since 10m <container>
docker logs --timestamps <container>

With Compose, use the service name:

docker compose logs -f web
docker compose logs --tail 100 db

One important note: if the app only writes logs to a file and not stdout, docker logs shows nothing. You need to configure the app to output to standard out (Node: console.log, Python: sys.stdout, Spring Boot: remove logging.file.name config, etc.).

inspect — All the Metadata

docker inspect returns detailed metadata as JSON for containers, images, networks, and volumes.

# Container state
docker inspect <container>

# Specific fields only
docker inspect <container> --format '{{.State.Status}}'
docker inspect <container> --format '{{.NetworkSettings.IPAddress}}'   # Default bridge network only; empty on user-defined networks
docker inspect <container> --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}'

# Image
docker inspect <image>

# Network
docker network inspect bridge

Fields that quickly narrow down issues:

events — What Happened in Chronological Order

docker events is the Docker daemon’s real-time event stream. It is useful when you want to reconstruct, in chronological order, why a container died.

# Real-time
docker events

# Filtered
docker events --filter 'event=die' --filter 'event=kill'

# Past events
docker events --since '1h' --until '10m'

Example output:

2026-04-20T12:34:56 container die 5e9a...(image=myapp:1.4.2, exitCode=137)
2026-04-20T12:34:56 container oom 5e9a...

If an oom event appears alongside, the OOM killer is confirmed.
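When many containers are involved, it can help to summarize die events rather than read them one by one. The sketch below assumes the events were printed with a custom --format of "TIME NAME EXIT_CODE" per line (shown in the trailing comment). The function name count_sigkills is hypothetical.

```shell
# count_sigkills: reads "TIME NAME EXIT_CODE" lines on stdin and prints
# "NAME COUNT" for every container that died with exit code 137.
count_sigkills() {
  awk '$3 == 137 { n[$2]++ } END { for (name in n) print name, n[name] }' | sort
}

# Assumed way to produce the input (window and format string are examples):
#   docker events --since 1h --until 10m --filter event=die \
#     --format '{{.Time}} {{.Actor.Attributes.name}} {{.Actor.Attributes.exitCode}}' \
#     | count_sigkills
```

A container that shows up here repeatedly is a good candidate for the OOMKilled check from the exit code section.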

stats — Real-Time Resources

stats shows per-container CPU/memory/network/IO in real time.

docker stats
docker stats <container> --no-stream

--no-stream prints a single snapshot and exits, which is useful for CI and monitoring scripts. Memory usage that keeps approaching the limit is an early warning of an OOM kill.
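A monitoring script could turn that snapshot into a simple warning. The sketch below assumes the snapshot was printed as "NAME PERCENT%" per line using the .Name and .MemPerc placeholders. The function name near_memory_limit and the 80 percent threshold in the comment are illustrative choices.

```shell
# near_memory_limit THRESHOLD: reads "NAME PERCENT%" lines on stdin and
# prints the containers whose memory usage is at or above THRESHOLD percent.
near_memory_limit() {
  awk -v limit="$1" '{
    pct = $2
    sub(/%/, "", pct)                 # "91.20%" -> "91.20"
    if (pct + 0 >= limit + 0) print $1, $2
  }'
}

# Assumed way to produce the input:
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | near_memory_limit 80
```

Note that for a container without a memory limit the percentage is relative to host memory, so the threshold only signals an OOM risk when a limit is set.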

top — Processes Inside the Container

docker top shows the process list inside a container.

docker top <container>
docker top <container> auxf     # Tree format

If PID 1 shows as sh -c ..., suspect the signal delivery issue discussed in Part 12. If there are more child processes than expected, check for runaway forking or zombie processes that PID 1 is not reaping.

exec — Getting Inside the Container

Probably the most-used diagnostic command.

docker exec -it <container> sh
docker exec -it <container> bash     # For images that have bash
docker exec <container> cat /proc/1/status
docker exec <container> env

For distroless or other shell-less images, you cannot attach a shell via exec. In that case, start a separate debug container that joins the target container’s network and PID namespaces:

docker run -it --rm \
  --network container:<container> \
  --pid container:<container> \
  nicolaka/netshoot

nicolaka/netshoot is a debugging image packed with tools like curl, dig, tcpdump, and strace. Because it shares the network/PID namespace, you can inspect the target container’s ports and processes directly.

Common Error Patterns and Solutions

1. permission denied — Volume Permissions

open /app/logs/app.log: permission denied

Cause: A host directory is bind-mounted, but the non-root user UID inside the container does not match the host file owner UID.

Solution:
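A minimal sketch of one common fix, assuming a Linux host and a hypothetical ./logs directory mounted at /app/logs: check whether the host directory is owned by the UID the container will run as, then either change the owner or run the container with the matching UID. The helper name owner_matches is made up for this example.

```shell
# owner_matches DIR UID: succeeds when DIR is owned by the numeric UID.
# Tries GNU stat first and falls back to the BSD/macOS syntax.
owner_matches() {
  owner=$(stat -c '%u' "$1" 2>/dev/null || stat -f '%u' "$1")
  [ "$owner" = "$2" ]
}

# Typical use (path, mount point, and image tag are placeholders):
#   owner_matches ./logs "$(id -u)" || sudo chown "$(id -u):$(id -g)" ./logs
#   docker run --user "$(id -u):$(id -g)" -v "$PWD/logs:/app/logs" myapp:1.4.2
```

Running with --user only helps if the image does not depend on a specific user existing inside the container, so treat it as a starting point rather than a universal fix.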

2. Error response from daemon: pull access denied

pull access denied for registry.example.com/myapp

Cause: Not logged into the registry, token expired, or repository path typo.

Solution:

docker login registry.example.com
docker pull registry.example.com/myapp:1.4.2

ECR authorization tokens are valid for 12 hours, so refresh the login at the start of each CI run.
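As a sketch of that refresh step, assuming the AWS CLI v2 is installed and the registry lives in a standard commercial region: the account ID and region are placeholders, and the function names ecr_registry and ecr_login are hypothetical.

```shell
# ecr_registry ACCOUNT_ID REGION: prints the ECR registry hostname.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# ecr_login ACCOUNT_ID REGION: fetches a fresh token and logs Docker in.
# The token is piped via stdin so it never appears in the process list.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}

# Example (placeholder account and region), run as the first CI step:
#   ecr_login 123456789012 us-east-1
```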

3. address already in use

bind: address already in use

Cause: The port on the host is already being used by another process.

Solution:

lsof -i :8080     # Linux/macOS
# or
netstat -anp | grep 8080

# Find conflicting container
docker ps --filter "publish=8080"

4. no space left on device

Cause: The host disk is full, or Docker’s internal storage directory (/var/lib/docker) is full.

# Cleanup
docker system df                 # Check capacity
docker system prune              # Remove stopped containers, unused networks, dangling images, build cache
docker system prune -a --volumes # Above + all images not used by a container + unused volumes (caution!)
docker builder prune             # Clean build cache only

A CI job that periodically runs builder prune keeps things stable over time.
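Such a job could be sketched as follows, assuming the build cache lives on the filesystem that holds /var/lib/docker. The 80 percent threshold, the one-week cutoff, and the function names are illustrative assumptions.

```shell
# disk_usage_percent PATH: prints how full the filesystem holding PATH is (0-100).
disk_usage_percent() {
  # df -P gives a stable POSIX layout; column 5 is the capacity, e.g. "42%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# prune_if_full PATH THRESHOLD: prunes build cache older than one week
# when the filesystem is at or above THRESHOLD percent.
prune_if_full() {
  if [ "$(disk_usage_percent "$1")" -ge "$2" ]; then
    docker builder prune --force --filter 'until=168h'
  fi
}

# Example for a scheduled CI job:
#   prune_if_full /var/lib/docker 80
```

Pruning only when the disk is actually filling up keeps the cache warm for normal builds.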

5. CrashLoopBackOff (Kubernetes)

The container exits immediately after starting. kubelet keeps restarting it while the backoff grows exponentially.

kubectl describe pod <pod>           # Events, last exit code
kubectl logs <pod> -c <container>
kubectl logs <pod> -c <container> --previous  # Previous instance logs

--previous is the decisive flag. The current instance has often just restarted and logged little or nothing, while the previous instance’s logs show why it died.
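To get a feel for how quickly the restart delay grows, the sketch below prints the schedule under the documented kubelet defaults: a 10-second initial delay that doubles on each restart and is capped at 300 seconds. The function name backoff_schedule is hypothetical.

```shell
# backoff_schedule N: prints the first N restart delays in seconds,
# assuming a 10s initial delay, doubling each time, capped at 300s.
backoff_schedule() {
  delay=10
  i=0
  out=""
  while [ "$i" -lt "$1" ]; do
    out="$out${out:+ }$delay"        # append with a space separator
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300                      # cap at five minutes
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_schedule 7   # prints: 10 20 40 80 160 300 300
```

After the sixth restart the pod is retried only every five minutes, which is why a crash loop can look idle for long stretches.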

Docker Alternatives

Docker is the de facto standard, but it is not the only option. Understanding why alternatives exist enables informed choices for each situation.

flowchart TB
    subgraph OCI["OCI Standards"]
      OCI_SPEC["OCI Image Spec<br/>OCI Runtime Spec<br/>OCI Distribution Spec"]
    end

    subgraph TOOLS["High-level tools"]
      DOCKER["Docker (dockerd)"]
      PODMAN["Podman"]
      NERDCTL["nerdctl"]
    end

    subgraph RUNTIME["Low-level runtimes"]
      CONTAINERD["containerd"]
      CRIO["CRI-O"]
      RUNC["runc"]
    end

    DOCKER --> CONTAINERD
    NERDCTL --> CONTAINERD
    PODMAN --> RUNC
    CONTAINERD --> RUNC
    CRIO --> RUNC

    OCI_SPEC -.standard.-> DOCKER
    OCI_SPEC -.standard.-> PODMAN
    OCI_SPEC -.standard.-> CONTAINERD

The key takeaway from this diagram is that images and runtimes are standardized. Images built with Docker can run on Podman or containerd as-is. Even if you switch tools, images remain compatible.

Podman — Daemonless + Rootless

Podman is an alternative led by Red Hat. Command compatibility is high (alias docker=podman works in many cases), with two key differences:

  1. No daemon: Docker relies on the always-running dockerd daemon, which the CLI calls. With Podman, each command starts containers directly. It also integrates well with systemd (podman generate systemd).
  2. Rootless by default: Running podman as a regular user leverages user namespaces to run containers without root. This reduces the attack surface.

# Most Docker commands work as-is
podman pull alpine:3.20
podman run -it alpine:3.20 sh
podman ps
podman build -t myapp:1.4.2 .

# Auto-generate systemd units (deprecated in newer Podman releases in favor of Quadlet)
podman generate systemd --name myapp > myapp.service

However, Compose compatibility is less smooth. A separate podman-compose tool exists, and newer Podman releases add a podman compose subcommand that delegates to an external provider such as docker-compose or podman-compose, but neither path is perfectly compatible.

When to choose Podman:

containerd — Kubernetes Standard Runtime

containerd is a container runtime that was originally split out of Docker. Docker itself runs on containerd, and Kubernetes talks to it through the Container Runtime Interface (CRI). Since Kubernetes 1.24 removed dockershim, node runtimes are mostly containerd or CRI-O.

# nerdctl is a Docker-compatible CLI for containerd (the bundled ctr is a low-level debugging tool)
nerdctl pull alpine:3.20
nerdctl run -it alpine:3.20 sh
nerdctl build -t myapp:1.4.2 .
nerdctl compose up -d

nerdctl provides an interface nearly identical to the Docker CLI. With BuildKit integration, Compose support, and even image encryption, it is essentially “Docker without the Docker CLI.”

When you encounter containerd:

# On a Kubernetes node
crictl ps
crictl logs <container ID>
crictl exec -it <container ID> sh

CRI-O

CRI-O is a Kubernetes-only runtime. OpenShift uses it as the default. Its philosophy is to deliberately leave out features not needed by Kubernetes, making it simple and lightweight. Regular developers rarely use it directly — it falls under the platform team’s scope.

Migration Considerations

Main things to check when moving from Docker to Podman or containerd-based setups:

  1. Compose compatibility: docker compose files may not run as-is on Podman. Support for features such as depends_on.condition, healthcheck, and profiles varies by tool and version.
  2. Docker socket dependency: If CI runners or observability tools depend on /var/run/docker.sock, point them at Podman’s podman.sock, which serves a Docker-compatible API. containerd’s containerd.sock speaks a different (gRPC) API, so tools that use it need containerd-aware clients.
  3. User namespace mapping: In rootless mode, container UIDs map to different host UIDs, which can introduce new volume permission issues.
  4. Build commands: docker build is BuildKit-based, Podman builds through Buildah, and nerdctl integrates BuildKit. They are similar but not perfectly compatible.
  5. Images are compatible: As shown in the diagram above, images follow the OCI standard and work across tools, so this is the least of the concerns.
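The UID mapping in item 3 is easier to reason about with a worked example. The sketch below assumes Podman’s default rootless mapping with a single /etc/subuid range: container UID 0 maps to the invoking user, and container UID n (n ≥ 1) maps to the start of the subordinate range plus n − 1. The function name host_uid and the sample values (user UID 1000, range starting at 100000) are hypothetical.

```shell
# host_uid CONTAINER_UID USER_UID SUBUID_START
# Prints the host UID that owns files written by CONTAINER_UID,
# assuming the default rootless mapping described above.
host_uid() {
  if [ "$1" -eq 0 ]; then
    echo "$2"                        # container root is the invoking user
  else
    echo $(( $3 + $1 - 1 ))          # other UIDs come from the subuid range
  fi
}

host_uid 0 1000 100000      # prints: 1000
host_uid 1000 1000 100000   # prints: 100999
```

So a process running as UID 1000 inside a rootless container writes bind-mounted files as host UID 100999, not 1000, which is where the new permission errors come from. The actual mapping on a given machine can be checked with podman unshare cat /proc/self/uid_map.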

Wrapping Up the Docker Series

This concludes the 13-part journey. It started with what a container is in Part 1, moved through the fundamentals of images, networking, and volumes, then Dockerfile, Compose, security, and optimization, and ended with production operations. Each part builds the foundation for the next. The healthcheck used in Part 7 Compose extends to probes in Part 12, and the secrets from Part 10 security are implemented with --mount=type=secret in Part 11 BuildKit.

Docker is not just a single tool — it is the gateway to the container ecosystem. The instincts built here apply whether you go to Kubernetes, Podman, or containerd. Images and namespaces, control groups and runtimes — the same building blocks, just different packaging.

The next step from this series depends on your needs. If orchestration is needed, go to Kubernetes. If security is the focus, explore Podman and image signing. For build optimization, dig deeper into BuildKit (remote builders, distributed caching, etc.). Whichever direction you take, the foundation built across these 13 parts will serve as the background.


To review the Docker series from the beginning, go back to Part 1. Reading from the start about why containers exist — with the pieces you have now built up — might reveal how they all fit together in a new light.

