Docker Part 13 — Troubleshooting and Alternatives

The Big Picture of Diagnosis
Common Exit Code Interpretation
Reading Logs
inspect — All the Metadata
events — What Happened in Chronological Order
stats — Real-Time Resources
top — Processes Inside the Container
exec — Getting Inside the Container
Common Error Patterns and Solutions
Docker Alternatives
Migration Considerations
Wrapping Up the Docker Series

The Big Picture of Diagnosis

Here is a flow for deciding where to start when an incident occurs:

flowchart TB
    START["Container anomaly"] --> Q1{"Visible in<br/>docker ps?"}
    Q1 -->|No| NOEXIST["docker ps -a<br/>Check exit code"]
    Q1 -->|Yes| Q2{"State is<br/>Running?"}
    Q2 -->|Restarting| LOGS["docker logs<br/>+ events"]
    Q2 -->|Running| Q3{"Are requests<br/>getting through?"}
    Q3 -->|No| NET["inspect network<br/>+ check ports"]
    Q3 -->|Slow/errors| STATS["stats / top<br/>Check resources"]
    NOEXIST --> EXITC{"Exit Code"}
    EXITC -->|0| NORMAL["Normal exit"]
    EXITC -->|1| APP["App error"]
    EXITC -->|125| CLI["docker command error"]
    EXITC -->|126| PERM["Not executable (permission)"]
    EXITC -->|127| NOT["Command not found"]
    EXITC -->|137| OOM["SIGKILL (OOM or forced kill)"]
    EXITC -->|143| TERM["SIGTERM (normal shutdown path)"]
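The flowchart's first branch can be scripted as a quick sweep over every container. This is a sketch that assumes the State column of docker ps -a, which holds values such as running, exited, and restarting. next_step is a hypothetical helper and only prints a suggestion.

```sh
# next_step STATE  (hypothetical helper, not a Docker command)
# Prints where to look first for a container in STATE, following the flowchart.
next_step() {
  case "$1" in
    exited|dead)
      echo "check the exit code: docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}'" ;;
    restarting)
      echo "read docker logs and docker events" ;;
    running)
      echo "check networks and ports (inspect), then resources (stats, top)" ;;
    *)
      echo "state '$1': start with docker inspect" ;;
  esac
}

# Usage: one suggestion per container, including stopped ones
# docker ps -a --format '{{.Names}} {{.State}}' |
#   while read -r name state; do echo "$name: $(next_step "$state")"; done
```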

The exit code alone narrows down the cause considerably, so check it first. The common codes are listed below.

Common Exit Code Interpretation

Exit code	Meaning	Common cause
0	Normal exit	App finished as intended
1	Application-level error	Check the stack trace
125	The docker command itself failed	Typo in an option or image name
126	Command inside the container is not executable	Permission denied, missing execute bit
127	Command not found	PATH issue, binary not present in the image
137	Killed by SIGKILL (128 + 9)	OOM killer, liveness probe failure, docker stop timeout
139	Killed by SIGSEGV (128 + 11)	Native crash (segmentation fault)
143	Killed by SIGTERM (128 + 15)	Normal shutdown path (docker stop, orchestrator)

The most misunderstood code is 137. It is tempting to conclude “it's OOM” right away, but any SIGKILL produces 137. In Kubernetes, a failed liveness probe also produces it: kubelet sends SIGTERM, then SIGKILL if the process is still running after the grace period. If the logs contain no OOM message and the liveness probe is configured aggressively, suspect the probe first.

Verification commands:

docker ps -a --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'
docker inspect <container> --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.Error}}'

If OOMKilled is true, this confirms that the container was killed for running out of memory.
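The two inspect fields above can be turned into a first guess automatically. This is a sketch. diagnose_exit is a hypothetical helper, not a Docker command. It encodes the exit-code table and checks OOMKilled first because that flag is the strongest evidence.

```sh
# diagnose_exit CODE [OOMKILLED]  (hypothetical helper)
# Maps an exit code (and optional OOMKilled flag) to a first guess.
diagnose_exit() {
  code=$1
  oom=${2:-false}
  # OOMKilled=true is conclusive, so check it before the exit code.
  if [ "$oom" = "true" ]; then
    echo "OOM kill confirmed: raise the memory limit or reduce usage"
    return 0
  fi
  case "$code" in
    0)   echo "normal exit" ;;
    1)   echo "application error: check the stack trace in the logs" ;;
    125) echo "docker command failed: check options and image name" ;;
    126) echo "not executable: check permissions and the execute bit" ;;
    127) echo "command not found: check PATH and that the binary exists" ;;
    137) echo "SIGKILL without OOMKilled: probe failure, stop timeout, or docker kill" ;;
    139) echo "SIGSEGV: native crash" ;;
    143) echo "SIGTERM: normal shutdown path" ;;
    *)
      # By convention, codes above 128 mean "terminated by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application-defined exit code $code"
      fi
      ;;
  esac
}

# Usage (unquoted on purpose so the two fields become two arguments):
# diagnose_exit $(docker inspect <container> --format '{{.State.ExitCode}} {{.State.OOMKilled}}')
```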

Reading Logs

docker logs shows the container’s stdout/stderr as-is. Use -f to follow in real time.

docker logs <container>
docker logs -f <container>
docker logs --tail 200 <container>
docker logs --since 10m <container>
docker logs --timestamps <container>

With Compose, use the service name:

docker compose logs -f web
docker compose logs --tail 100 db
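When filtering logs, note that docker logs replays the container's stderr on its own stderr. Redirect it with 2>&1 before piping to grep, or error lines will bypass the filter. This is a minimal sketch. log_errors is a hypothetical helper, and the pattern is an assumption that you should adapt to your log format.

```sh
# log_errors  (hypothetical helper)
# Keeps only lines that look like errors; adjust the pattern to your app.
log_errors() {
  grep -iE 'error|exception|fatal|panic'
}

# Usage: merge stderr into stdout first, then filter
# docker logs --since 10m <container> 2>&1 | log_errors
```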

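One important note: docker logs only shows what the app writes to stdout/stderr. If the app writes logs only to a file, docker logs shows nothing, so configure the app to log to standard output instead (Node: console.log; Python: sys.stdout, with PYTHONUNBUFFERED=1 so output is not held in a buffer; Spring Boot: keep console logging enabled rather than writing only to logging.file.name; etc.).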
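If you cannot change the app's logging, a common workaround is to replace the log file with a symlink to /dev/stdout. The official nginx image does this for its access log. This is a sketch that assumes the app appends to a fixed path. The helper name is hypothetical, and apps that rotate or truncate their own log files may not work well with a symlink.

```sh
# link_log_to_stdout PATH  (hypothetical helper)
# Replaces PATH with a symlink to /dev/stdout so writes reach `docker logs`.
# Typically run at image build time.
link_log_to_stdout() {
  mkdir -p "$(dirname "$1")"
  ln -sf /dev/stdout "$1"
}

# Equivalent Dockerfile step for the path used later in this article:
# RUN ln -sf /dev/stdout /app/logs/app.log
```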

inspect — All the Metadata

Extracts detailed information as JSON for containers, images, networks, and volumes.

# Container state
docker inspect <container>

# Specific fields only
docker inspect <container> --format '{{.State.Status}}'
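docker inspect <container> --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'   # IPs on every joined network (the top-level IPAddress field is empty on user-defined networks)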
docker inspect <container> --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}'

# Image
docker inspect <image>

# Network
docker network inspect bridge

Fields that quickly narrow down issues:

.State.Health.Status: HEALTHCHECK status (starting, healthy, unhealthy)
.State.Health.Log: the most recent healthcheck results (Docker keeps the last 5)
.Config.Env: environment variables; anything passed with -e or an env file, including secrets, is visible here
.HostConfig.Memory, .HostConfig.NanoCpus: the resource limits actually applied (0 means no limit)
.NetworkSettings.Networks: joined networks and their IPs
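Anyone who can run docker inspect can read .Config.Env, so it is worth scanning it for credentials. This is a minimal sketch. secret_like_env is a hypothetical helper, and its list of name patterns is an assumption, not a complete secret detector. It prints variable names only, never values.

```sh
# secret_like_env  (hypothetical helper)
# Reads KEY=VALUE lines on stdin and prints the KEY of variables whose
# name suggests a credential. Values are never printed.
secret_like_env() {
  grep -iE '^[^=]*(PASSWORD|PASSWD|SECRET|TOKEN|API_KEY|PRIVATE_KEY)[^=]*=' |
    cut -d= -f1
}

# Usage:
# docker inspect <container> --format '{{range .Config.Env}}{{println .}}{{end}}' | secret_like_env
```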

events — What Happened in Chronological Order

docker events is the Docker daemon's real-time event stream. It is useful for tracing, in chronological order, what happened before a container died.

# Real-time
docker events

# Filtered
docker events --filter 'event=die' --filter 'event=kill'

# Past events
docker events --since '1h' --until '10m'

Example output:

2026-04-20T12:34:56 container die 5e9a...(image=myapp:1.4.2, exitCode=137)
2026-04-20T12:34:56 container oom 5e9a...

If an oom event for the same container appears alongside the die event, the OOM killer is confirmed.
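To check a whole host rather than one container, you can reduce the event stream to a list of containers that had both an oom event and a die event. This sketch uses a hypothetical oom_deaths helper. It reads the Action and Actor.ID fields that docker events exposes through --format, and it makes no assumption about which of the two events arrives first.

```sh
# oom_deaths  (hypothetical helper)
# Reads "ACTION CONTAINER_ID" lines and prints the IDs that have both an
# oom and a die event, regardless of order.
oom_deaths() {
  awk '$1 == "oom" { oom[$2] = 1 }
       $1 == "die" { died[$2] = 1 }
       END { for (id in oom) if (id in died) print id }' | sort
}

# Usage: the past hour of container events, ending now
# docker events --since 1h --until "$(date +%s)" --filter type=container \
#   --format '{{.Action}} {{.Actor.ID}}' | oom_deaths
```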

stats — Real-Time Resources

stats shows per-container CPU/memory/network/IO in real time.

docker stats
docker stats <container> --no-stream

--no-stream prints a single snapshot and exits, which suits CI and monitoring scripts. If MEM USAGE keeps approaching LIMIT, an OOM kill is likely coming. (Without a memory limit, LIMIT shows the host's total memory.)
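For a monitoring script, the --format flag makes stats output easy to parse. This is a minimal sketch. It assumes the MemPerc column, which prints values like 91.25%, and uses a hypothetical mem_over helper that lists the containers at or above a threshold.

```sh
# mem_over THRESHOLD  (hypothetical helper)
# Reads "NAME MEMPERC" lines (e.g. "web 91.25%") and prints the names
# whose memory usage is at or above THRESHOLD percent.
mem_over() {
  awk -v limit="$1" '{ pct = $2; sub(/%$/, "", pct); if (pct + 0 >= limit + 0) print $1 }'
}

# Usage:
# docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | mem_over 90
```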

top — Processes Inside the Container

docker top shows the process list inside a container.

docker top <container>
docker top <container> auxf     # Tree format

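docker top reports PIDs as seen from the host, so the container's PID 1 is listed under its host PID rather than as 1. If the container's main process is sh -c ..., suspect the signal delivery issue discussed in Part 12. If there are more child processes than expected, or entries marked <defunct>, check for runaway forking or for zombie processes that PID 1 is not reaping.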
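ps output marks zombie processes with <defunct> in the command column, so counting them is a quick way to check whether PID 1 reaps its children. This is a minimal sketch with a hypothetical count_defunct helper.

```sh
# count_defunct  (hypothetical helper)
# Counts zombie processes in `docker top` output read from stdin;
# ps marks them with "<defunct>" in the command column.
count_defunct() {
  grep -c '<defunct>'
}

# Usage: docker top <container> | count_defunct
```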

exec — Getting Inside the Container

Probably the most-used diagnostic command.

docker exec -it <container> sh
docker exec -it <container> bash     # For images that have bash
docker exec <container> cat /proc/1/status
docker exec <container> env
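/proc/1/status shows only the short process name. To see the full command line of PID 1 and spot an sh -c wrapper, read /proc/1/cmdline instead. Its arguments are separated by NUL bytes. This is a small sketch. show_cmdline is a hypothetical helper that converts the file into a readable line.

```sh
# show_cmdline  (hypothetical helper)
# Reads a /proc/<pid>/cmdline file on stdin, where arguments are separated
# by NUL bytes, and prints them as one space-separated line.
show_cmdline() {
  tr '\0' ' ' | sed 's/ $//'
}

# Usage: docker exec <container> cat /proc/1/cmdline | show_cmdline
```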

Distroless and other shell-less images have no shell for exec to start. In that case, spin up a separate debug container that joins the target's network and PID namespaces:

docker run -it --rm \
  --network container:<container> \
  --pid container:<container> \
  nicolaka/netshoot

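nicolaka/netshoot is a debugging image packed with tools like curl, dig, tcpdump, and strace. Because it shares the target's network and PID namespaces, you can inspect the target container's ports and processes directly. Attaching strace to those processes typically also requires adding --cap-add SYS_PTRACE to the docker run command.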

Common Error Patterns and Solutions

1. `permission denied` — Volume Permissions

open /app/logs/app.log: permission denied

Cause: A host directory is bind-mounted, but the non-root user UID inside the container does not match the host file owner UID.

Solution:

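Match the container's UID to the host directory owner (--user 1000:1000, or --user "$(id -u):$(id -g)" to use your own)
Or use a named volume instead of a bind mount (Docker initializes an empty named volume with the contents and ownership of the image directory it is mounted over)
Or, for named volumes, chown the directory during the image build (RUN chown -R ..., then USER) so the volume inherits that ownership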
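Before choosing a fix, confirm the mismatch by comparing the owner of the host directory with the UID the container runs as. This is a sketch that assumes a Linux or macOS host. owner_uid is a hypothetical helper. It uses ls -n because stat flags differ between GNU and BSD.

```sh
# owner_uid DIR  (hypothetical helper)
# Prints the numeric UID that owns DIR. Uses `ls -n` because stat flags
# differ between GNU (stat -c %u) and BSD/macOS (stat -f %u).
owner_uid() {
  ls -nd "$1" | awk '{ print $3 }'
}

# Usage: if the two numbers differ, that is the mismatch
# owner_uid ./logs                                 # owner of the bind-mounted dir
# docker run --rm --entrypoint id myapp:1.4.2 -u   # UID the image runs as (needs an id binary)
```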

2. `Error response from daemon: pull access denied`

pull access denied for registry.example.com/myapp

Cause: Not logged into the registry, token expired, or repository path typo.

Solution:

docker login registry.example.com
docker pull registry.example.com/myapp:1.4.2

ECR authorization tokens are valid for 12 hours, so log in again at the start of each CI run.
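In a CI script, the refresh can be a single pipeline. aws ecr get-login-password prints a fresh token, and docker login --password-stdin reads it without exposing it on the command line. This is a sketch that assumes AWS CLI v2 and credentials allowed to call ecr:GetAuthorizationToken. The helper names and the account ID are placeholders.

```sh
# ecr_registry ACCOUNT_ID REGION  (hypothetical helper)
# Builds the ECR registry hostname for an account and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# ecr_login ACCOUNT_ID REGION  (hypothetical helper)
# Fetches a fresh token and pipes it straight into docker login.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}

# Usage at the start of a CI job (placeholder account ID):
# ecr_login 123456789012 us-east-1
```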

3. `address already in use`

bind: address already in use

Cause: The port on the host is already being used by another process.

Solution:

lsof -i :8080     # Linux/macOS (may need sudo to see other users' processes)
# or, on Linux
ss -ltnp | grep ':8080'

# Find conflicting container
docker ps --filter "publish=8080"
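When several containers publish ports, it helps to see every host port Docker already holds. This sketch uses a hypothetical published_ports helper that parses the Ports column of docker ps. It does not handle port ranges such as 8000-8010.

```sh
# published_ports  (hypothetical helper)
# Reads `docker ps --format '{{.Ports}}'` lines and prints the distinct
# host ports in numeric order. Port ranges are not handled.
published_ports() {
  grep -oE ':[0-9]+->' | sed 's/[^0-9]//g' | sort -un
}

# Usage: docker ps --format '{{.Ports}}' | published_ports
```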

4. `no space left on device`

Cause: The host disk is full, or Docker’s internal storage directory (/var/lib/docker) is full.

# Cleanup
docker system df                 # Check capacity
docker system prune              # Stopped containers, unused networks, dangling images, build cache
docker system prune -a --volumes # Above + all unused images + unused volumes (caution!)
docker builder prune             # Clean build cache only

On CI runners, running docker builder prune on a schedule keeps the build cache from slowly filling the disk.
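A scheduled job can also prune only when space is actually tight. This sketch rests on stated assumptions. disk_usage_pct and prune_if_full are hypothetical helpers. /var/lib/docker is the default data directory on a Linux host (Docker Desktop keeps it inside its VM). The 72h cutoff is arbitrary.

```sh
# disk_usage_pct PATH  (hypothetical helper)
# Used-space percentage of the filesystem holding PATH, taken from POSIX
# `df -P` output (column 5 of the second line, e.g. "83%").
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# prune_if_full THRESHOLD  (hypothetical CI maintenance step)
# Removes build cache older than 72h once Docker's disk reaches THRESHOLD %.
prune_if_full() {
  if [ "$(disk_usage_pct /var/lib/docker)" -ge "$1" ]; then
    docker builder prune --force --filter until=72h
  fi
}

# Usage (e.g. from a nightly CI job): prune_if_full 80
```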

5. `CrashLoopBackOff` (Kubernetes)

The container exits immediately after starting. kubelet keeps restarting it, and the delay between restarts grows exponentially (10s, 20s, 40s, ...) up to a cap of five minutes.

kubectl describe pod <pod>           # Events, last exit code
kubectl logs <pod> -c <container>
kubectl logs <pod> -c <container> --previous  # Previous instance logs

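--previous is the decisive flag. The current container has often just restarted and logged little or nothing, while the previous instance's logs show why it died. kubectl describe pod also shows the previous run's Last State, including its exit code and a reason such as OOMKilled.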

Docker Alternatives

Docker is the de facto standard, but it is not the only option. Knowing why the alternatives exist helps you choose the right tool for each situation.

flowchart TB
    subgraph OCI["OCI Standards"]
      OCI_SPEC["OCI Image Spec<br/>OCI Runtime Spec<br/>OCI Distribution Spec"]
    end

    subgraph TOOLS["High-level tools"]
      DOCKER["Docker (dockerd)"]
      PODMAN["Podman"]
      NERDCTL["nerdctl"]
    end

    subgraph RUNTIME["Container runtimes"]
      CONTAINERD["containerd"]
      CRIO["CRI-O"]
      RUNC["runc"]
    end

    DOCKER --> CONTAINERD
    NERDCTL --> CONTAINERD
    PODMAN --> RUNC
    CONTAINERD --> RUNC
    CRIO --> RUNC

    OCI_SPEC -.standard.-> DOCKER
    OCI_SPEC -.standard.-> PODMAN
    OCI_SPEC -.standard.-> CONTAINERD

The key takeaway from this diagram is that images and runtimes are standardized. Images built with Docker can run on Podman or containerd as-is. Even if you switch tools, images remain compatible.

Podman — Daemonless + Rootless

Podman is an alternative led by Red Hat. Command compatibility is high (alias docker=podman works in many cases), with two key differences:

No daemon: Docker's CLI talks to the always-running dockerd daemon, which starts the containers. Podman has no central daemon. Each podman command starts its containers directly, which makes it a natural fit for systemd (podman generate systemd, or Quadlet in newer releases).
Rootless by default: Running podman as a regular user leverages user namespaces to run containers without root. This reduces the attack surface.

# Most Docker commands work as-is
podman pull alpine:3.20
podman run -it alpine:3.20 sh
podman ps
podman build -t myapp:1.4.2 .

# Generate a systemd unit (deprecated in newer Podman releases in favor of Quadlet)
podman generate systemd --name myapp > myapp.service

Compose support is less smooth, however. A separate podman-compose tool exists, and newer Podman releases include a podman compose command. That command is a thin wrapper around an external provider (docker-compose or podman-compose), so compatibility is not perfect.

When to choose Podman:

Security environments requiring rootless (government, finance, etc.)
When aligning with Red Hat-based OS standard tooling
When managing containers as services via systemd

containerd — Kubernetes Standard Runtime

containerd is the container runtime daemon that was originally split out of Docker. Docker Engine itself runs on containerd, and containerd is also the most common CRI runtime on Kubernetes nodes. Since Kubernetes 1.24 removed dockershim, node runtimes are mostly containerd or CRI-O.

# containerd's own CLI (ctr) is a low-level debugging tool; nerdctl provides a Docker-compatible CLI
nerdctl pull alpine:3.20
nerdctl run -it alpine:3.20 sh
nerdctl build -t myapp:1.4.2 .
nerdctl compose up -d

nerdctl provides an interface nearly identical to the Docker CLI. With BuildKit integration, Compose support, and even image encryption, it is essentially “the Docker experience without dockerd.”

When you encounter containerd:

When working with Kubernetes nodes (you are likely already using it)
When checking pods/containers at the node level with crictl

# On a Kubernetes node
crictl ps
crictl logs <container ID>
crictl exec -it <container ID> sh

CRI-O

CRI-O is a Kubernetes-only runtime. OpenShift uses it as the default. Its philosophy is to deliberately leave out features not needed by Kubernetes, making it simple and lightweight. Regular developers rarely use it directly — it falls under the platform team’s scope.

Migration Considerations

Main things to check when moving from Docker to Podman or containerd-based setups:

Compose compatibility: Compose files written for docker compose may not run as-is on Podman. Newer features such as depends_on conditions, healthcheck, and profiles behave differently depending on the tool and its version
Docker socket dependency: If CI runners or observability tools depend on /var/run/docker.sock, they need a new endpoint. Podman can serve a Docker-compatible API on podman.sock, so many tools work once DOCKER_HOST points at it. containerd's containerd.sock speaks containerd's own gRPC API, so tools must support containerd natively
User namespace mapping: In rootless mode, UID mapping differs, potentially introducing new volume permission issues
Build commands: docker build uses BuildKit, Podman builds with Buildah, and nerdctl uses BuildKit. The Dockerfile syntax is shared, but BuildKit-specific features may behave differently or be unsupported elsewhere
Images are compatible: As shown in the diagram above, images follow the OCI standard and work across tools. This is not the biggest concern
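The UID-mapping and socket items above can be checked with a little arithmetic and one environment variable. By default, rootless Podman maps container UID 0 to your own UID. Container UIDs from 1 upward map onto your subordinate range from /etc/subuid, which is why a bind-mounted directory can suddenly appear owned by an unfamiliar UID. For the socket, Podman can serve a Docker-compatible API on a per-user socket. This is a sketch that assumes the default mapping and the default rootless socket path. Both helpers are hypothetical, and podman unshare cat /proc/self/uid_map shows your real mapping.

```sh
# rootless_host_uid CONTAINER_UID USER_UID SUBUID_START  (hypothetical helper)
# Default rootless mapping: container UID 0 -> your UID; container UID n >= 1
# -> SUBUID_START + n - 1 (the start of your /etc/subuid range).
rootless_host_uid() {
  if [ "$1" -eq 0 ]; then
    echo "$2"
  else
    echo $(( $3 + $1 - 1 ))
  fi
}

# podman_docker_host  (hypothetical helper)
# DOCKER_HOST value for the rootless Podman API socket.
podman_docker_host() {
  echo "unix://${XDG_RUNTIME_DIR:?XDG_RUNTIME_DIR is not set}/podman/podman.sock"
}

# Usage:
# systemctl --user enable --now podman.socket
# export DOCKER_HOST="$(podman_docker_host)"
```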

Wrapping Up the Docker Series

This concludes the 13-part journey. It began with what a container is in Part 1 and moved through the fundamentals of images, networking, and volumes, then Dockerfiles, Compose, security and optimization, and finally production operations. Each part laid the groundwork for the next. The healthcheck from the Compose part (Part 7) became the probes of Part 12, and the secrets handling from the security part (Part 10) was implemented with --mount=type=secret in the BuildKit part (Part 11).

Docker is not just a single tool — it is the gateway to the container ecosystem. The instincts built here apply whether you go to Kubernetes, Podman, or containerd. Images and namespaces, control groups and runtimes — the same building blocks, just different packaging.

The next step from this series depends on your needs. If you need orchestration, move on to Kubernetes. If security is the focus, explore Podman and image signing. For build optimization, dig deeper into BuildKit (remote builders, distributed caching, etc.). Whichever direction you take, the foundation built across these 13 parts carries over.

To review the Docker series from the beginning, go back to Part 1. Rereading why containers exist, now with all the pieces in hand, may show how they fit together in a new light.
