Table of contents
- A Container Is a Process
- State Transition Diagram
- docker run — The Most Commonly Typed Command
- A Container Is a One-Person Company with PID 1
- Handling SIGTERM in Your App
- start, stop, restart, rm — Commands That Move State
- docker exec — Getting Inside a Running Container
- logs and inspect — Looking Inside the State
- Restart Policy
- Tracking “Why It Died” via Exit Code
- Operational Flow at a Glance
- Five Principles to Always Keep in Mind
A Container Is a Process
An image is a still photo, and a container is a moving picture. A movie starts, pauses, and ends. Without understanding this lifecycle, you will encounter strange problems in operations: “I restarted the container but forgot -it and it immediately went to Exited,” “the server rebooted and the containers did not come back up,” “my deploy script keeps triggering SIGKILL.” All of these come from an insufficient understanding of the lifecycle.
In this part, we follow the entire journey of a container from creation to destruction.
State Transition Diagram
Let’s start with the state transitions at a glance:
stateDiagram-v2
[*] --> created : docker create
[*] --> running : docker run
created --> running : docker start
running --> paused : docker pause
paused --> running : docker unpause
running --> stopped : docker stop<br/>(SIGTERM → SIGKILL)
running --> stopped : Process exits normally
stopped --> running : docker start / restart
stopped --> [*] : docker rm
created --> [*] : docker rm
The key takeaways:
- docker run = create + start combined in one step
- A container in the stopped (Exited) state still exists on disk. It must be explicitly removed with docker rm
- pause only freezes the process; it does not release resources. Mainly used for debugging
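The diagram can also be read as a lookup table. The sketch below is a toy model of it in JavaScript, not anything Docker exposes: the TRANSITIONS table, the nextState helper, and the placeholder state "none" (standing in for the diagram’s start and end marker) are names made up for illustration.

```javascript
// Toy model of the state diagram above. TRANSITIONS and nextState are
// hypothetical names for illustration; Docker exposes no such API.
const TRANSITIONS = {
  none: { 'docker create': 'created', 'docker run': 'running' },
  created: { 'docker start': 'running', 'docker rm': 'none' },
  running: {
    'docker pause': 'paused',
    'docker stop': 'stopped',
    'process exit': 'stopped',
  },
  paused: { 'docker unpause': 'running' },
  stopped: {
    'docker start': 'running',
    'docker restart': 'running',
    'docker rm': 'none',
  },
};

// Returns the state after `action`, or throws if the diagram has no such arrow.
function nextState(state, action) {
  const next = TRANSITIONS[state]?.[action];
  if (next === undefined) {
    throw new Error(`no transition from "${state}" via "${action}"`);
  }
  return next;
}

// docker run is create + start in one step: both paths end in "running".
console.log(nextState('none', 'docker run')); // running
console.log(nextState(nextState('none', 'docker create'), 'docker start')); // running
```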
docker run — The Most Commonly Typed Command
This is the basic command for spinning up a container. It has many options, but here are the ones you will repeatedly use in practice:
docker run \
-d \
--name web \
--restart unless-stopped \
-p 8080:80 \
-e NODE_ENV=production \
-v "$(pwd)/data:/data" \
--memory 512m --cpus 0.5 \
nginx:1.27
This single line contains all the important options:
- -d (--detach): Run in the background. Without it, the container’s logs take over the terminal
- --name: Give the container a human-readable name. Without it, a random name like pensive_dirac is assigned
- --restart: Restart policy (detailed below)
- -p HOST:CONTAINER: Port forwarding. Maps host port 8080 to container port 80
- -e KEY=VALUE: Environment variable. Overrides the Dockerfile’s ENV
- -v HOST:CONTAINER: Volume mount (Part 5)
- --memory, --cpus: Resource limits. Prevents a single container from hogging the host
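When a deploy script assembles this command, building the argument list programmatically can avoid quoting mistakes. A minimal sketch, assuming a hypothetical buildRunArgs helper and an options object whose field names are invented here; only the flags themselves come from Docker.

```javascript
// Hypothetical helper: turns an options object into the argument list for
// `docker run`. The field names are made up; the flags are the ones above.
function buildRunArgs(opts) {
  const args = ['run'];
  if (opts.detach) args.push('-d');
  if (opts.name) args.push('--name', opts.name);
  if (opts.restart) args.push('--restart', opts.restart);
  for (const [host, container] of opts.ports ?? []) {
    args.push('-p', `${host}:${container}`);
  }
  for (const [key, value] of Object.entries(opts.env ?? {})) {
    args.push('-e', `${key}=${value}`);
  }
  for (const [hostPath, containerPath] of opts.volumes ?? []) {
    args.push('-v', `${hostPath}:${containerPath}`);
  }
  if (opts.memory) args.push('--memory', opts.memory);
  if (opts.cpus) args.push('--cpus', String(opts.cpus));
  args.push(opts.image); // the image always comes last
  return args;
}

const args = buildRunArgs({
  detach: true,
  name: 'web',
  restart: 'unless-stopped',
  ports: [[8080, 80]],
  env: { NODE_ENV: 'production' },
  memory: '512m',
  cpus: 0.5,
  image: 'nginx:1.27',
});
console.log(['docker', ...args].join(' '));
```

Passing the array to child_process.spawn('docker', args), rather than joining it into one shell string, keeps values that contain spaces intact.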
For quick tests, you will also frequently use patterns like these:
# One-off interactive shell, automatically removed on exit
docker run --rm -it ubuntu:24.04 bash
# Share the host network (for debugging)
docker run --rm --network host nginx:1.27
--rm means “automatically delete when finished.” -it is a combination of -i (attach standard input) and -t (allocate a TTY), used when running interactively like a shell.
A Container Is a One-Person Company with PID 1
A container has its own isolated PID namespace, and the specified process runs as PID 1 inside it. In Linux, PID 1 is a special entity:
- It must adopt orphaned processes (collect zombies via wait())
- Signals are not automatically forwarded. They must be explicitly received and passed to children
- Default signal actions do not apply. A signal with no handler installed is ignored, so PID 1 ignores SIGTERM unless it handles it explicitly
If you run a shell script as PID 1 without knowing this, odd things happen. In particular, docker stop will not work and after a 10-second wait, the process gets killed with SIGKILL.
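The forwarding duty is the one that most often bites wrapper scripts. As a rough sketch of what “explicitly received and passed to children” means, here is a hypothetical forwardSignals helper for a Node.js wrapper running as PID 1. The helper name and its proc parameter (which exists only so the logic can be exercised without real signals) are assumptions, and zombie reaping is not covered.

```javascript
// Hypothetical helper for a wrapper process running as PID 1.
// Registers a handler for each signal and relays it to the child.
// `child` is anything with a kill(signal) method, e.g. the ChildProcess
// returned by child_process.spawn().
function forwardSignals(child, signals = ['SIGTERM', 'SIGINT'], proc = process) {
  for (const signal of signals) {
    proc.on(signal, () => {
      // Without this relay the child never learns that a stop was requested.
      child.kill(signal);
    });
  }
}

// Typical wiring (commented out so the sketch has no side effects):
//   const { spawn } = require('node:child_process');
//   const child = spawn('node', ['server.js'], { stdio: 'inherit' });
//   forwardSignals(child);
//   child.on('exit', (code) => process.exit(code ?? 1));
```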
The Shell Form Trap of CMD
Let’s expand on what was briefly mentioned in Part 3. If you write this in your Dockerfile, it becomes a problem:
# shell form
CMD node server.js
Internally, this is executed as /bin/sh -c "node server.js". The process tree looks like this:
PID 1: /bin/sh -c "node server.js"
└─ PID 7: node server.js
docker stop sends SIGTERM to PID 1. sh receives the signal and ignores it. The Node app never gets the signal, and after 10 seconds it is force-killed with SIGKILL. DB connections are severed and in-flight requests are lost.
Using exec form eliminates this problem:
# exec form (a Dockerfile comment must sit on its own line)
CMD ["node", "server.js"]
PID 1: node server.js # Runs directly without a shell
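With exec form, node itself runs as PID 1 and receives SIGTERM directly. Because PID 1 gets no default signal action, the app must still register a SIGTERM handler, as in the Handling SIGTERM in Your App section below; once it does, clean shutdown becomes possible.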
Init processes like tini or dumb-init
Complex apps sometimes spawn multiple child processes. If PID 1 does not reap zombies, they accumulate over time. In such cases, a lightweight init process is used as PID 1.
# Docker provides a built-in init
# Enable it with the --init option instead of adding one to the image
docker run --init myapp
The --init flag places tini as PID 1 and runs the application underneath it. Tini receives signals, forwards them to children, and reaps zombies. Placing it in front of complex scripts dramatically improves stability.
Handling SIGTERM in Your App
When Docker stops a container, it follows this sequence:
- Send SIGTERM to PID 1
- Wait 10 seconds by default (--stop-timeout to adjust)
- If still alive, force-kill with SIGKILL
This “10-second grace period” is the key. The app must cleanly finish within this time. Here is an example of attaching a handler in Node.js:
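// Graceful shutdown for an Express app
const express = require('express');
const app = express();
const server = app.listen(3000);
let shuttingDown = false;
function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, shutting down...`);
  server.close(() => {
    console.log('HTTP server closed');
    // Clean up DB connections, queue consumers, etc.
    process.exit(0);
  });
  // Failsafe: force exit before Docker's 10-second timeout sends SIGKILL
  setTimeout(() => {
    console.error('Forced exit');
    process.exit(1);
  }, 9000).unref(); // unref so the timer alone does not keep the process alive
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));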
server.close() stops accepting new connections and invokes the callback once the existing connections have ended, which in practice means after in-flight requests complete. The 9-second timer forces the process to exit on its own terms just before Docker’s 10-second timeout would send SIGKILL.
Other languages like Go and Java follow the same pattern: receive signal → block new requests → complete in-flight requests → clean up resources → exit.
start, stop, restart, rm — Commands That Move State
These are the commands for manipulating a container once it has been created:
# Stop — SIGTERM → 10-second wait → SIGKILL
docker stop web
docker stop --time=30 web # Wait up to 30 seconds
# Start again
docker start web
# Stop then start (stop + start)
docker restart web
# Force immediate termination (SIGKILL)
docker kill web
docker kill --signal=SIGUSR1 web # Custom signals are also possible
# Remove (must be in stopped state)
docker rm web
# Force-remove a running container
docker rm -f web
Know the difference between docker stop and docker kill. stop gives a grace period before killing. kill terminates immediately. For routine redeployments, use stop. Use kill only when a process is stuck or unresponsive.
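Seen from a supervisor’s side, the difference between the two commands is simply whether a grace period precedes SIGKILL. The sketch below shows that escalation for a Node.js child process. stopWithGrace and killNow are hypothetical helpers written for this illustration, not a description of how Docker implements it internally.

```javascript
// Hypothetical helper mirroring the stop sequence: SIGTERM, wait, SIGKILL.
// `child` is a ChildProcess (or anything with kill() and once('exit', ...)).
function stopWithGrace(child, graceMs, onExit) {
  // Escalate only if the child is still alive when the grace period ends.
  const timer = setTimeout(() => child.kill('SIGKILL'), graceMs);

  child.once('exit', (code, signal) => {
    clearTimeout(timer); // exited in time, so no SIGKILL is needed
    onExit(code, signal);
  });

  child.kill('SIGTERM'); // polite request first
}

// The counterpart of `docker kill` skips the grace period entirely.
function killNow(child) {
  child.kill('SIGKILL');
}
```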
docker exec — Getting Inside a Running Container
Use this when you want to enter an already running container and execute commands. It is the most frequently used debugging command.
# Attach a shell
docker exec -it web bash
# If the image has no bash
docker exec -it web sh
# Run a command once
docker exec web printenv NODE_ENV
# Enter as root (when the image specifies a USER)
docker exec -u root -it web bash
Inside the container, you typically run network tests, check logs, and inspect disk status. Note that packages installed or files changed via exec survive container restarts but disappear when the container is recreated (deleted then re-run). Operational changes should be reflected in images or volumes.
logs and inspect — Looking Inside the State
To see why a container died or what is happening right now:
# View logs
docker logs web
docker logs -f web # Stream like tail -f
docker logs --tail 100 web # Last 100 lines
docker logs --since 10m web # Last 10 minutes
docker logs -t web # Include timestamps
# Full container metadata (JSON)
docker inspect web
# Extract specific fields
docker inspect --format '{{.State.Status}}' web
docker inspect --format '{{.State.ExitCode}}' web
docker inspect --format '{{.RestartCount}}' web
The JSON output from docker inspect is massive, so using --format to extract only the fields you need is the practical approach. State.Status, State.ExitCode, and RestartCount are commonly checked in operations.
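When a script needs several of these fields at once, parsing the JSON can be easier than chaining --format calls. A small sketch, assuming a hypothetical summarizeState helper; the sample object is hand-written and trimmed to the fields used, not real docker inspect output.

```javascript
// Hypothetical helper: picks the commonly checked fields out of the JSON
// array that `docker inspect <container>` prints.
function summarizeState(inspectJson) {
  const [container] = JSON.parse(inspectJson);
  return {
    status: container.State.Status,
    exitCode: container.State.ExitCode,
    oomKilled: container.State.OOMKilled,
    restartCount: container.RestartCount,
  };
}

// Hand-written sample, trimmed to the fields used here.
const sample = JSON.stringify([
  { State: { Status: 'exited', ExitCode: 137, OOMKilled: true }, RestartCount: 2 },
]);
console.log(summarizeState(sample));
```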
To view container resource usage in real time:
docker stats
# CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
It is a tool with a similar feel to top.
Restart Policy
This option determines whether a container is automatically restarted when it exits. Specified with the --restart flag.
| Policy | Behavior |
|---|---|
| no (default) | No automatic restart |
| on-failure[:N] | Restart only on non-zero exit codes. N is the max retry count |
| always | Restart regardless of reason. Also auto-starts when Docker Daemon starts |
| unless-stopped | Similar to always, but does not restart if the user explicitly ran docker stop |
docker run -d --restart unless-stopped --name web nginx:1.27
The most commonly used in practice is unless-stopped. It auto-recovers from abnormal exits but stays stopped when you intentionally stop it. It also comes back up automatically after a server reboot.
Using on-failure:3 means the app will restart up to 3 times when it crashes with an error. If it still fails after the third try, it gives up. This prevents a crash loop from running indefinitely and burning resources.
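The decision can be written down as a small function. This is a simplified model of the table above, not Docker’s implementation: shouldRestartOnExit is a hypothetical name, and it only covers a container whose process exited on its own, leaving manual stops and daemon restarts out.

```javascript
// Simplified model of the table above: decides whether a container that
// exited on its own gets restarted. Manual `docker stop` and daemon restarts
// are deliberately left out. shouldRestartOnExit is a hypothetical name.
function shouldRestartOnExit(policy, exitCode, restartCount) {
  if (policy === 'always' || policy === 'unless-stopped') return true;

  const match = /^on-failure(?::(\d+))?$/.exec(policy);
  if (match) {
    const maxRetries = match[1] === undefined ? Infinity : Number(match[1]);
    return exitCode !== 0 && restartCount < maxRetries;
  }

  return false; // "no", the default
}

// A crash-looping app under on-failure:3 is restarted three times, then left alone.
for (let restarts = 0; restarts <= 3; restarts++) {
  console.log(restarts, shouldRestartOnExit('on-failure:3', 1, restarts));
}
// 0 true
// 1 true
// 2 true
// 3 false
```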
If you are using Docker Compose or Kubernetes, restart policies are typically left to the orchestrator. Docker Engine-level restart policies are mainly for single-node operations.
Tracking “Why It Died” via Exit Code
When a container exits, an exit code is left behind. This number is the first clue for diagnosing the cause.
docker ps -a --filter "name=web" --format "{{.Names}}\t{{.Status}}"
# web Exited (137) 2 seconds ago
| Exit Code | Meaning |
|---|---|
| 0 | Normal exit |
| 1 | General error (app called exit(1) or threw an exception) |
| 125 | docker run itself failed (e.g., image not found) |
| 126 | Command in container is not executable (e.g., permission denied) |
| 127 | Command in container not found (not in PATH) |
| 137 | Killed by SIGKILL (9 + 128). docker kill or OOM |
| 139 | SIGSEGV segfault (11 + 128) |
| 143 | Killed by SIGTERM (15 + 128). Normal path of docker stop |
137 is the most commonly encountered mystery. Usually it means the OOM Killer terminated the container for exceeding the memory limit, someone ran docker kill, or Kubernetes killed it due to a failed livenessProbe. Check dmesg or use docker inspect --format '{{.State.OOMKilled}}' web to confirm OOM.
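The 128 + signal number rule is easy to encode. A minimal sketch, assuming a hypothetical explainExitCode helper that covers only the rows of the table above; note that an application can also choose to exit with one of these numbers itself, so the result is a hint rather than proof.

```javascript
// Hypothetical helper encoding the table above. Codes above 128 follow the
// convention of 128 + signal number (Linux signal numbers).
const SIGNAL_NAMES = { 9: 'SIGKILL', 11: 'SIGSEGV', 15: 'SIGTERM' };

function explainExitCode(code) {
  if (code === 0) return 'normal exit';
  if (code === 125) return 'docker run itself failed';
  if (code === 126) return 'command is not executable';
  if (code === 127) return 'command not found';
  if (code > 128) {
    const signo = code - 128;
    const name = SIGNAL_NAMES[signo] ?? `signal ${signo}`;
    return `killed by ${name}`;
  }
  return 'application error';
}

console.log(explainExitCode(137)); // killed by SIGKILL
console.log(explainExitCode(143)); // killed by SIGTERM
console.log(explainExitCode(1)); // application error
```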
Operational Flow at a Glance
The flows encountered most often in practice can be collected into a single diagram:
flowchart TB
BUILD["docker build -t app:1.2.3 ."] --> PUSH["docker push registry/app:1.2.3"]
PUSH --> PULL["docker pull on server"]
PULL --> STOP["docker stop old-app (graceful)"]
STOP --> RM["docker rm old-app"]
RM --> RUN["docker run -d --name app --restart unless-stopped ..."]
RUN --> HEALTH{"HEALTHCHECK OK?"}
HEALTH -->|Yes| DONE["Deployment complete"]
HEALTH -->|No| LOGS["docker logs app<br/>Investigate cause"]
LOGS --> ROLLBACK["Rollback: docker run with previous tag"]
On a single server, this flow can be wrapped up in a single shell script. When servers scale to multiple machines, you move to Compose, Swarm, or Kubernetes. Even then, the lifecycle of each individual container still follows these rules.
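As a rough illustration of how little the single-server script needs, the sketch below lists the server-side steps as the commands a deploy script would run, in order. deployPlan is a hypothetical helper, the container name and image tags are placeholders, and the health check step is left out.

```javascript
// Hypothetical helper: lists the single-server flow above as the commands a
// deploy script would run, in order. Names and images are placeholders.
function deployPlan({ name, image }) {
  return [
    `docker pull ${image}`,
    `docker stop ${name}`, // graceful: SIGTERM first
    `docker rm ${name}`,
    `docker run -d --name ${name} --restart unless-stopped ${image}`,
  ];
}

// Rolling back is the same plan with the previous tag (1.2.2 is made up here).
const deploy = deployPlan({ name: 'app', image: 'registry/app:1.2.3' });
const rollback = deployPlan({ name: 'app', image: 'registry/app:1.2.2' });
console.log(deploy.join('\n'));
```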
Five Principles to Always Keep in Mind
Finally, principles you should not forget in practice:
- Be mindful of PID 1. Write CMD in exec form, and use --init or tini when needed
- Catch SIGTERM in your app. Zero-downtime deployment is impossible without graceful shutdown
- unless-stopped is a safe default for restart policies. Adjust to on-failure as needed
- Trace causes via exit codes and logs. If you see 137, suspect OOM first
- Even Exited containers consume disk. Periodically clean up with docker container prune
In the next part, we talk about data that must survive even when a container dies: the difference between bind mounts and named volumes, the role of tmpfs, volume backup and restore, and the surprisingly common UID/GID permission issues.
