Table of contents
- The Security Big Picture
- 1. Minimal Base and Update Management
- 2. Non-Root Execution — the USER Directive
- 3. Image Scanning — Trivy
- 4. SBOM — Supply Chain Visibility
- 5. Secret Management — Build and Runtime Pitfalls
- 6. Read-Only Root Filesystem
- 7. Linux Capabilities — Keep Only What’s Needed
- 8. Full Privilege Escalation Defense Configuration
- 9. Network/Mount Restrictions
- 10. Image Signing and Verification — cosign
- Practical Checklist
- Where We Stand
The Security Big Picture
Let’s first see what we are protecting and from what:
flowchart TB
subgraph BUILD["Build time"]
B1["Minimal base image"]
B2["Non-root USER"]
B3["Build secrets"]
B4["Layer cleanup"]
end
subgraph DIST["Distribution time"]
D1["Image scan (Trivy)"]
D2["Signing (cosign)"]
D3["SBOM"]
end
subgraph RUN["Runtime"]
R1["Read-only FS"]
R2["Capabilities drop"]
R3["Resource limits"]
R4["Network/mount restrictions"]
R5["Secret injection"]
end
BUILD --> DIST --> RUN
Attackers break in through a single point and then spread via lateral movement. Placing barriers at each of the build, distribution, and runtime stages means that even if one stage is breached, the next one stops the spread.
1. Minimal Base and Update Management
A larger base means more vulnerabilities. The multi-stage + distroless/slim combination from Part 8 is also a security win.
# Pattern to avoid — full of dependencies and tools
FROM ubuntu:22.04
# Recommended — only what's needed
FROM gcr.io/distroless/static-debian12:nonroot
Another important point is “periodically updating the base image.” An image built last year on alpine:3.18 has accumulated outdated packages. Either run scheduled rebuilds in CI, or pin only to the minor version (alpine:3.20) instead of the exact patch to absorb security patches.
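One way to automate the periodic rebuild is a scheduled CI job. A minimal sketch for GitHub Actions, assuming the project builds with a plain docker build (the workflow name, schedule, and image tag here are illustrative, and real pipelines would also push and scan):

```yaml
# Rebuild weekly so the pinned base (e.g., alpine:3.20) picks up its latest patch release
name: scheduled-rebuild
on:
  schedule:
    - cron: "0 3 * * 1"   # every Monday at 03:00 UTC
  workflow_dispatch: {}    # also allow manual runs
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build with a freshly pulled base image
        run: docker build --pull -t myapp:latest .   # --pull refetches the base instead of using the local cache
```

The important detail is --pull: without it, the runner may keep building on a stale cached copy of the base image.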
2. Non-Root Execution — the USER Directive
By default, containers run as root (UID 0). If there is no reason to be root, don’t be. This breaks the path from a container escape vulnerability to host-level privileges.
FROM alpine:3.20
RUN addgroup -S app && adduser -S -G app -u 10001 app
WORKDIR /app
COPY --chown=app:app . .
USER app
ENTRYPOINT ["/app/server"]
A few points to note:
- Explicitly specify the UID (e.g., 10001). Relying on default auto-assignment can cause permission conflicts with host volumes.
- Use --chown to set file ownership. Without it, root-owned files cannot be read by non-root users.
- Place USER after the last RUN commands that need root privileges (like package installation).
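To sanity-check that a built image actually declares a non-root user, you can read the value back from the image configuration (a quick command-line check; myapp:1.4.2 stands in for your image):

```shell
# Prints the effective USER directive of the image, e.g. "app" or "10001"
docker inspect --format '{{.Config.User}}' myapp:1.4.2
```

An empty result means the image falls back to the default, i.e., root.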
In Kubernetes, you can reinforce this at the Pod spec level:
spec:
securityContext:
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
fsGroup: 10001
Even if the image has USER set, adding runAsNonRoot: true to the Pod spec means kubelet will refuse to start the container if the image accidentally switches to root.
3. Image Scanning — Trivy
You will not know what CVEs are in your built image unless you check. Trivy is the most widely used open-source container scanner.
# Install (macOS)
brew install trivy
# Scan a local image
trivy image myapp:1.4.2
# Filter by severity, only fixable
trivy image --severity HIGH,CRITICAL --ignore-unfixed myapp:1.4.2
# Fail CI if vulnerabilities are found
trivy image --exit-code 1 --severity CRITICAL myapp:1.4.2
The baseline setup is to wire it into CI so the build breaks on CRITICAL findings. Harbor and GitHub Actions have built-in Trivy integration, enabling automatic scanning on push.
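As a sketch of that CI wiring, GitHub Actions can use the official aquasecurity/trivy-action; the image reference and version pin below are illustrative, and the step assumes the image was built earlier in the same job:

```yaml
# Fail the job when CRITICAL vulnerabilities with available fixes are found
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@0.24.0
  with:
    image-ref: myapp:${{ github.sha }}
    severity: CRITICAL
    exit-code: "1"        # non-zero exit breaks the build
    ignore-unfixed: true  # skip findings with no released fix yet
```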
Alternatives include Snyk, Grype, and Clair. The key is simply “not skipping the scan.”
4. SBOM — Supply Chain Visibility
SBOM (Software Bill of Materials) is a list of libraries and packages inside an image. When a vulnerability is disclosed, you need an SBOM to instantly answer “are we using this library?”
docker buildx build --sbom=true -t myapp:1.4.2 .
# View SBOM
docker buildx imagetools inspect myapp:1.4.2 --format '{{json .SBOM}}'
# Generate separately with syft (SPDX, CycloneDX formats)
syft myapp:1.4.2 -o spdx-json > sbom.json
The pattern of attaching SBOMs alongside images in registries is becoming standardized (OCI referrers API). In regulated industries, audit requirements for this are likely coming soon.
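With an SPDX JSON SBOM in hand, the "are we using this library?" question becomes a one-liner. A sketch assuming jq is installed; the printf line fabricates a tiny stand-in for the real sbom.json produced by syft above, so the query is runnable as-is:

```shell
# (Demo) a minimal SPDX-style SBOM standing in for the syft output
printf '%s' '{"packages":[{"name":"openssl","versionInfo":"3.0.13"},{"name":"zlib","versionInfo":"1.3.1"}]}' > sbom.json

# The actual query: list every package whose name mentions "openssl", with its version
jq -r '.packages[] | select(.name | test("openssl")) | "\(.name) \(.versionInfo)"' sbom.json
```

Run the same query across all your services' SBOMs and you have an instant blast-radius report for a newly disclosed CVE.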
5. Secret Management — Build and Runtime Pitfalls
The most common mistake is embedding passwords or API keys in the Dockerfile.
# NEVER do this
ARG DB_PASSWORD=mysecret
ENV DB_PASSWORD=$DB_PASSWORD
RUN curl -u admin:mysecret https://internal/...
Values passed through ARG show up in docker history, and ENV values persist in the image configuration — anyone who can pull the image can read them. You need to understand that build-time and runtime secret handling are completely different problems, each with its own mechanism.
Build Time — --mount=type=secret
BuildKit provides a feature that mounts files only during the build and does not leave them in the final image.
# syntax=docker/dockerfile:1.7
FROM node:20-alpine
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
npm install
DOCKER_BUILDKIT=1 docker build \
--secret id=npmrc,src=$HOME/.npmrc \
-t myapp:1.4.2 .
.npmrc is mounted at /root/.npmrc during the build, but the layer is not included in the image. This is covered in more detail in Part 11 with BuildKit.
Runtime — Environment Variables vs. Secret Stores
For local or Compose development environments, environment variables are sufficient:
docker run -e DB_PASSWORD="$DB_PASSWORD" myapp:1.4.2
In production, even this is risky. It is exposed via docker inspect and can accidentally appear in logs. For Kubernetes, use Secret resources (+ External Secrets Operator for Vault/SSM integration). For Swarm, use docker secret create.
# Docker Swarm secrets
echo "mysecret" | docker secret create db_password -
docker service create \
--secret db_password \
--name myapp myapp:1.4.2
# Inside the container, read from /run/secrets/db_password
The principle is “inject as a file, not an environment variable; mount as read-only; access with a least-privilege account.”
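In Kubernetes terms, that principle maps to mounting a Secret as a read-only file rather than exporting it as an environment variable. A minimal sketch, assuming a Secret named db-credentials already exists with a db_password key (all names here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:1.4.2
      volumeMounts:
        - name: db-credentials
          mountPath: /run/secrets
          readOnly: true           # mount read-only, per the principle above
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials # each key becomes a file, e.g. /run/secrets/db_password
```

The app then reads /run/secrets/db_password the same way it would under Swarm.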
6. Read-Only Root Filesystem
Most apps do not need to modify the filesystem at runtime. Mount only the directories that need writing (/tmp, log directories, etc.) separately.
docker run --read-only \
--tmpfs /tmp:rw,size=64m \
-v app-logs:/var/log/app \
myapp:1.4.2
--read-only makes the root FS read-only. Attempts by an attacker to drop a malicious binary or modify /etc/passwd inside the container are blocked at the source. In Kubernetes, readOnlyRootFilesystem: true achieves the same effect.
securityContext:
readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
Some apps do not work with this in practice. Certain frameworks require writing to paths like /tmp or /var/cache. Just open those specific paths with tmpfs or emptyDir.
7. Linux Capabilities — Keep Only What’s Needed
Root privileges are not an all-or-nothing deal. Containers start with about 14 capabilities by default. Most apps do not need more than half of them.
# Drop all and add back only what's needed
docker run \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
myapp:1.4.2
NET_BIND_SERVICE is needed for binding to ports below 1024. If the app does not use low ports like 80/443, drop this too: bind the web server to a high port like 8080, put a proxy in front, and --cap-drop=ALL alone is sufficient.
In Kubernetes, this is expressed as:
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
add: ["NET_BIND_SERVICE"]
allowPrivilegeEscalation: false is also important. It blocks privilege escalation through setuid binaries.
8. Full Privilege Escalation Defense Configuration
Gathering everything covered so far into a single Compose file, a security-hardened service block looks like this:
services:
web:
image: myapp:1.4.2
read_only: true
tmpfs:
- /tmp:size=64m
user: "10001:10001"
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE
security_opt:
- "no-new-privileges:true"
environment:
DB_HOST: db
secrets:
- db_password
secrets:
db_password:
file: ./secrets/db_password.txt
Each line closes one attack vector. These are all just inversions of the defaults — but leaving the defaults unchanged lets risk accumulate.
9. Network/Mount Restrictions
--privileged grants essentially host-level privileges. Mounting the Docker socket (/var/run/docker.sock) into a container lets that container freely control every other container on the host. Do not use these unless there is a legitimate reason (e.g., a CI runner or a host monitoring agent).
# Patterns to avoid
docker run --privileged ...
docker run -v /var/run/docker.sock:/var/run/docker.sock ...
docker run --net=host ...
If you must use them, build that container only from trusted images and restrict network exposure to internal only.
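Restricting network exposure to internal only can be done with an internal bridge network, which Docker configures with no route to the outside. A command-line sketch (the network and image names are illustrative):

```shell
# Containers on an --internal network can talk to each other but not to external networks
docker network create --internal ci-net
docker run -d --network ci-net --name runner my-ci-runner:latest
```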
10. Image Signing and Verification — cosign
To verify that an image truly came from your CI and was not tampered with along the way, you need signing. cosign is the standard tool in the Sigstore ecosystem.
# Generate key pair
cosign generate-key-pair
# Sign
cosign sign --key cosign.key harbor.example.com/team/myapp:1.4.2
# Verify
cosign verify --key cosign.pub harbor.example.com/team/myapp:1.4.2
On the Kubernetes side, using an admission controller (Kyverno, Gatekeeper, sigstore-policy-controller) to enforce “reject unsigned images” policies blocks a major supply chain attack vector.
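As a sketch of what such a policy looks like in Kyverno (the registry path is illustrative, the public key is a placeholder for the contents of cosign.pub, and the current schema should be checked against the Kyverno docs):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce   # reject, don't just audit
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "harbor.example.com/team/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----
```

With this in place, a Pod referencing an unsigned image from that registry path is rejected at admission time.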
Practical Checklist
For every new image and service, verify at minimum:
- Does the Dockerfile have a USER directive? (non-root)
- Is the base image slim/distroless/alpine?
- Does .dockerignore keep .git and local secrets out of the image?
- Is Trivy scanning running in CI with gating on CRITICAL?
- Are secrets kept out of ARG/ENV (not hardcoded)?
- Can --read-only and --cap-drop=ALL be applied at runtime?
- Does the Kubernetes Pod spec have runAsNonRoot, readOnlyRootFilesystem, and allowPrivilegeEscalation: false?
This all looks tedious, but once you bake it into a project template, there is no need for repetitive setup on each new service.
Where We Stand
We have covered concrete methods for hardening containers across the build-distribution-runtime pipeline. Next, we move to the tool that improves build speed and flexibility.
In the next part, we cover BuildKit and advanced build features. Cache mounts, multi-architecture builds (buildx), build secrets, and parallel builds — the core tools for shortening build times and streamlining CI/CD pipelines.