Core
To a Senior Backend Engineer, "Docker" is often a misnomer. We usually say "Docker" when we actually mean a specific stack of standardized components (OCI) interacting with Linux kernel primitives.
At its core, Docker is not a virtualization technology; it is a process isolation technology. It is a user-space tool that manipulates kernel features to trick a process into thinking it has its own dedicated machine.
Here is the deep dive into Docker Core Theory, broken down by Architecture, Kernel Primitives, and Storage.
1. The Architecture: It's Not Monolithic
Years ago, Docker was a single binary (docker) that did everything. Today, it is a modular stack. When you run docker run, a relay race occurs between distinct components.
The Stack
Docker Client (CLI): The user interface. It converts your commands into REST API calls sent to the daemon.
Dockerd (The Daemon): The API server. It manages higher-level objects like images, networks, and volumes. It does not run the containers itself; it delegates that to containerd.
Containerd: The industry-standard container supervisor (a CNCF graduated project). It manages the container lifecycle (pulling and storing images, snapshots, starting and supervising containers). When it needs to run a container, it calls runc.
Runc: The low-level OCI runtime. This is the binary that actually talks to the kernel. It sets up the namespaces and cgroups, spawns the process, and then exits.
Containerd-Shim: A small process that sits between containerd and the container. Why it exists: it allows runc to exit (keeping the runtime lightweight) and allows containerd (and dockerd) to be restarted or upgraded without killing running containers. It also handles the container's stdout/stderr and reports its exit status.
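The shim's ability to let runc exit while still collecting the container's exit status rests on a standard Linux mechanism: a process that marks itself as a child subreaper (prctl with PR_SET_CHILD_SUBREAPER) inherits orphaned descendants instead of PID 1. The Python sketch below mimics that handoff with plain fork(). It assumes Linux (prctl is reached through ctypes), and fake_runc and shim are hypothetical stand-ins that do none of runc's real namespace or cgroup setup.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>


def become_subreaper() -> None:
    """Ask the kernel to re-parent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def read_ppid(pid: int) -> int:
    """Parent PID from /proc/<pid>/stat (field 4)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is in parentheses and may contain spaces, so split
    # after the last ')': the remaining fields start with state, then ppid.
    return int(data.rsplit(")", 1)[1].split()[1])


def fake_runc(write_fd: int, exit_code: int) -> None:
    """Hypothetical runc stand-in: start the workload, report its PID, exit."""
    container_pid = os.fork()
    if container_pid == 0:
        os.close(write_fd)
        time.sleep(0.5)      # the "container" outlives the runtime
        os._exit(exit_code)
    os.write(write_fd, str(container_pid).encode())
    os.close(write_fd)
    os._exit(0)              # runc exits; its child becomes an orphan


def shim(exit_code: int = 7) -> tuple[int, int, int]:
    """Hypothetical shim: returns (container_pid, its ppid after runc exits, exit code)."""
    become_subreaper()
    read_fd, write_fd = os.pipe()
    runc_pid = os.fork()
    if runc_pid == 0:
        os.close(read_fd)
        fake_runc(write_fd, exit_code)
    os.close(write_fd)
    os.waitpid(runc_pid, 0)                    # runc is gone at this point
    container_pid = int(os.read(read_fd, 32))
    os.close(read_fd)
    ppid = read_ppid(container_pid)            # re-parented to us, not PID 1
    _, status = os.waitpid(container_pid, 0)   # so we can reap it ourselves
    return container_pid, ppid, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    pid, ppid, code = shim()
    print(f"container pid={pid} parent={ppid} shim={os.getpid()} exit={code}")
```

Without the subreaper flag, the orphaned workload would be adopted by PID 1 and the shim's second waitpid would fail, which is exactly the bookkeeping problem the real shim exists to solve.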
The "Life of a Request"
You type docker run nginx.
The client POSTs to dockerd.
Dockerd calls Containerd ("Get me an nginx container").
Containerd pulls the image and converts it into an OCI Bundle (filesystem + config).
Containerd starts a shim, which calls runc.
Runc interacts with the kernel to create the isolated environment.
Runc starts the nginx process and exits.
The shim takes over as the parent of the nginx process.
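The first hop of this relay is ordinary HTTP over a Unix socket. The sketch below sends the two Docker Engine API calls behind a detached docker run nginx (POST /containers/create, then POST /containers/{id}/start), assuming the image is already present locally and the daemon listens on the default /var/run/docker.sock. The real CLI does more (pulling the image when create returns 404, attaching to streams, negotiating the API version), and UnixHTTPConnection, create_call, and docker_run are illustrative helpers, not part of any Docker SDK.

```python
from __future__ import annotations

import http.client
import json
import socket
from urllib.parse import urlencode

DOCKER_SOCK = "/var/run/docker.sock"  # assumed default daemon socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """Hypothetical helper: an http.client connection over a Unix domain socket."""

    def __init__(self, path: str):
        super().__init__("localhost")  # only used for the Host header
        self.unix_path = path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.unix_path)
        self.sock = sock


def create_call(image: str, name: str | None = None) -> tuple[str, bytes]:
    """Path and JSON body for POST /containers/create."""
    path = "/containers/create"
    if name:
        path += "?" + urlencode({"name": name})
    return path, json.dumps({"Image": image}).encode()


def docker_run(image: str, sock_path: str = DOCKER_SOCK) -> str:
    """Create and start a container; returns its ID (no pull, no attach)."""
    conn = UnixHTTPConnection(sock_path)
    try:
        path, body = create_call(image)
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        payload = json.loads(resp.read() or b"{}")
        if resp.status != 201:
            raise RuntimeError(f"create failed: {resp.status} {payload}")
        container_id = payload["Id"]

        conn.request("POST", f"/containers/{container_id}/start")
        resp = conn.getresponse()
        resp.read()  # 204 No Content on success
        if resp.status not in (204, 304):  # 304 means it was already running
            raise RuntimeError(f"start failed: {resp.status}")
        return container_id
    finally:
        conn.close()
```

With a running daemon, docker_run("nginx") returns the new container ID; everything after that (containerd, the shim, runc) happens behind the daemon and is invisible to this client.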
2. The Kernel Primitives: The "Magic"
Docker relies primarily on two Linux kernel features to create the illusion of a container (others, such as capabilities and seccomp filters, harden it further).
A. Namespaces (Isolation)
Namespaces limit what a process can see. They partition kernel resources such that one set of processes sees one set of resources, while another set sees a different set.
PID Namespace: The container has its own process ID numbering, starting at 1. Inside, nginx might be PID 1; on the host, it might be PID 14302.
MNT Namespace (Mount): The container has its own root filesystem (/). It cannot see the host's /home or /var unless they are explicitly mounted.
NET Namespace: The container gets its own network stack (interfaces, IP addresses, routing table, its own localhost, and port space).
UTS Namespace: Allows the container to have its own hostname.
IPC Namespace: Isolates System V IPC objects and POSIX message queues, preventing shared memory access between the container and the host or other containers.
USER Namespace: (Optional but powerful) Maps root inside the container to a non-privileged user on the host.
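Each namespace a process belongs to is exposed as a symlink under /proc/<pid>/ns/, and two processes share a namespace exactly when those links resolve to the same identifier. The sketch below reads those links and parses the NSpid line of /proc/<pid>/status (Linux 4.1+), which shows the same process's PID in every nested PID namespace; this is how "PID 1 inside, PID 14302 on the host" looks from the kernel's side. The sample status text in the usage is illustrative, reusing the PIDs above.

```python
from __future__ import annotations

import os

NS_KINDS = ("ipc", "mnt", "net", "pid", "user", "uts")  # the six listed above


def namespace_ids(pid: int | str = "self") -> dict[str, str]:
    """Map namespace kind -> identifier such as 'pid:[4026531836]'."""
    ids = {}
    for kind in NS_KINDS:
        try:
            ids[kind] = os.readlink(f"/proc/{pid}/ns/{kind}")
        except FileNotFoundError:
            pass  # kernel built without this namespace type
    return ids


def same_namespace(kind: str, pid_a: int | str, pid_b: int | str) -> bool:
    """Two processes share a namespace iff their ns links resolve identically."""
    return (os.readlink(f"/proc/{pid_a}/ns/{kind}")
            == os.readlink(f"/proc/{pid_b}/ns/{kind}"))


def nspid_chain(status_text: str) -> list[int]:
    """PIDs from the NSpid line: leftmost as seen by this /proc, rightmost innermost."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(p) for p in line.split()[1:]]
    raise ValueError("no NSpid line (needs Linux 4.1+)")


if __name__ == "__main__":
    for kind, ident in namespace_ids().items():
        print(f"{kind:5} {ident}")
    with open("/proc/self/status") as f:
        print("NSpid:", nspid_chain(f.read()))
```

Reading another process's links (for example a container's nginx) generally requires root on the host; comparing its pid link with your own shows immediately whether it lives in a separate PID namespace.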
B. Cgroups (Control Groups)
Cgroups limit how much a process can use. While Namespaces hide the neighbors, Cgroups build the walls.
Resource Limiting: Restrict CPU usage (shares/weights and quotas) and memory usage (a container that exceeds its memory limit triggers the OOM killer inside its cgroup).
Prioritization: Give critical containers more CPU time during contention.
Accounting: Measure exactly how much resource a container consumes.
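On a cgroup v2 host these limits are plain text files in the container's cgroup directory: for example, docker run --cpus=1.5 ends up in cpu.max as a quota/period pair ("150000 100000") and --memory=512m ends up in memory.max as a byte count. The helpers below are a sketch assuming cgroup v2 file formats; the sample values in the usage are illustrative, and contended_shares shows only the idealized proportional split that cpu.weight aims for when every group is busy.

```python
from __future__ import annotations

from fractions import Fraction


def cpu_limit(cpu_max: str) -> float | None:
    """Parse cpu.max ("$QUOTA $PERIOD" in microseconds); CPUs, or None if unlimited."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def memory_limit(memory_max: str) -> int | None:
    """Parse memory.max; bytes, or None if unlimited."""
    value = memory_max.strip()
    return None if value == "max" else int(value)


def contended_shares(weights: dict[str, int]) -> dict[str, Fraction]:
    """Idealized CPU fraction per busy group, proportional to its cpu.weight."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


def own_cgroup_path(proc_cgroup: str) -> str:
    """Extract the v2 path from the '0::/path' line of /proc/self/cgroup."""
    for line in proc_cgroup.splitlines():
        if line.startswith("0::"):
            return line[3:]
    raise ValueError("no cgroup v2 entry")
```

For example, cpu_limit("150000 100000") gives 1.5, and two busy containers with weights 200 and 100 split contended CPU roughly 2/3 to 1/3, while an idle neighbour's share is redistributed rather than wasted.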
3. The Filesystem: Layers & UnionFS
This is often the most misunderstood part of Docker theory. Docker images are not single files; they are a stack of read-only layers (tarballs).
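Because each layer is just a tar archive of changes, flattening an image means applying the layers in order, bottom first, with later files overwriting earlier ones. Deletions are recorded as whiteout entries: in the OCI image format, a file named .wh.<name> in a layer hides <name> from the layers below it. The in-memory sketch below builds hypothetical layers with Python's tarfile (uncompressed here; real layers are usually gzip-compressed) and applies them to a dict. It ignores opaque-directory whiteouts (.wh..wh..opq), permissions, and links.

```python
from __future__ import annotations

import io
import posixpath
import tarfile

WHITEOUT_PREFIX = ".wh."  # OCI whiteout marker


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar layer from {path: content} (regular files only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def _whiteout_target(name: str) -> str | None:
    """'etc/.wh.foo' -> 'etc/foo'; None if name is not a whiteout."""
    parent, base = posixpath.split(name)
    if base.startswith(WHITEOUT_PREFIX):
        return posixpath.join(parent, base[len(WHITEOUT_PREFIX):])
    return None


def apply_layers(layers: list[bytes]) -> dict[str, bytes]:
    """Flatten tar layers (bottom first) into {path: content}, honoring whiteouts."""
    rootfs: dict[str, bytes] = {}
    for blob in layers:
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
            members = [m for m in tar.getmembers() if m.isfile()]
            # Whiteouts only hide entries from lower layers, so apply them first.
            for m in members:
                target = _whiteout_target(m.name)
                if target is not None:
                    for path in [p for p in rootfs
                                 if p == target or p.startswith(target + "/")]:
                        del rootfs[path]
            # Then add or overwrite this layer's regular files.
            for m in members:
                if _whiteout_target(m.name) is None:
                    rootfs[m.name] = tar.extractfile(m).read()
    return rootfs
```

Note that deleting a file in a later layer does not shrink the image: the bytes stay in the lower tarball, and the whiteout merely hides them.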
Union File System (OverlayFS)
Docker uses a union filesystem (on modern Linux, the overlay2 storage driver, built on the kernel's OverlayFS) to combine these layers into a single view.
LowerDir: The read-only image layers (e.g., Ubuntu Base -> Apt install Python -> Add App Code).
UpperDir: A thin, read-write layer created when the container starts.
MergedDir: The unified view the application sees.
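These directories map directly onto an overlay mount. The sketch below only builds the equivalent mount(8) command line for hypothetical paths; running it requires root. Two details the list above leaves out: the kernel also needs a workdir (an empty scratch directory on the same filesystem as upperdir, which docker inspect reports as WorkDir), and in lowerdir the leftmost entry is the topmost layer, so a bottom-to-top layer stack has to be reversed.

```python
import shlex


def overlay_mount_argv(lower_layers: list[str], upper: str, work: str,
                       merged: str) -> list[str]:
    """Build a mount(8) argv for an overlay.

    lower_layers is ordered bottom (base image) to top, like Dockerfile steps.
    """
    if not lower_layers:
        raise ValueError("need at least one lower layer")
    # overlayfs stacks lowerdir entries right to left, so the top layer goes first.
    lowerdir = ":".join(reversed(lower_layers))
    options = f"lowerdir={lowerdir},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]


if __name__ == "__main__":
    # Hypothetical paths; Docker's real ones live under its overlay2 data directory.
    argv = overlay_mount_argv(
        ["/layers/ubuntu", "/layers/python", "/layers/app"],
        upper="/ctr/upper", work="/ctr/work", merged="/ctr/merged",
    )
    print(shlex.join(argv))
```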
Copy-on-Write (CoW)
This mechanism is why Docker containers start quickly and take up very little extra space: starting a container does not copy the image, it only creates an empty UpperDir.
Read: If your app reads config.ini located in a lower (image) layer, it reads it directly from there. Efficiency is high.
Write: If your app tries to modify config.ini, the kernel intercepts the write and first copies the whole file from the read-only LowerDir to the read-write UpperDir (a "copy-up").
Result: The container now sees its own copy. The original image remains untouched for other containers to use.
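The read and copy-up paths can be modeled in a few lines. OverlayView below is a hypothetical, in-memory toy with one dict per layer: reads resolve top-down through the upper layer and then the lower layers, and the first modification of a file that exists only below triggers a copy-up of the whole file. Real OverlayFS does this in the kernel and also handles metadata and deletions, which the sketch leaves out.

```python
class OverlayView:
    """Toy merged view: shared read-only lower layers plus a private upper layer."""

    def __init__(self, lower_layers: list[dict[str, bytes]]):
        # Ordered bottom (base image) to top; never mutated by this class.
        self.lower = lower_layers
        self.upper: dict[str, bytes] = {}
        self.copy_ups: list[str] = []  # files copied up, kept for illustration

    def read(self, path: str) -> bytes:
        if path in self.upper:
            return self.upper[path]
        for layer in reversed(self.lower):  # topmost lower layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path: str, data: bytes) -> None:
        if path not in self.upper:
            try:
                # Copy-up: duplicate the whole file before its first modification.
                self.upper[path] = self.read(path)
                self.copy_ups.append(path)
            except FileNotFoundError:
                self.upper[path] = b""  # brand-new file: created directly in upper
        self.upper[path] += data
```

Two containers built from the same image list share every lower dict; after one of them appends to config.ini, only its own upper layer holds the modified copy, and the other container and the image still read the original bytes.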
4. Networking: The Bridge
How does a process inside a confined NET Namespace talk to the internet?
Veth Pairs (Virtual Ethernet): Docker creates a virtual cable. One end (eth0) sits inside the container's network namespace. The other end (vethXXXX) sits on the host.
The Bridge (docker0): The host end of the cable is plugged into a virtual switch (a Linux bridge) called docker0.
NAT (Network Address Translation): To reach the internet, iptables rules on the host masquerade the traffic, making it look like it came from the host's own IP address.
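On a default install the rule looks like -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE in the nat table. Masquerading is stateful: connection tracking remembers each rewrite so replies can be translated back to the right container. The sketch below models that in user space with hypothetical addresses (172.17.0.0/16 as the bridge subnet, 203.0.113.10 as the host's public IP). It is not the kernel's algorithm: the kernel tries to keep the container's original source port and only remaps it on a clash, and it also tracks protocols and timeouts.

```python
import ipaddress
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


class Masquerade:
    """Toy SNAT with a connection-tracking table (illustrative only)."""

    def __init__(self, bridge_subnet: str, host_ip: str, first_port: int = 32768):
        self.subnet = ipaddress.ip_network(bridge_subnet)
        self.host_ip = host_ip
        self._ports = itertools.count(first_port)  # naive port allocator
        self.outbound: dict[tuple[str, int, str, int], int] = {}
        self.inbound: dict[tuple[str, int, int], tuple[str, int]] = {}

    def egress(self, pkt: Packet) -> Packet:
        """Rewrite a container-to-internet packet so it appears to come from the host."""
        if ipaddress.ip_address(pkt.src) not in self.subnet:
            return pkt  # not from the bridge network: leave untouched
        key = (pkt.src, pkt.sport, pkt.dst, pkt.dport)
        if key not in self.outbound:
            port = next(self._ports)
            self.outbound[key] = port
            self.inbound[(pkt.dst, pkt.dport, port)] = (pkt.src, pkt.sport)
        return Packet(self.host_ip, self.outbound[key], pkt.dst, pkt.dport)

    def ingress(self, pkt: Packet) -> Packet:
        """Translate a reply back to the container that opened the connection."""
        original = self.inbound.get((pkt.src, pkt.sport, pkt.dport))
        if original is None or pkt.dst != self.host_ip:
            return pkt  # no tracked connection: deliver to the host as-is
        return Packet(pkt.src, pkt.sport, original[0], original[1])
```

The design point the toy makes is that the reverse path needs no rule of its own: the reply is matched against the tracked connection, which is why unsolicited inbound traffic cannot reach a container unless a port is explicitly published.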
Summary for Engineers
Containers are just processes. If you run ps aux on the host, you can find the container's process.
Images are just files. They are tarball layers unpacked and managed by a storage (graph) driver.
"Docker" is a manager. The real work is done by runc (setup) and the Linux kernel (execution).