Skip to content

Latest commit

 

History

History
120 lines (64 loc) · 7.27 KB

33. Container Building Blocks.md

File metadata and controls

120 lines (64 loc) · 7.27 KB

Building Blocks of Containers

There are three building blocks of containers: namespaces, cgroups and union filesystem.

Namespaces provide isolation of system resources, and cgroups allow for fine‑grained control and enforcement of limits for those resources. They are features of the Linux kernel (as below).

image

Union filesystem allows files and directories of separate file systems to be transparently overlaid.

Namespaces

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources.

In other words, to isolate one process from another.

The container is nothing but a process for the kernel, so we isolate each container using different namespaces.

If the developer starts two containers, there are two processes running on a single server somewhere – but they are isolated from each other.

Types of Namespaces

System resources that can be virtualized are: users [User IDs], process ID [PID], network [net], mount [mnt], Interprocess Communication [IPC], hostnames [UTS].

Accordingly, namespaces are categorized as below:

  • A user namespace has its own set of user IDs and group IDs for assignment to processes. In particular, this means that a process can have root privilege within its user namespace without having it in other user namespaces.

  • A process ID (PID) namespace assigns a set of PIDs to processes that are independent from the set of PIDs in other namespaces. The first process created in a new namespace has PID 1 and child processes are assigned subsequent PIDs. If a child process is created with its own PID namespace, it has PID 1 in that namespace as well as its PID in the parent process’ namespace.

  • A network namespace has an independent network stack: its own private routing table, set of IP addresses, socket listing, connection tracking table, firewall, and other network‑related resources.

  • A mount namespace has an independent list of mount points seen by the processes in the namespace. This means that you can mount and unmount filesystems in a mount namespace without affecting the host filesystem.

  • An interprocess communication (IPC) namespace has its own IPC resources, for example POSIX message queues.

  • A UNIX Time‑Sharing (UTS) namespace allows a single system to appear to have different host and domain names to different processes.

Example of process ID (PID) namespace

In the diagram below, there are three PID namespaces – a parent namespace and two child namespaces. Within the parent namespace, there are four processes, named PID1 through PID4. These are normal processes which can all see each other and share resources.

The processes PID2 and PID3 in the parent namespace also belong to their own PID namespaces in which their PID is 1. From within a child namespace, the PID1 process cannot see anything outside. For example, PID1 in both child namespaces cannot see PID4 in the parent namespace.

This provides isolation between (in this case) processes within different namespaces.

image

Namespaces and Containers

Namespaces are one of the technologies that containers are built on, used to enforce segregation of resources.

Container runtimes like Docker make things easier by creating namespaces on your behalf.

Control Groups (cgroups)

A control group (cgroup) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, and so on) of a collection of processes.

Cgroups provide the following features:

  • Resource limits – You can configure a cgroup to limit how much of a particular resource (memory or CPU, for example) a process can use.
  • Prioritization – You can control how much of a resource (CPU, disk, or network) a process can use compared to processes in another cgroup when there is resource contention.
  • Accounting – Resource limits are monitored and reported at the cgroup level.
  • Control – You can change the status (frozen, stopped, or restarted) of all processes in a cgroup with a single command.

Cgroups are a key component of containers because there are often multiple processes running in a container that you need to control together.

image

Union Filesystem

Union mount creates an illusion of merging contents of several directories into one without modifying its original (physical) sources.

This can be useful as we might have related sets of files stored in different locations or media, and yet we want to show them in single, merged view.

Union mount or union filesystem is; however, not the filesystem type, but rather a concept with many implementations.

  • UnionFS, the original union filesystem
  • OverlayFS, the filesystem used by default overlay2 Docker storage driver
    % docker system info | grep Storage
    Storage Driver: overlay2
    

Union Filesystem and Docker

A Docker image is an immutable file that contains the source code, libraries, dependencies, tools, and other files needed for an application to run.

Due to their read-only quality, these images are sometimes referred to as snapshots. They represent an application and its virtual environment at a specific point in time.

Since images are, in a way, just templates, you cannot start or run them. What you can do is use that template as a base to build a container.

A container is, ultimately, just a running image. Once you create a container, it adds a writable layer on top of the immutable image, meaning you can now modify it.

You can create an unlimited number of Docker images from one image base. Each time you change the initial state of an image and save the existing state, you create a new template with an additional layer on top of it.

image

Many images that we use to spin up our containers are quite bulky whether it's ubuntu with size of 72MB or nginx with size of 133MB.

It would be quite expensive to allocate that much space every time we'd like to create a container from these images.

Thanks to union filesystem, Docker only needs to create thin layer on top of the image and rest of it can be shared between all the containers. This also provides the added benefit of reduced start time, as there's no need to copy the image files and data.

Union filesystem also provides isolation, because containers have read-only access to the shared image layers. If they ever need to modify any of the read-only shared files, they use copy-on-write strategy to copy the content up to their top writable layer where it can be safely modified.

*References

https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/#pid-namespaces

https://en.wikipedia.org/wiki/Kernel_(operating_system)

https://www.linuxfoundation.org/blog/building-blocks-of-containers/

https://martinheinz.dev/blog/44

https://en.wikipedia.org/wiki/UnionFS

https://phoenixnap.com/kb/docker-image-vs-container