This is a short introduction to what a container (e.g. a Docker container) uses under the surface to achieve process isolation:

  • changing the root file system using chroot
  • using namespaces to create a separate view of the system resources for the process - process IDs, mount points, networks, users, etc.
  • using control groups to restrict resources - CPU, memory, network traffic.

The root file system

To start with, a container has its own file system, so we need to download a root file system or create one.

A root file system to try out

I chose to download the root file system used for CentOS 7 Docker containers, which can be found here: https://github.com/CentOS/sig-cloud-instance-images/raw/CentOS-7.8.2003-x86_64/docker/centos-7.8.2003-x86_64-docker.tar.xz

  1. Download
wget https://github.com/CentOS/sig-cloud-instance-images/raw/CentOS-7.8.2003-x86_64/docker/centos-7.8.2003-x86_64-docker.tar.xz
  2. Extract the root file system to a directory named rootfs
mkdir rootfs && tar -xvf centos-7.8.2003-x86_64-docker.tar.xz -C rootfs
  3. Mount the required mount points into our root file system and copy resolv.conf so that DNS works inside our container
sudo mount --bind /dev rootfs/dev
sudo mount --bind /proc rootfs/proc
cp /etc/resolv.conf rootfs/etc/

For more info on the file systems we are mounting above:

“The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime (sysctl).” (https://www.kernel.org/doc/html/latest/filesystems/proc.html)

“/dev is the location of special or device files” (https://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/dev.html)

  4. Use chroot to make it our new “working” file system
sudo chroot rootfs/ /bin/bash
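
The bind mounts from step 3 stay in place after you exit the chroot shell, so deleting rootfs while they are still mounted would also reach into the host's /dev. A minimal sketch for listing what is mounted under rootfs follows. The helper name mounts_under is hypothetical, and reading /proc/mounts is an assumption that should hold on typical Linux hosts.

```shell
# Hypothetical helper: print every mount point located below a directory.
# $1 = absolute directory path, $2 = mounts table (defaults to /proc/mounts).
mounts_under() {
    awk -v dir="${1%/}/" 'index($2, dir) == 1 { print $2 }' "${2:-/proc/mounts}"
}

# Usage on the host, from the directory that contains rootfs:
#   mounts_under "$(pwd)/rootfs"
# When you are completely finished with the container, unmount what it lists:
#   sudo umount rootfs/proc rootfs/dev
```

If the listing is empty, nothing is mounted under rootfs and the directory can be removed safely.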

Try it out by running a yum search:

yum search telnet

If we run ps:

ps -ef 

we note that host processes are visible inside the container. This is undesirable, since one of the points of running a container is to isolate the process. Enter namespaces.

Namespaces

Namespaces are used to further isolate our process. There are several types of namespaces. Some examples:

User namespace - has its own set of user and group IDs.
Network namespace - has its own network stack (i.e. set of IP addresses, socket listing and more).
Mount namespace - has an independent list of mount points seen by its processes.
Process ID (PID) namespace - has its own set of PIDs that processes use.
IPC namespace - has its own IPC resources.
UNIX Time-Sharing (UTS) namespace - isolates the hostname.
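
The kernel exposes the namespaces a process belongs to as symbolic links under /proc/<pid>/ns, which gives a quick way to check whether two processes share a namespace. The sketch below assumes a Linux /proc is mounted. The helper name ns_id is hypothetical.

```shell
# Hypothetical helper: print the namespace identifier of a process.
# $1 = namespace type (pid, mnt, net, uts, ipc, user)
# $2 = PID to inspect (defaults to "self", the calling process)
ns_id() {
    readlink "/proc/${2:-self}/ns/$1"
}

# Prints something like pid:[4026531836]; the number differs per system.
ns_id pid
```

Running ns_id pid on the host and inside the plain chroot shell from the previous section should print the same value, since chroot on its own does not create a namespace. Inspecting another user's process this way may require root.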

An example of using a PID namespace:

The command we use to create a namespace is unshare. Try it out by running this command to start our container:

sudo unshare -f -p --mount-proc=rootfs/proc chroot rootfs

If we run ps again, we should not see host processes:

[root@DESKTOP-J631UAD /]# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 23:16 ?        00:00:00 /bin/bash -i
root          10       1  0 23:36 ?        00:00:00 ps -ef

CGroups

CGroups are used to restrict resources like memory and CPU for specific processes. Here is an example of restricting the memory of our container's shell.

To create a cgroup, we create a directory inside /sys/fs/cgroup/ (the paths used here assume cgroup v2):

sudo mkdir /sys/fs/cgroup/container_cgroup

The kernel should then populate it automatically. Check it out:

ls /sys/fs/cgroup/container_cgroup
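
If the listing does not contain memory.max, the memory controller is probably not enabled for child groups, or the host is not using the unified cgroup v2 hierarchy. A minimal sketch for checking both follows. The helper name controller_listed and the cgroot variable are hypothetical, and /sys/fs/cgroup as the mount point is an assumption.

```shell
# Hypothetical helper: succeed if a controller name is listed in a
# space-separated controller file such as cgroup.subtree_control.
# $1 = controller name, $2 = file to inspect.
controller_listed() {
    grep -qw -- "$1" "$2"
}

cgroot=/sys/fs/cgroup   # assumed mount point of the cgroup hierarchy

if [ -r "$cgroot/cgroup.subtree_control" ]; then
    # Prints "cgroup2fs" when the directory is a cgroup v2 mount.
    stat -fc %T "$cgroot"

    if controller_listed memory "$cgroot/cgroup.subtree_control"; then
        echo "memory controller is enabled for child cgroups"
    else
        echo "memory controller is not enabled; memory.max will be missing"
    fi
else
    echo "no cgroup v2 hierarchy found at $cgroot"
fi
```

If the memory controller turns out to be disabled, it can be enabled for child groups with echo "+memory" | sudo tee /sys/fs/cgroup/cgroup.subtree_control.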

Now we need to add our container process to our cgroup. Do the following on the host, outside the chroot, while the container is running:

  1. Find the PID of the unshare process
ps -ef | grep unshare

Take note of the unshare PID (the one without sudo in front).

  2. Find the PID of our shell by looking for child processes of the unshare PID
ps -ef | grep $PID_OF_UNSHARE

You should see one bash process running as a child of the unshare PID. Take note of its PID.

  3. Write the PID of our shell into the cgroup.procs file
echo $PID_OF_SHELL | sudo tee /sys/fs/cgroup/container_cgroup/cgroup.procs

We pipe into sudo tee because a plain > redirection would be performed by our unprivileged shell, not by sudo.

DONE.
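
Steps 1 and 2 above can also be done with pgrep, which avoids picking PIDs out of the ps output by hand. This is a sketch, assuming pgrep (from procps) is installed and that only one unshare process is running on the host. The helper name child_pids is hypothetical.

```shell
# Hypothetical helper: print the PIDs of the direct children of a process.
# $1 = parent PID.
child_pids() {
    pgrep -P "$1"
}

# Usage on the host while the container shell is running:
#   PID_OF_UNSHARE=$(pgrep -x unshare)
#   PID_OF_SHELL=$(child_pids "$PID_OF_UNSHARE")
#   echo "$PID_OF_SHELL" | sudo tee /sys/fs/cgroup/container_cgroup/cgroup.procs
```

The -x flag makes pgrep match the process name exactly, so the sudo wrapper process is not picked up.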

Add a memory limit of 10MB to our cgroup:

echo "10000000" | sudo tee /sys/fs/cgroup/container_cgroup/memory.max

Now test it by running a process inside the container that uses more memory than the limit specified above.
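
One way to try this, as a sketch: inside the container, pipe a stream of zero bytes into tail, which has to buffer the whole stream because it never sees a newline. This assumes the GNU head and tail shipped in the CentOS image and that swap does not absorb the excess. The kernel's OOM killer should then terminate tail, and the oom_kill counter in the cgroup's memory.events file should increase. The helper name oom_kills is hypothetical.

```shell
# Inside the container: try to buffer about 50 MB, well above the 10 MB limit.
#   head -c 50M /dev/zero | tail

# On the host: hypothetical helper that prints the oom_kill counter
# from a cgroup v2 memory.events file ($1 = path to the file).
oom_kills() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Usage on the host, before and after the test:
#   oom_kills /sys/fs/cgroup/container_cgroup/memory.events
```

The current usage of the group can be read the same way from memory.current in the same directory.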