Container Virtualization
Review: Hardware Virtualization
-
Start with a physical machine
-
Create software (hypervisor) responsible for isolating the guest OS inside the VM
-
VM resources (memory, disk, networking, etc.) are provided by the physical machine but visibility outside of the VM is limited
-
VM and physical machine share the same instruction set, so the host and guest OS must as well
-
Guest OS can provide a different application binary interface (ABI) inside the VM
-
Lots of challenges in getting this to work because guest OS expects to have privileged hardware access
Operating System Virtualization
-
Start with a real operating system
-
Create software responsible for isolating guest software inside the container
-
(That software seems to lack a canonical name, and today it's actually a bunch of different tools.)
-
Container resources are provided by the real operating system but visibility outside the container is limited
-
Container and real OS share same kernel
-
So applications inside and outside the container must share the same ABI
-
Challenges in getting this to work are due to shared OS namespaces
Containers v. VMs
-
False. Container shares the kernel with the host.
-
True. As long as both distributions use the same kernel, differences are confined to different binary tools and file locations.
ps inside the container will show all processes.
-
False. The container's process namespace is isolated from the host.
Why Virtualize an OS?
Shares many (but not all) of the benefits of hardware virtualization with much lower overhead.
-
Cannot run multiple operating systems on the same machine.
-
Can transfer software setups to another machine as long as it has an identical or nearly identical kernel.
-
Can adjust container resources to system needs.
-
Container should not leak information between the inside and outside of the container
-
Can isolate all of the configuration and software packages a particular application needs to run
OS v. Hardware Overhead
-
Application inside the VM makes a system call
-
Trap to the host OS (or hypervisor)
-
Hand trap back to the guest OS
-
Application inside the container makes a system call
-
Trap to the OS
-
Remember all of the work we had to do to deprivilege the guest OS and deal with uncooperative machine architectures like x86?
-
OS virtualization does not require any of this: there is only one OS!
OS Virtualization is About Names
-
Process IDs
-
top inside the container shows only processes running inside the container
-
top outside the container may show processes inside the container, but with different process IDs
-
File names
-
Processes inside the container may have a limited or different view of the mounted file system
-
File names may resolve to different names—and some file names outside the container may be removed
-
User names:
-
Containers may have different users with different roles
-
root inside the container should not be root outside the container
-
Host name and IP address
-
Processes inside the container may use a different host name and IP address when performing network operations
OS Virtualization is About Control
The OS may want to ensure that the entire container—or everything that runs inside it—cannot consume more than a certain amount of:
-
CPU time
-
memory
-
disk or network bandwidth
Not a New Idea
Forms of OS virtualization go back to chroot from 1982:
chroot - run command or interactive shell with special root directory
-
Instead of starting path resolution at inode #2, start somewhere else.
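A minimal sketch of that idea, not how the kernel implements chroot: it maps a path as seen by a chrooted process to the host path it would name. The function name resolve_in_root and the /srv/jail directory are hypothetical, and the toy model ignores symlinks and assumes the working directory is the new root.

```python
import posixpath


def resolve_in_root(new_root: str, path: str) -> str:
    """Map a path seen inside a chroot to the host path it names.

    Toy model only: symlinks are ignored and relative paths are
    treated as relative to the new root.
    """
    # Normalize the path as an absolute path. For an absolute path,
    # normpath discards any ".." that would climb above "/", which is
    # the property that keeps the process inside its new root.
    inside = posixpath.normpath("/" + path.lstrip("/"))
    # Re-anchor the result under the new root on the host.
    return posixpath.normpath(posixpath.join(new_root, inside.lstrip("/")))


if __name__ == "__main__":
    print(resolve_in_root("/srv/jail", "/etc/passwd"))
    print(resolve_in_root("/srv/jail", "/../../etc/shadow"))
```

The second call shows why ".." cannot escape: the walk upward stops at the root the process was given.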
Modern container management systems like Docker combine and build upon multiple lower-level tools and services.
Linux namespaces
Since 2002 Linux has provided namespace separation for a variety of resources that typically had unified namespaces
-
Mount points: allows different namespaces to see different views of the file system
-
Process IDs: new processes are allocated IDs in their current namespace and all parent namespaces
-
Network: namespaces can have private IP addresses and their own routing tables, and can communicate with other namespaces through virtual interfaces
-
Devices: devices can be present or hidden in different namespaces
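One way to see namespaces in action on Linux is to look at the symbolic links under /proc/<pid>/ns, which identify the namespaces a process belongs to. The sketch below assumes a Linux system with procfs mounted; the helper name namespace_ids is made up for illustration. Two processes in the same namespace report the same identifier, while a process inside a container typically reports different ones.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace name: identifier} for a process.

    Reads the symlinks in /proc/<pid>/ns. Returns an empty dict when
    that directory does not exist (non-Linux system or unknown pid).
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    if not os.path.isdir(ns_dir):
        return ids
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace.
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry vanished or is unreadable; skip it.
            continue
    return ids


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(name, ident)
```

Running this once on the host and once inside a container, then comparing the output, shows which namespaces the container does not share with the host.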
cgroups
…a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.
-
Processes and their children remain in the same cgroup
-
cgroups make it possible to control the resources allocated to a set of processes
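As a rough illustration of what "control the resources" looks like in practice, the sketch below builds the contents of two cgroup v2 control files, cpu.max and memory.max. It assumes the cgroup v2 interface; the function names and the 100 ms default period are choices made for this example, and actually applying limits requires a cgroup directory under /sys/fs/cgroup and sufficient privileges.

```python
import os


def cgroup_v2_limits(cpu_fraction, memory_bytes, period_us=100_000):
    """Build cgroup v2 control-file contents for a CPU and memory cap.

    cpu.max holds "<quota> <period>" in microseconds; memory.max holds
    a byte count.
    """
    quota_us = int(cpu_fraction * period_us)
    return {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(memory_bytes),
    }


def apply_limits(cgroup_dir, limits):
    """Write each setting into its control file inside cgroup_dir.

    On a real system cgroup_dir would be a cgroup directory (an
    assumption here: e.g. one created under /sys/fs/cgroup), and a
    process joins the group when its PID is written to cgroup.procs.
    """
    for filename, value in limits.items():
        with open(os.path.join(cgroup_dir, filename), "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Half of one CPU and 256 MiB of memory.
    print(cgroup_v2_limits(0.5, 256 * 1024 * 1024))
```

Because children stay in their parent's cgroup, a limit set this way covers everything the container later starts.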
UnionFS
A stackable unification file system.
-
Does /foo/bar exist in the top layer? If yes, return its contents.
-
Does /foo/bar exist in the next layer? If yes, return its contents.
-
Etc.
-
Does /foo/bar exist in the top layer? If yes, return its contents.
-
Access to /foo in the next layer is prohibited, so stop. (Even if /foo/bar exists.)
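The lookup walk described above can be modeled in a few lines. This is a toy model, not UnionFS itself: each layer is a plain dictionary from absolute path to contents, and the PROHIBITED marker and union_lookup name are invented for the example.

```python
# Marker meaning "this layer forbids access to this path and below".
PROHIBITED = object()


def union_lookup(layers, path):
    """Look up an absolute, normalized path in a stack of layers.

    layers is a list of dicts, top layer first. Returns the contents
    from the first layer that has the path, or None if the search ends
    without finding it.
    """
    parts = path.strip("/").split("/")
    # "/foo/bar" -> ["/foo", "/foo/bar"]: the path and its ancestors.
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    for layer in layers:
        # A prohibited ancestor ends the search, even if this layer or
        # a lower one contains the path.
        if any(layer.get(p) is PROHIBITED for p in prefixes):
            return None
        if path in layer:
            return layer[path]
    return None


if __name__ == "__main__":
    top = {"/etc/motd": "from top"}
    blocked = {"/foo": PROHIBITED, "/foo/bar": "hidden"}
    bottom = {"/foo/bar": "from bottom"}
    print(union_lookup([top, bottom], "/foo/bar"))
    print(union_lookup([top, blocked, bottom], "/foo/bar"))
```

The two calls correspond to the two cases in the list: the first falls through to a lower layer, the second stops at the prohibited /foo.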
COW File System
Previous container libraries made a copy of the parent’s entire file system. (Containers need a lot of it.)
-
Copy on write!
-
Only make modifications to the underlying file system when the container modifies files.
-
Speeds start up and reduces storage usage.
-
The container mainly needs read-only access to host files.
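A small model of the copy-on-write idea, under the assumption that a file system can be represented as a dictionary from path to bytes. The class name CowFileSystem is hypothetical; real implementations work at the block or file-layer level, not on Python dictionaries.

```python
class CowFileSystem:
    """Per-container view over a shared, read-only base file system."""

    def __init__(self, base):
        self._base = base   # shared by every container, never modified
        self._upper = {}    # private copies created on first write

    def read(self, path):
        # Prefer the container's private copy, then fall back to the base.
        if path in self._upper:
            return self._upper[path]
        if path in self._base:
            return self._base[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # The write lands in the private layer; the base is untouched.
        self._upper[path] = data

    def private_paths(self):
        """Paths this container has stored privately."""
        return sorted(self._upper)


if __name__ == "__main__":
    base = {"/bin/sh": b"shell", "/etc/hosts": b"127.0.0.1 localhost\n"}
    a, b = CowFileSystem(base), CowFileSystem(base)
    a.write("/etc/hosts", b"10.0.0.2 db\n")
    print(a.read("/etc/hosts"), b.read("/etc/hosts"), a.private_paths())
```

Starting a container costs nothing up front in this model: storage grows only with the files a container actually changes.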
What is Docker?
-
Provides a unified set of tools for container management on a variety of systems
-
Layered file system images for easy updates
-
Now involved in development of containerization libraries on Linux
Example Dockerfile
FROM komljen/ubuntu
MAINTAINER Alen Komljen <[email protected]>
ENV MONGO_VERSION 2.6.6
RUN \
  apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10 && \
  echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" \
    > /etc/apt/sources.list.d/mongodb.list && \
  apt-get update && \
  apt-get -y install \
    mongodb-org=${MONGO_VERSION} && \
  rm -rf /var/lib/apt/lists/*
VOLUME ["/data/db"]
RUN rm /usr/sbin/policy-rc.d
CMD ["/usr/bin/mongod"]
EXPOSE 27017