Today
-
OS virtualization and containers
ASST3.3 Checkpoint
-
If you have not started, you’re way, way behind.
-
Get started!
ASST3.3 is due Friday 5/5. Good luck finishing!
Congrats on Finishing ASST3
Review: Hardware Virtualization
-
Start with a physical machine
-
Create software (hypervisor) responsible for isolating the guest OS inside the VM
-
VM resources (memory, disk, networking, etc.) are provided by the physical machine but visibility outside of the VM is limited
-
VM and physical machine share the same instruction set, so the host and guest must as well
-
Guest OS can provide a different application binary interface (ABI) inside the VM
-
Lots of challenges in getting this to work because guest OS expects to have privileged hardware access
Operating System Virtualization
-
Start with a real operating system
-
Create software responsible for isolating guest software inside the container
-
(That software seems to lack a canonical name—and today it’s actually a bunch of different tools.)
-
-
Container resources (processes, files, network sockets, etc.) are provided by the real operating system but visibility outside the container is limited
-
Container and real OS share same kernel
-
So applications inside and outside the container must share the same ABI
-
Challenges in getting this to work are due to shared OS namespaces
Containers v. VMs
-
False. Container shares the kernel with the host.
-
True. As long as both distributions use the same kernel, the differences are confined to binary tools and file locations.
ps inside the container will show all processes.
-
False. The container's process namespace is isolated from the host.
Hypervisor v. Container Virtualization
Why Virtualize an OS?
Shares many (but not all) of the benefits of hardware virtualization with much lower overhead.
-
Cannot run multiple operating systems on the same machine.
-
Can transfer software setups to another machine as long as it has an identical or nearly identical kernel.
-
Can adjust container resources to system needs.
-
Container should not leak information between the inside and outside of the container
-
Can isolate all of the configuration and software packages a particular application needs to run
OS v. Hardware Overhead
-
Application inside the VM makes a system call
-
Trap to the host OS (or hypervisor)
-
Hand trap back to the guest OS
-
Application inside the container makes a system call
-
Trap to the OS
-
Remember all of the work we had to do to deprivilege the guest OS and deal with uncooperative machine architectures like x86?
-
OS virtualization does not require any of this: there is only one OS!
OS Virtualization is About Names
-
Process IDs
-
top inside the container shows only processes running inside the container
-
top outside the container may show processes inside the container, but with different process IDs
-
-
File names
-
Processes inside the container may have a limited or different view of the mounted file system
-
File names may resolve to different files, and some files visible outside the container may be missing
-
-
User names
-
Containers may have different users with different roles
-
root inside the container should not be root outside the container
-
-
Host name and IP address
-
Processes inside the container may use a different host name and IP address when performing network operations
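The names listed above can be inspected from inside any process. The following is a minimal sketch, assuming a Linux system where /proc/self/ns is available; the function name visible_names is hypothetical. Running it inside and outside a container would typically print different host names and namespace identifiers.

```python
import os
import socket


def visible_names():
    """Collect a few of the names this process sees from where it runs."""
    names = {
        "pid": os.getpid(),
        "hostname": socket.gethostname(),
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "namespaces": {},
    }
    ns_dir = "/proc/self/ns"  # Linux only; absent on other systems
    if os.path.isdir(ns_dir):
        for entry in sorted(os.listdir(ns_dir)):
            try:
                # Each entry is a symlink whose target names the namespace.
                names["namespaces"][entry] = os.readlink(os.path.join(ns_dir, entry))
            except OSError:
                pass  # entry not readable in this environment
    return names


names = visible_names()
for key, value in names.items():
    print(key, value)
```

Two processes that share a namespace see the same identifier for it; processes in different containers generally do not.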
OS Virtualization is About Control
The OS may want to ensure that the entire container—or everything that runs inside it—cannot consume more than a certain amount of:
-
CPU time
-
memory
-
disk or network bandwidth
Not a New Idea
Forms of OS virtualization go back to chroot from 1982:
chroot - run command or interactive shell with special root directory
-
Instead of starting path resolution at inode #2, start somewhere else.
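A minimal sketch of that idea, not how the kernel implements it: path resolution that starts at a chosen directory instead of the real root, and clamps ".." so a lookup cannot climb out. The function name resolve_in_root and the example paths are hypothetical, and symbolic links are ignored.

```python
import posixpath


def resolve_in_root(new_root: str, path: str) -> str:
    """Resolve `path` as if `new_root` were "/" (a toy model of chroot)."""
    parts = []
    for component in path.split("/"):
        if component in ("", "."):
            continue  # skip empty components and "."
        if component == "..":
            if parts:
                parts.pop()  # go up one level if there is one
            continue  # ".." at the new root stays at the new root
        parts.append(component)
    return posixpath.join(new_root, *parts)


print(resolve_in_root("/srv/jail", "/etc/passwd"))
print(resolve_in_root("/srv/jail", "/../../etc/passwd"))
```

Both calls resolve to the same file under /srv/jail, which is the point: the process cannot name anything above its new root.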
Modern container management systems like Docker combine and build upon multiple lower-level tools and services.
Linux namespaces
Since 2002 Linux has provided namespace separation for a variety of resources that typically had unified namespaces
-
Mount points: allows different namespaces to see different views of the file system
-
Process IDs: new processes are allocated IDs in their current namespace and all parent namespaces
-
Network: namespaces can have private IP addresses and their own routing tables, and can communicate with other namespaces through virtual interfaces
-
Devices: devices can be present or hidden in different namespaces
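The process ID rule above can be modeled in a few lines. This is a toy sketch, not kernel code: the class PidNamespace and its spawn method are hypothetical names, and each namespace simply keeps its own counter.

```python
import itertools


class PidNamespace:
    """Toy model of a PID namespace with its own ID counter."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._next_pid = itertools.count(1)  # IDs start at 1 in every namespace

    def spawn(self):
        """Create a process here; return its ID in this and every ancestor namespace."""
        pids = {}
        ns = self
        while ns is not None:
            pids[ns.name] = next(ns._next_pid)
            ns = ns.parent
        return pids


host = PidNamespace("host")
for _ in range(3):
    host.spawn()  # some processes already running on the host

container = PidNamespace("container", parent=host)
init = container.spawn()
print(init)  # {'container': 1, 'host': 4}
```

The same process has ID 1 when viewed from inside the container and ID 4 when viewed from the host, which matches what top shows from each side.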
cgroups
…a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.
-
Processes and their children remain in the same cgroup
-
cgroups make it possible to control the resources allocated to a set of processes
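As a rough illustration of what such control looks like, the sketch below writes limits in the style of the cgroup v2 control files cpu.max and memory.max. The helper configure_cgroup is hypothetical, and it writes into a scratch directory so that it runs without root; on a real system the directory would live under /sys/fs/cgroup.

```python
import os
import tempfile


def configure_cgroup(cgroup_dir, cpu_fraction, memory_bytes, period_us=100_000):
    """Write cgroup-v2-style limits into `cgroup_dir`.

    cpu.max holds "<quota> <period>" in microseconds; memory.max holds bytes.
    """
    quota_us = int(cpu_fraction * period_us)
    settings = {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(memory_bytes),
    }
    for filename, value in settings.items():
        with open(os.path.join(cgroup_dir, filename), "w") as f:
            f.write(value + "\n")
    return settings


# Scratch directory standing in for a real cgroup directory.
# On a real system a process joins a group by writing its PID
# to that group's cgroup.procs file.
demo_dir = tempfile.mkdtemp()
limits = configure_cgroup(demo_dir, cpu_fraction=0.5, memory_bytes=256 * 1024 * 1024)
print(limits)
```

Here the group would be limited to half of one CPU and 256 MiB of memory, and every process in the group shares those limits.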
UnionFS
A stackable unification file system.
-
Does /foo/bar exist in the top layer? If yes, return its contents.
-
Does /foo/bar exist in the next layer? If yes, return its contents.
-
Etc.
-
Does /foo/bar exist in the top layer? If yes, return its contents.
-
Access to /foo in the next layer is prohibited, so stop. (Even if /foo/bar exists.)
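Both lookup sequences above can be sketched with layers modeled as dictionaries. This is a simplified model, not the UnionFS implementation: the PROHIBITED marker and the function union_lookup are hypothetical stand-ins for however a real union file system records that a layer hides a directory.

```python
PROHIBITED = object()  # marker: this layer blocks access to a path


def union_lookup(layers, path):
    """Search layers top to bottom; return the first match, or None."""
    for layer in layers:
        # Is the path, or one of its parent directories, prohibited here?
        prefix = path
        while prefix:
            if layer.get(prefix) is PROHIBITED:
                return None  # stop; lower layers are never consulted
            prefix = prefix.rpartition("/")[0]
        if path in layer:
            return layer[path]
    return None


top = {"/etc/motd": "hello from the top layer"}
middle = {"/foo": PROHIBITED, "/foo/bar": "hidden"}
bottom = {"/foo/bar": "bottom copy", "/etc/motd": "old", "/bin/sh": "shell"}
layers = [top, middle, bottom]

print(union_lookup(layers, "/etc/motd"))  # found in the top layer
print(union_lookup(layers, "/bin/sh"))    # falls through to the bottom layer
print(union_lookup(layers, "/foo/bar"))   # None: /foo is prohibited in the middle layer
```

The third lookup stops at the middle layer even though /foo/bar exists both there and in the layer below it.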
COW File System
Previous container libraries made a copy of the parent’s entire file system. (Containers need a lot of it.)
-
Copy on write!
-
Only copy parts of the underlying file system when the container modifies them.
-
Speeds start up and reduces storage usage.
-
The container mainly needs read-only access to host files.
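A toy model of the copy-on-write idea, assuming files are just dictionary entries: reads fall through to a shared base, and writes land in a private layer. The class name CowFileSystem and its methods are hypothetical and do not correspond to any particular container library.

```python
class CowFileSystem:
    """Toy copy-on-write view: a private upper layer over a shared base."""

    def __init__(self, base):
        self._base = base  # shared with the host and other containers
        self._upper = {}   # private changes made by this container

    def read(self, path):
        if path in self._upper:
            return self._upper[path]
        return self._base[path]  # raises KeyError if the file does not exist

    def write(self, path, data):
        self._upper[path] = data  # the base is never modified

    def private_paths(self):
        return sorted(self._upper)


base = {"/etc/hostname": "host", "/bin/sh": "shell binary"}
a = CowFileSystem(base)
b = CowFileSystem(base)

a.write("/etc/hostname", "container-a")
print(a.read("/etc/hostname"))  # container-a
print(b.read("/etc/hostname"))  # host
print(a.private_paths())        # ['/etc/hostname']
```

Creating a container costs almost nothing because nothing is copied up front, and the private storage grows only with the files the container actually changes.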
What is Docker?
-
Provides a unified set of tools for container management on a variety of systems
-
Layered file system images for easy updates
-
Now involved in development of containerization libraries on Linux
Example Dockerfile
FROM komljen/ubuntu
MAINTAINER Alen Komljen <[email protected]>
ENV MONGO_VERSION 2.6.6
RUN \
  apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10 && \
  echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" \
    > /etc/apt/sources.list.d/mongodb.list && \
  apt-get update && \
  apt-get -y install \
    mongodb-org=${MONGO_VERSION} && \
  rm -rf /var/lib/apt/lists/*
VOLUME ["/data/db"]
RUN rm /usr/sbin/policy-rc.d
CMD ["/usr/bin/mongod"]
EXPOSE 27017
Next Time
-
Performance benchmarking and analysis!