I have been working with container technology since September 2014, sorting out how they are useful in the context of OpenStack. This led to my involvement in the Kolla project, a project to containerize OpenStack as well as Magnum, a project to provide containers as a service. Containers are super useful as an upgrade tool for OpenStack, and the main topic of this blog post.
Kolla began life as a project with dependencies on docker and kubernetes. I wasn’t always certain the kubernetes dependency was necessary to provide container deployments in OpenStack, but I went with it. Over time, we found kubernetes has a lot to offer OpenStack deployments. But it lacks a few features, which makes it unsuitable for deploying “super privileged containers”.
A super privileged container is a container where one or more of the following are true:
- The container’s processes want to utilize the host network namespace – specifically the --net=host flag.
- The container’s processes want to utilize bind mounting – that is, mounting a directory from the host file-system inside the container and sharing it.
- The container’s processes want to utilize the host pid namespace – specifically the --pid=host flag.
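The three super-privileges above map directly onto docker run flags. A minimal sketch of the flag set – the image name `example/spc` and the mount path are hypothetical placeholders, not from any real deployment:

```shell
# The three super-privileges expressed as docker run flags.
# "example/spc" and /var/lib/example are hypothetical placeholders.
spc_flags="--net=host"                                       # host network namespace
spc_flags="$spc_flags --pid=host"                            # host PID namespace
spc_flags="$spc_flags -v /var/lib/example:/var/lib/example"  # bind mount from the host
echo "docker run -d $spc_flags example/spc"
```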
Kubernetes could be modified to allow super-privileged containers, but until that day comes, it won’t be suitable for running them. There is no way to request these privileges in existing Kubernetes pod files because of runtime and privilege considerations – essentially, the operator must trust that an application running in super-privileged mode won’t root the entire datacenter. I suspect this concern is why the kubernetes maintainers have been unwilling to make these options available.
I have spent several weeks researching upgrades of the compute node in nova-network mode, which consists of a nova-network, nova-compute, and nova-libvirt process. I started by borrowing the Kolla containers for nova-network and nova-compute and cloned them into a new compute-upgrade repo:
```
[root@bigiron docker]# ls -l
drwxrwxr-x 2 sdake sdake 4096 Jan 28 13:32 nova-compute
drwxrwxr-x 2 sdake sdake 4096 Jan 28 13:27 nova-libvirt
drwxrwxr-x 2 sdake sdake 4096 Jan 21 17:59 nova-network
```
Each directory contains a container definition; for example, nova-compute contains:
```
[root@bigiron docker]# ls -l nova-compute/nova-compute
total 12
lrwxrwxrwx 1 sdake sdake  33 Jan 21 08:40 build -> ../../../tools/build-docker-image
-rwxrwxr-x 1 sdake sdake 394 Jan 21 08:40 config-nova-compute.sh
-rw-rw-r-- 1 sdake sdake 365 Jan 28 13:06 Dockerfile
-rwxrwxr-x 1 sdake sdake  83 Jan 28 13:32 start.sh
```
Most of the hard work of this project was building the containers. Half way to victory using the cp command 🙂 Next, I sorted out a run command for each container and merged the three run commands into a script called start-compute.
First, a few directories must be shared for nova-libvirt:
- /sys: To allow libvirt to communicate with systemd on the host
- /sys/fs/cgroup: To allow libvirt to share cgroup changes with the host
- /var/lib/libvirt: To allow libvirt and nova to share persistent data
- /var/lib/nova: To allow libvirt and nova to share persistent data
Second, libvirt must be able to reparent processes to the host’s init (pid 1) systemd process during an upgrade. Without that, the libvirt qemu processes would be orphaned when the libvirt container is killed during an upgrade – who would be their parent? The answer lies in a brand-new docker feature allowing host PID namespace sharing. To gain this super-privilege, the --pid=host flag must be used.
Third, nova-network, nova-libvirt, and nova-compute must share the host network namespace. To obtain access to this super-privilege, the docker --net=host flag must be used.
Finally, some non-privileged environment variables must be passed to the container using the -e flag. A combination of these flags results in the following launch command:
```
sudo docker run -d --privileged \
  -e "KEYSTONE_ADMIN_TOKEN=$PASSWORD" \
  -e "NOVA_DB_PASSWORD=$PASSWORD" \
  -e "RABBIT_PASSWORD=$PASSWORD" \
  -e "RABBIT_USERID=stackrabbit" \
  -e NETWORK_MANAGER="nova" \
  -e "GLANCE_API_SERVICE_HOST=$SERVICE_HOST" \
  -e "KEYSTONE_PUBLIC_SERVICE_HOST=$SERVICE_HOST" \
  -e "RABBITMQ_SERVICE_HOST=$SERVICE_HOST" \
  -e "NOVA_KEYSTONE_PASSWORD=$PASSWORD" \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  -v /var/lib/nova:/var/lib/nova \
  --pid=host --net=host \
  sdake/fedora-rdo-nova-libvirt
```
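The start-compute script itself is not reproduced in this post. A plausible sketch of its shape, assuming all three containers share a common flag set (an assumption on my part – the real script merges three full run commands, and the commands are echoed here rather than executed so the sketch can be inspected without docker):

```shell
#!/bin/sh
# Hypothetical sketch of start-compute. The shared flag set below is an
# assumption; the real script merges three complete docker run commands.
COMMON_FLAGS="--privileged --pid=host --net=host \
  -v /sys/fs/cgroup:/sys/fs/cgroup -v /var/lib/nova:/var/lib/nova"
for image in nova-network nova-libvirt nova-compute; do
  # Echo the command instead of running it, so the sketch is inspectable.
  echo docker run -d $COMMON_FLAGS "sdake/fedora-rdo-$image"
done
```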
My testbed is a two node Fedora 21 cluster. One node runs devstack in nova-network mode. The remaining node simulates a compute node by running the containers produced in this repository with minimal other operating system services running. Note ebtables must be modprobed on the compute node in the host OS and libvirt must be disabled.
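The compute-node preparation mentioned above amounts to two commands, to be run as root on the host. A sketch – the `libvirtd` unit name is an assumption about the host distribution, and the commands are collected into a string so the sketch can be inspected without actually modifying a host:

```shell
# Compute host preparation (run as root). Collected as a string here so
# the sketch can be inspected without touching the host; pipe to
# "sudo sh" to actually run it. The libvirtd unit name is an assumption.
prep_cmds="modprobe ebtables
systemctl stop libvirtd
systemctl disable libvirtd"
echo "$prep_cmds"
```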
I can start the compute node by running start-compute:
```
[root@minime tools]# ./start-compute
c80b0c9b38efa146200338ad3d781a8ed7a782821abb904493ce14770c6e91c3
1365e60a79715b8ed38b172219666a12a60abae602aba74cf61f99d3be79f2f7
08a20c05607842a27a01e16f3010904785905ccff41173b7e25443a753a5c792
[root@minime tools]# docker ps
CONTAINER ID  IMAGE                                 COMMAND      CREATED         STATUS         PORTS  NAMES
08a20c056078  sdake/fedora-rdo-nova-compute:latest  "/start.sh"  5 seconds ago   Up 3 seconds          insane_leakey
1365e60a7971  sdake/fedora-rdo-nova-libvirt:latest  "/start.sh"  12 seconds ago  Up 10 seconds         desperate_bell
c80b0c9b38ef  sdake/fedora-rdo-nova-network:latest  "/start.sh"  14 seconds ago  Up 12 seconds         desperate_mcclintock
```
No QEMU processes are running:
```
[root@minime tools]# machinectl
MACHINE  CONTAINER  SERVICE

0 machines listed.
```
After running nova boot on the controller node:
```
[sdake@bigiron devstack]$ nova boot steaktwo --flavor m1.medium --image Fedora-x86_64-20-20140618-sda
```
One machine is found via machinectl. I’ll spare you the output of ps, but the qemu process is present there as well.
```
[root@minime tools]# machinectl
MACHINE                 CONTAINER  SERVICE
qemu-instance-00000001  vm         libvirt-qemu

1 machines listed.
```
Now stopping the libvirt container:
```
[root@minime tools]# docker stop 1365e60a7971
[root@minime tools]# docker ps
CONTAINER ID  IMAGE                                 COMMAND      CREATED        STATUS        PORTS  NAMES
08a20c056078  sdake/fedora-rdo-nova-compute:latest  "/start.sh"  7 minutes ago  Up 7 minutes         insane_leakey
c80b0c9b38ef  sdake/fedora-rdo-nova-network:latest  "/start.sh"  7 minutes ago  Up 7 minutes         desperate_mcclintock
```
Now starting the libvirt container:
```
c8368083989e0fa727663447a58d94ffeb6c581479fc501f4bc07e06bf176d22
[root@minime tools]# docker ps
CONTAINER ID  IMAGE                                 COMMAND      CREATED        STATUS        PORTS  NAMES
c8368083989e  sdake/fedora-rdo-nova-libvirt:latest  "/start.sh"  7 seconds ago  Up 5 seconds         compassionate_fermat
08a20c056078  sdake/fedora-rdo-nova-compute:latest  "/start.sh"  9 minutes ago  Up 9 minutes         insane_leakey
c80b0c9b38ef  sdake/fedora-rdo-nova-network:latest  "/start.sh"  9 minutes ago  Up 9 minutes         desperate_mcclintock
```
Now the compute VM can be stopped via nova after an upgrade:
```
[sdake@bigiron devstack]$ nova stop steaktwo
```
And the VM process disappears:
```
[root@minime tools]# machinectl
MACHINE  CONTAINER  SERVICE

0 machines listed.
```
Ok, so far this just shows stopping and starting a container. Where is the atomic part? Any OpenStack compute container can be atomically upgraded as follows:
- docker pull (to obtain new image)
- docker stop
- docker start
From the compute infrastructure’s perspective, it looks like an atomic upgrade. No messy upgrade of hundreds of RPM or DEB packages. Just replace a running image with a new image.
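The three steps above can be wrapped in a small script. A sketch – the image name and container ID are placeholders supplied by the operator, and note that to actually pick up the freshly pulled image, the final step is a docker run of the new image rather than a docker start of the old container (docker start would resume the container with its old image):

```shell
# Sketch of an atomic container upgrade. The arguments are placeholders
# supplied by the operator.
upgrade_container() {
  image="$1"; container="$2"
  # Echo instead of execute, so the sketch is testable without docker.
  echo "docker pull $image"       # obtain the new image
  echo "docker stop $container"   # halt the old container
  echo "docker run -d $image"     # start a container from the new image
}
upgrade_container sdake/fedora-rdo-nova-compute 08a20c056078
```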
It is highly likely I will re-integrate this work into Kolla, since Kolla is the home for R&D related to launching OpenStack within containers. Unfortunately, until kubernetes grows the required features, it is unsuitable as a deployment system for OpenStack compute nodes.