Preserving contaner properties via volume mounts

In the Kolla project, we were heavily using host bind mounts to share filesystem data with different containers.  A host bind mount is an operation where a host directory, such as /var/lib/mysql is mounted directly into the container at some specific location.

The docker syntax for this operation is:

sudo docker run -d -v /var/lib/mysql:/var/lib/mysql -e MARIADB_ROOT_PASSWORD=password kollaglue/centos-rdo-mariadb-app

This pulls and starts the kollaglue/centos-rdo-mariadb-app container and bind mounts /var/lib/mysql from the host into the container at the same location.  This allows all containers to share the host’s /var/lib/mysql that are started with this bindmount.

Through months of trial and error, we found bind mounting host directories to be highly suboptimal.

Containers exhibit three magic properties.

  • Containers are declarative in nature. A container either starts or fails to start, and should do so consistently. Even though containers typically run imperative code, the imperative nature is abstracted behind a declarative model. So it is possible that an imperative change in the how the container starts could remove this spectacular property. If the service relies on a database, or data stored on the filesystem, the system becomes non-deterministic. Determinism is a major advantage of declarative programming.
  • Containers are immutable. The contents, once created can not be modified except by the container software itself. It is almost like composing an entire distribution including compilers and library runtimes as one binary to be run.
  • Containers should be idempotent. A container should be able to be re-run consistently without failing if it started correctly the first time.

Using a host bind mount weakens or destroys the three magic properties of containers.  Docker, Inc. is intuitively aware this was a problem so they implemented docker data volume containers.  A docker data container is is a container that is started once and creates a docker volume.  A docker volume is permanent persistent storage created by the VOLUME operation in a Dockerfie or the –volume command.  Once the data container is created, it’s docker volume is always available to other docker containers using the volumes-from operation.

The following operation starts a data container based upon the centos image, creates a data volume called /var/lib/myql, and finally runs /bin/true which exits quickly:

docker run -d --name=mariadb_data --volume=/var/lib/mysql centos true

Next the container ID must be retrieved to start the application container:

sudo docker ps -a
CONTAINER ID   IMAGE           COMMAND  CREATED         STATUS                    PORTS  NAMES
56361937ac79    centos:latest  "true"   10 minutes ago  Exited (0) 10 minutes ago        mariadb_data

Next we run the mariadb-app container using the –volumes-from feature. Note docker allows short-hand specification of container id, so in this example 56 is the centos data container 56361937ac79:

sudo docker run -d --volumes-from=56 -e MARIADB_ROOT_PASSWORD=password kollaglue/centos-rdo-mariadb-app

When using data volume containers, all the correct permissions are sorted out by docker automatically. Data is shared between containers. Most importantly it is more difficult to modify the container’s volume contents from outside the container. All of these benefits help preserve the declarative, immutable, and idempotent properties of containers.

We also use data containers for nova-compute in Kolla.  We still continue to use bind mounts in some circumstances.  For example, nova-api needs to run modprobe to load kernel modules.  To support that we allow bind mounting of /var/lib/modules:/var/lib/modules with the :ro (read only) flag.

We also continue to have some container-writeable bind mounts.  The nova-libvirt container requires /sys/fs/cgroups:/sys/fs/cgroups to be bind mounted.  Some types of super privileged containers cannot get away from bind mounts, but most of the Kolla system now runs without them.