The TL;DR on Immutable Infrastructure

After working since September 2014 on Kolla, an immutable deployment tool that uses Ansible to deploy Docker containers containing OpenStack, I have come to a clear conclusion: there is some confusion about what precisely immutability is and how best to achieve it.

What is this immutable infrastructure thing I keep hearing about?  Well, first, two definitions from Google’s define:

  1. unchanging over time or unable to be changed


  1. the basic physical and organizational structures and facilities (e.g., buildings, roads, and power supplies) needed for the operation of a society or enterprise.

First I’ll dissect immutable.  At a high level, a running container consists of two things.  First and foremost, it consists of a complete filesystem and a running application.  More importantly, and this is where everyone gets hung up around immutability, it includes the application’s configuration options.  Some hard-core computer science nerds may think that the only way to achieve immutability is to pass the configuration options through the environment so that, from container instantiation until container destruction, the configuration options always remain consistent.  This is not the only solution.

Why is immutability important?  The reason immutability is desirable is to turn any stateless imperative system into a declarative system.  In an imperative system, there are many steps required to achieve a successful (or failed) outcome.  By wrapping an imperative system in a container where the configuration never changes, that imperative (read: more complex) system has been turned into a declarative system.  A declarative system has two outcomes (success or fail) and is always deterministic, meaning it always will have the same outcome after instantiation.  I use always with a grain of salt.  A cosmic ray could blow up system RAM, the hard disk could fail in some way, a kernel bug could trigger an oops on the call path, or an asteroid could hit the Phoenix datacenters!  Let’s just assume for a moment we throw out these failure scenarios and look to the positive side of things, which is, our infrastructure on which our immutability engine runs will never fail!

Now I’ll dissect infrastructure.  Infrastructure is all of the software that goes into making up a system.  In the case of Kolla and OpenStack, very little of OpenStack is actually stateless, but immutability still comes to the rescue in many cases.  For one, no administrator can muck around with the config options of a running system, crater the environment, and have no idea what went wrong.  In a properly designed infrastructure, the administrator will configure all options in one place and that configuration will be distributed through the system, causing all of the config-option related infrastructure to fail, or all of it to succeed.  The container infrastructure of Kolla includes 89 containers, many of which require some form of state, and most of which depend on a database, which can result in non-declarative behavior.

Immutability sounds pretty hot huh?  The problem is configuring software via environment variables is a huge pain in the ass.  Just looking at the reference implementation of docker registry v2.0, significant complexity goes into reading the environment variables without actually altering the contents of the virtual disk.  This is really the gold standard for an immutable infrastructure component, but is not the only way to solve the problem.
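
To make the environment-variable approach concrete, here is a minimal Python sketch of overlaying environment variables onto a configuration held in memory, so nothing on disk is touched.  This is not the registry’s actual code: the APP_ prefix, the underscore-as-nesting rule, and the function name overlay_env are all assumptions made for illustration.

```python
import copy


def overlay_env(config, environ, prefix="APP_"):
    """Return a copy of config with PREFIX_A_B=value applied as config['a']['b'].

    The prefix and the underscore-as-nesting rule are illustrative
    assumptions, not any real tool's contract.
    """
    result = copy.deepcopy(config)  # never mutate the caller's defaults
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as HOME are ignored
        path = name[len(prefix):].lower().split("_")
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return result


defaults = {
    "storage": {"rootdirectory": "/var/lib/registry"},
    "log": {"level": "info"},
}
env = {"APP_LOG_LEVEL": "debug", "HOME": "/root"}
effective = overlay_env(defaults, env)
print(effective["log"]["level"])
```

Even this toy version hints at the pain: any option whose name itself contains an underscore becomes ambiguous, and every one of the hundreds of options needs an agreed variable name.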

Remember, we are after a pragmatic declarative system (the why of immutability), not some gold standard where absolutely nothing in the filesystem changes.  While completely unchanged filesystem contents meet the definition of immutability, the spirit of immutability can be met in different ways.

During Kolla development we have tried pretty much every method to solve this problem.  I will enumerate the solutions:

  1. Encode the configuration into the build of the container: This method delivers the immutability similar to docker registry, meaning that nothing on the disk changes, ever.  The problem with this approach is any configuration change requires a pokey container rebuild and causes deployment (the config options come from the deployment system) and image building to be mixed, violating separation of concerns.
  2. Encode some environment variables with important information and use crudini to set the on-disk configuration in /etc/service.  This delivers near-immutability but trades off complete customization.  I say near-immutability because the crudini operation would have to be deterministic for immutability to be preserved, which is hard to guarantee.  Encoding the thousands of config options that make up the big tent is hard to manage and if we did that we would want oslo.config to read the config from the environment, not the filesystem.  The result is only *some* options end up being added to the environment, the critical ones, limiting configurability.
  3. Create the configuration file that the OpenStack service runs against outside the container.  Originally I highly disliked this idea, but I think it was kfox1111 who came to the rescue and suggested “what if you just configure the container one time?”  It took me a few days to process that, but what that means is the configuration file is host bind-mounted into the container, and after the container starts, it runs code which configures the container one time.  After the container is configured, no further alterations of the configuration are permitted without a redeploy from a central location, meaning arbitrary administrator tinkering won’t damage the deployment.  Does this deliver immutability?  Absolutely.  From container instantiation (which finishes when the configuration options are locked into place) to container destruction, the contents of the disk never change.  Immutability is preserved with zero tradeoff: no pokey build on deploy (which can take several hours with v2 registry), separation of concerns is maintained, and most importantly Operators maintain complete customization over their environment.
  4. Encode the configuration file generated by the deployment tool into a JSON blob which sets the environment or configuration files appropriately.  Then use crudini to set the config options on each boot.  This would work but it’s not very declarative because of the crudini interaction.  It was our first attempt, but we found other options to be more viable.
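
The “configure the container one time” idea from technique #3 can be sketched in a few lines of Python.  This is not Kolla’s actual startup code; the function name configure_once and the file names are hypothetical, and a temporary directory stands in for both the bind mount and the service’s configuration directory.

```python
import os
import shutil
import stat
import tempfile


def configure_once(source, target):
    """Copy source to target on first run only, then make target read-only.

    Returns True if the configuration was installed, False if it was
    already in place and left untouched.
    """
    if os.path.exists(target):
        return False
    shutil.copyfile(source, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # mode 0444
    return True


workdir = tempfile.mkdtemp()
mounted = os.path.join(workdir, "mounted.conf")    # stands in for the bind mount
installed = os.path.join(workdir, "service.conf")  # stands in for the in-container config

with open(mounted, "w") as handle:
    handle.write("[DEFAULT]\ndebug = False\n")

first = configure_once(mounted, installed)

# A later edit to the mounted file must not reach the configured container.
with open(mounted, "w") as handle:
    handle.write("[DEFAULT]\ndebug = True\n")

second = configure_once(mounted, installed)
with open(installed) as handle:
    print(first, second, "debug = False" in handle.read())
```

The second call is a no-op, so the only way to change the effective configuration is to destroy the container and redeploy, which is the property the technique is after.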

With Kolla we started with #2, briefly tried #4, and finished with technique #3.  I’m interested to hear in the comments of this blog post how other people achieved immutability in their infrastructure components without using the onerous environment variable to pass in hundreds of configuration options.

It would be interesting if docker added some type of immutable file loading that was built in and only read the external file(s) (and installed it) the first time the container was run.  Alas there is no such thing.

The next step in immutable infrastructure is ensuring a security breakout of the process has limited ability to modify the filesystem.  For example, if external software were running as root and could somehow modify files on a whim, it could modify /etc/sudoers and easily escalate privileges to root inside the container.  Then there would be a real problem!  This same problem can happen on bare metal, but containers insert an extra layer to break through so they actually increase security compared to bare metal.  We solved this problem in Kolla by running the containers as regular users and limiting them to modifying only files owned by the process’s UID/GID.  While it would be possible for some minimal damage to be done as a non-root user, at least the immutability of the files not controlled by the process would be preserved.
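
As a rough illustration of why running as a regular user limits the damage, the Python sketch below approximates whether a given UID/GID may write a file by looking only at ownership and mode bits.  The helper writable_by is hypothetical and deliberately simplified: it ignores ACLs, supplementary groups, capabilities, and read-only mounts.

```python
import os
import stat
import tempfile


def writable_by(path, uid, gid):
    """Approximate whether uid/gid may write path, using mode bits only.

    uid 0 is treated as always able to write, which is the escalation
    scenario described above.
    """
    info = os.stat(path)
    if uid == 0:
        return True
    if info.st_uid == uid:
        return bool(info.st_mode & stat.S_IWUSR)
    if info.st_gid == gid:
        return bool(info.st_mode & stat.S_IWGRP)
    return bool(info.st_mode & stat.S_IWOTH)


handle, path = tempfile.mkstemp()
os.close(handle)
os.chmod(path, 0o644)
owner = os.stat(path)

owner_can_write = writable_by(path, owner.st_uid, owner.st_gid)          # the owning service user
other_can_write = writable_by(path, owner.st_uid + 1, owner.st_gid + 1)  # some other non-root user
print(owner_can_write, other_can_write)
os.remove(path)
```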

I’ve written about what immutability is, and a little about why you would want immutability.  Besides the warping of a process from imperative to declarative, there are benefits that trickle from that reality.

  1. An operator has to try hard (using docker exec) to modify the contents of the filesystem.  Immutability protects vendors from cowboy coders like myself – since someone that goes around and mucks with the internals of a container is unlikely to call technical support.
  2. A technical support agent doesn’t have to guess what software has been installed on the system which may cause conflicts with that vendor’s software.  An immutable deployment target should only have immutable software on it.
  3. Upgrades and downgrades work flawlessly since the full state of the system including its configuration is recorded in the containers.
  4. Immutable infrastructure will change the world.  We are just in the beginning phase of the conversion to immutability.  Companies are kicking the tires on their favorite immutability engine (mine is Docker).  The immutability software is a bit green.  Still, I feel completely comfortable deploying OpenStack in n-way active H/A mode with Kolla, using Docker as our immutability engine.  Kolla doesn’t use any complex features of Docker; everything we use has been in use for a year or more in the field.

I hope folks find this blog post helpful in their journey towards implementing an immutable datacenter.  That is the next big thing in computing, and will take years to achieve.

Announcing the Release of Kolla Liberty

Hello OpenStackers!

The Kolla community is pleased to announce the release of Kolla Liberty.  This release fixes 432 bugs and implements 58 blueprints!

During Liberty, Kolla joined the big tent governance!  Our project can be found in the OpenStack Governance repository.

Kolla is an opinionated OpenStack deployment system unless the operator has opinions!  Kolla is completely customizable but comes with consumable out of the box defaults for use with Ansible deployment.  To understand the Kolla community’s philosophy towards deployment, please read our customize deployment documentation.

Kolla includes the following features:

  • AIO and multinode deployment using Ansible with n-way active high availability.
  • Vastly improved documentation.
  • Tools to build Docker images and deploy OpenStack via Ansible in Docker containers.
  • Build containers for CentOS, Oracle Linux, RHEL, and Ubuntu distributions.
  • Build containers from both binary packaging and directly from source.
  • Development environments using Heat, Vagrant, or bare-metal.
  • All “core” OpenStack services implemented as micro-services in Docker containers.
  • Minimal host deployment target dependencies requiring only docker-engine and docker-py.

The following services can be deployed via Ansible in 12 to 15 minutes with 3 node high availability:

  • ceph for glance, nova, cinder  
  • cinder (only ceph is implemented as a backend at this time)
  • glance
  • haproxy
  • heat
  • horizon
  • ironic (tech preview)
  • keystone
  • mariadb with galera replication
  • memcached
  • murano
  • neutron
  • nova
  • rabbitmq
  • swift

Kolla’s implementation is stable and the core reviewers feel Kolla is ready for evaluation by operators and third party projects.  We strongly encourage people to evaluate the included Ansible deployment tooling and are keen for additional feedback.

Preserving container properties via volume mounts

In the Kolla project, we were heavily using host bind mounts to share filesystem data with different containers.  A host bind mount is an operation where a host directory, such as /var/lib/mysql, is mounted directly into the container at some specific location.

The docker syntax for this operation is:

sudo docker run -d -v /var/lib/mysql:/var/lib/mysql -e MARIADB_ROOT_PASSWORD=password kollaglue/centos-rdo-mariadb-app

This pulls and starts the kollaglue/centos-rdo-mariadb-app container and bind mounts /var/lib/mysql from the host into the container at the same location.  This allows all containers that are started with this bind mount to share the host’s /var/lib/mysql.

Through months of trial and error, we found bind mounting host directories to be highly suboptimal.

Containers exhibit three magic properties.

  • Containers are declarative in nature. A container either starts or fails to start, and should do so consistently. Even though containers typically run imperative code, the imperative nature is abstracted behind a declarative model. So it is possible that an imperative change in how the container starts could remove this spectacular property. If the service relies on a database, or data stored on the filesystem, the system becomes non-deterministic. Determinism is a major advantage of declarative programming.
  • Containers are immutable. The contents, once created, cannot be modified except by the container software itself. It is almost like composing an entire distribution including compilers and library runtimes as one binary to be run.
  • Containers should be idempotent. A container should be able to be re-run consistently without failing if it started correctly the first time.

Using a host bind mount weakens or destroys the three magic properties of containers.  Docker, Inc. is intuitively aware this was a problem so they implemented docker data volume containers.  A docker data container is a container that is started once and creates a docker volume.  A docker volume is persistent storage created by the VOLUME instruction in a Dockerfile or the --volume option.  Once the data container is created, its docker volume is always available to other docker containers using the --volumes-from option.

The following operation starts a data container based upon the centos image, creates a data volume called /var/lib/mysql, and finally runs /bin/true which exits quickly:

docker run -d --name=mariadb_data --volume=/var/lib/mysql centos true

Next the container ID must be retrieved to start the application container:

sudo docker ps -a
CONTAINER ID   IMAGE           COMMAND  CREATED         STATUS                    PORTS  NAMES
56361937ac79    centos:latest  "true"   10 minutes ago  Exited (0) 10 minutes ago        mariadb_data

Next we run the mariadb-app container using the --volumes-from feature. Note docker allows short-hand specification of container id, so in this example 56 is the centos data container 56361937ac79:

sudo docker run -d --volumes-from=56 -e MARIADB_ROOT_PASSWORD=password kollaglue/centos-rdo-mariadb-app
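
The short-hand works because an ID prefix is accepted as long as it identifies exactly one container.  The Python sketch below illustrates that unique-prefix idea only; it is not Docker’s resolution code, and resolve_container is a hypothetical helper.  The IDs are the ones that appear in the transcripts in these posts.

```python
def resolve_container(prefix, container_ids):
    """Return the single ID starting with prefix, or raise ValueError."""
    matches = [cid for cid in container_ids if cid.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError("%r matches %d containers" % (prefix, len(matches)))
    return matches[0]


known = ["56361937ac79", "08a20c056078", "1365e60a7971"]
resolved = resolve_container("56", known)
print(resolved)
```

Since the data container was started with --name=mariadb_data, passing that name to --volumes-from should work as well and avoids looking up the ID at all.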

When using data volume containers, all the correct permissions are sorted out by docker automatically. Data is shared between containers. Most importantly it is more difficult to modify the container’s volume contents from outside the container. All of these benefits help preserve the declarative, immutable, and idempotent properties of containers.

We also use data containers for nova-compute in Kolla.  We still continue to use bind mounts in some circumstances.  For example, nova-api needs to run modprobe to load kernel modules.  To support that we allow bind mounting of /lib/modules:/lib/modules with the :ro (read only) flag.

We also continue to have some container-writeable bind mounts.  The nova-libvirt container requires /sys/fs/cgroup:/sys/fs/cgroup to be bind mounted.  Some types of super privileged containers cannot get away from bind mounts, but most of the Kolla system now runs without them.

An atomic upgrade process for OpenStack compute nodes

I have been working with container technology since September 2014, sorting out how containers are useful in the context of OpenStack.  This led to my involvement in the Kolla project, a project to containerize OpenStack, as well as Magnum, a project to provide containers as a service.  Containers are super useful as an upgrade tool for OpenStack, which is the main topic of this blog post.

Kolla began life as a project with dependencies on docker and kubernetes.  I wasn’t always certain the kubernetes dependency was necessary to provide container deployments in OpenStack, but I went with it.  Over time, we found kubernetes has a lot to offer OpenStack deployments.  But it lacks a few features which make it unsuitable to deploy “super privileged containers”.

A super privileged container is a container where one or more of the following are true:

  • The container’s processes want to utilize the host network namespace, specifically the --net=host flag.
  • The container’s processes want to utilize bind mounting, that is mounting a directory from the host filesystem inside the container and sharing it.
  • The container’s processes want to utilize the host pid namespace, specifically the --pid=host flag.

Kubernetes could be modified to allow super-privileged containers, but until that day comes, it won’t be suitable for running them.  There is no way to request these things in existing Kubernetes pod files, because they carry runtime and privilege considerations: essentially they assume the operator trusts the application running in super-privileged mode with the possibility of rooting their entire datacenter.  I suspect the kubernetes maintainers have been unwilling to make these options available because of this concern.
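
The three criteria above are easy to check mechanically.  The Python sketch below scans a docker run argument list and reports which host-sharing features it requests.  The function names are hypothetical, and the parsing covers only the flag spellings used in this post.

```python
def is_bind_mount(spec):
    """A host bind mount looks like /host/path:/container/path[:mode];
    a bare /container/path creates a data volume instead."""
    return spec.startswith("/") and ":" in spec


def super_privileges(argv):
    """Return the sorted list of host-sharing features requested by argv."""
    found = set()
    for index, arg in enumerate(argv):
        if arg == "--net=host":
            found.add("host network namespace")
        elif arg == "--pid=host":
            found.add("host pid namespace")
        elif arg in ("-v", "--volume") and index + 1 < len(argv):
            if is_bind_mount(argv[index + 1]):
                found.add("host bind mount")
        elif arg.startswith("--volume="):
            if is_bind_mount(arg[len("--volume="):]):
                found.add("host bind mount")
    return sorted(found)


libvirt = ["docker", "run", "-d", "--net=host", "-v",
           "/var/lib/nova:/var/lib/nova", "sdake/fedora-rdo-nova-libvirt"]
data = ["docker", "run", "-d", "--volume=/var/lib/mysql", "centos", "true"]
print(super_privileges(libvirt))
print(super_privileges(data))
```

A container is super privileged as soon as the returned list is non-empty; the data container example returns an empty list because a bare --volume path is a data volume, not a host bind mount.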

I have spent several weeks researching upgrade of the compute node in nova-networking mode, which consists of a nova-network, nova-compute, and nova-libvirt process.  I started by borrowing the Kolla containers for nova-network and nova-compute and cloned them into a new compute-upgrade repo:

[root@bigiron docker]# ls -l nova-compute
drwxrwxr-x 2 sdake sdake 4096 Jan 28 13:32 nova-compute
drwxrwxr-x 2 sdake sdake 4096 Jan 28 13:27 nova-libvirt
drwxrwxr-x 2 sdake sdake 4096 Jan 21 17:59 nova-network

Each directory contains a container.  For example, nova-compute contains:

[root@bigiron docker]# ls -l nova-compute/nova-compute
total 12
lrwxrwxrwx 1 sdake sdake  33 Jan 21 08:40 build -> ../../../tools/build-docker-image
-rwxrwxr-x 1 sdake sdake 394 Jan 21 08:40
-rw-rw-r-- 1 sdake sdake 365 Jan 28 13:06 Dockerfile
-rwxrwxr-x 1 sdake sdake  83 Jan 28 13:32
[root@bigiron docker]# 

Most of the hard work of this project was building the containers. Half way to victory using the cp command 🙂 Next I sorted out a run command that would run the various containers. I merged the 3 run commands into a script called start-compute.

First, a few directories must be shared for nova-libvirt:

  • /sys: To allow libvirt to communicate with systemd in the host process
  • /sys/fs/cgroup: To allow libvirt to share cgroup changes with the host process
  • /var/lib/libvirt: To allow libvirt and nova to share persistent data
  • /var/lib/nova: To allow libvirt and nova to share persistent data

Second, libvirt must be able to reparent processes to the init (pid=1) systemd process during an upgrade.  If it can’t do that operation, the libvirt qemu processes will have no parent during an upgrade.  Who would be their parent during an upgrade process, where libvirt had been killed? The answer lies in a brand-new docker feature allowing host namespace PID sharing.  In order to gain this super-privilege, the --pid=host flag must be used.

Third, nova-network, nova-libvirt, and nova-compute must share the host network namespace.  To obtain access to this super-privilege, the docker --net=host flag must be used.

Finally some non-privileged environment variables must be passed to the container using the -e flag. A combination of these flags results in the following launch command:

sudo docker run -d --privileged -e "KEYSTONE_ADMIN_TOKEN=$PASSWORD" -e "NOVA_DB_PASSWORD=$PASSWORD" -e "RABBIT_PASSWORD=$PASSWORD" -e "RABBIT_USERID=stackrabbit" -e NETWORK_MANAGER="nova" -e "GLANCE_API_SERVICE_HOST=$SERVICE_HOST" -e "KEYSTONE_PUBLIC_SERVICE_HOST=$SERVICE_HOST" -e "RABBITMQ_SERVICE_HOST=$SERVICE_HOST" -e "NOVA_KEYSTONE_PASSWORD=$PASSWORD" -v /sys/fs/cgroup:/sys/fs/cgroup -v /var/lib/nova:/var/lib/nova --pid=host --net=host sdake/fedora-rdo-nova-libvirt
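
A command this long is easier to keep straight when it is assembled from its parts.  The start-compute script itself is not shown in this post, so the Python helper below is a hypothetical sketch of one way to build such a command line; it only constructs the argument list and does not run docker.  Only two of the environment variables are included to keep the example short.

```python
import shlex


def docker_run_argv(image, env=None, binds=(), privileged=False,
                    host_pid=False, host_net=False):
    """Build (but do not execute) a detached docker run command line."""
    argv = ["docker", "run", "-d"]
    if privileged:
        argv.append("--privileged")
    for name, value in (env or {}).items():
        argv += ["-e", "%s=%s" % (name, value)]
    for path in binds:
        # Same path on the host and in the container, as in the command above.
        argv += ["-v", "%s:%s" % (path, path)]
    if host_pid:
        argv.append("--pid=host")
    if host_net:
        argv.append("--net=host")
    argv.append(image)
    return argv


argv = docker_run_argv(
    "sdake/fedora-rdo-nova-libvirt",
    env={"RABBIT_USERID": "stackrabbit", "NETWORK_MANAGER": "nova"},
    binds=["/sys/fs/cgroup", "/var/lib/nova"],
    privileged=True, host_pid=True, host_net=True,
)
print(" ".join(shlex.quote(arg) for arg in argv))
```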

My testbed is a two node Fedora 21 cluster. One node runs devstack in nova-network mode. The remaining node simulates a compute node by running the containers produced in this repository with minimal other operating system services running. Note ebtables must be modprobed on the compute node in the host OS and libvirt must be disabled.

I can start the compute node by running start-compute:

[root@minime tools]# ./start-compute
[root@minime tools]# docker ps
CONTAINER ID        IMAGE                                  COMMAND             CREATED             STATUS              PORTS               NAMES
08a20c056078        sdake/fedora-rdo-nova-compute:latest   "/"         5 seconds ago       Up 3 seconds                            insane_leakey          
1365e60a7971        sdake/fedora-rdo-nova-libvirt:latest   "/"         12 seconds ago      Up 10 seconds                           desperate_bell         
c80b0c9b38ef        sdake/fedora-rdo-nova-network:latest   "/"         14 seconds ago      Up 12 seconds                           desperate_mcclintock   

No QEMU processes are running:

[root@minime tools]# machinectl
MACHINE                          CONTAINER SERVICE         

0 machines listed.

After running nova boot on the controller node:

[sdake@bigiron devstack]$ nova boot steaktwo --flavor m1.medium --image Fedora-x86_64-20-20140618-sda

One machine is found via machinectl. I’ll spare you the output of ps, but it is also present.

[root@minime tools]# machinectl
MACHINE                          CONTAINER SERVICE         
qemu-instance-00000001           vm        libvirt-qemu    

1 machines listed.

Now stopping the libvirt container:

[root@minime tools]# docker stop 1365e60a7971
[root@minime tools]# docker ps
CONTAINER ID        IMAGE                                  COMMAND             CREATED             STATUS              PORTS               NAMES
08a20c056078        sdake/fedora-rdo-nova-compute:latest   "/"         7 minutes ago       Up 7 minutes                            insane_leakey          
c80b0c9b38ef        sdake/fedora-rdo-nova-network:latest   "/"         7

Now starting the libvirt container:

[root@minime tools]# docker ps
CONTAINER ID        IMAGE                                  COMMAND             CREATED             STATUS              PORTS               NAMES
c8368083989e        sdake/fedora-rdo-nova-libvirt:latest   "/"         7 seconds ago       Up 5 seconds                            compassionate_fermat   
08a20c056078        sdake/fedora-rdo-nova-compute:latest   "/"         9 minutes ago       Up 9 minutes                            insane_leakey          
c80b0c9b38ef        sdake/fedora-rdo-nova-network:latest   "/"         9 minutes ago       Up 9 minutes                            desperate_mcclintock

Now the compute VM can be terminated via nova after an upgrade:

[sdake@bigiron devstack]$ nova stop steaktwo

And the VM process disappears:

[root@minime tools]# machinectl
MACHINE                          CONTAINER SERVICE         

0 machines listed.

OK, so you just showed stopping and starting a container?  Where is the atomic part?  Any container of OpenStack compute can be atomically upgraded as follows:

  • docker pull (to obtain new image)
  • docker stop
  • docker start
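
The three steps can be written down as a small plan.  The Python sketch below is hypothetical and only builds the command lines.  One assumption deserves a loud label: the final step is expressed as docker run rather than docker start, because a stopped container keeps the image it was created from, and in the transcript above the restarted libvirt container has a new ID (c8368083989e rather than 1365e60a7971), which suggests a fresh container was created from the image.

```python
def upgrade_plan(image, old_container, run_flags):
    """Return the command lines for a pull / stop / start-from-new-image upgrade."""
    return [
        ["docker", "pull", image],          # obtain the new image
        ["docker", "stop", old_container],  # stop the container running the old image
        ["docker", "run", "-d"] + list(run_flags) + [image],  # start from the new image
    ]


plan = upgrade_plan(
    "sdake/fedora-rdo-nova-libvirt",
    "1365e60a7971",
    ["--pid=host", "--net=host"],
)
for step in plan:
    print(" ".join(step))
```

Because the QEMU processes are reparented to the host’s init through --pid=host, the running VMs survive the window between the stop and the new run.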

From the compute infrastructure, it looks like an atomic upgrade. No messy upgrades of hundreds of RPM or DEB packages. Just replace a running image with a new image.

It is highly likely I will re-integrate this work into Kolla, since Kolla is the home for R&D related to launching OpenStack within containers. Unfortunately until kubernetes grows the required features, it is unsuitable for a deployment system for OpenStack compute nodes.

Isn’t it Atomic on OpenStack Ironic, don’t you think?

OpenStack Ironic is a bare metal as a service deployment tool.  Fedora Atomic is a µOS consisting of a very minimal installation of Linux, Kubernetes and Docker.  Kubernetes is an endpoint manager and container scheduler, while Docker is a container manager.  The basic premise of Fedora Atomic using Ironic is to present a lightweight launching mechanism for OpenStack.

The first step in launching Atomic is to make Ironic operational.  I used devstack for my deployment.  The Ironic developer documentation is actually quite good for a recently integrated OpenStack project.  I followed the instructions for devstack.  I used the pxe+ssh driver, rather than agent+ssh.  The pxe+ssh driver virtualizes bare-metal deployment for testing purposes, so only one machine is needed.  The machine should have 16GB+ of RAM.  I find 16GB a bit tight, however.

I found it necessary to hack devstack a bit to get Ironic to operate.  The root cause of the issue is that libvirt can’t write the console log to the home directory as specified in the localrc. To solve the problem I just hacked devstack to write the log files to /tmp. I am sure there is a more elegant way to solve this problem.

The diff of my devstack hack is:

[sdake@bigiron devstack]$ git diff
diff --git a/tools/ironic/scripts/create-node b/tools/ironic/scripts/create-node
index 25b53d4..5ba88ce 100755
--- a/tools/ironic/scripts/create-node
+++ b/tools/ironic/scripts/create-node
@@ -54,7 +54,7 @@ if [ -f /etc/debian_version ]; then
if [ -n "$LOGDIR" ] ; then
- VM_LOGGING="--console-log $LOGDIR/${NAME}_console.log"
+ VM_LOGGING="--console-log /tmp/${NAME}_console.log"

My devstack localrc contains:


disable_service horizon
disable_service rabbit
disable_service quantum
enable_service qpid
enable_service magnum

# Enable Ironic API and Ironic Conductor
enable_service ironic
enable_service ir-api
enable_service ir-cond

# Enable Neutron which is required by Ironic and disable nova-network.
disable_service n-net
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-l3
enable_service q-meta
enable_service neutron

# Create 3 virtual machines to pose as Ironic's baremetal nodes.

# The parameters below represent the minimum possible values to create
# functional nodes.

# Size of the ephemeral partition in GB. Use 0 for no ephemeral partition.

# By default, DevStack creates a network for instances.
# If this overlaps with the hosts network, you may adjust with the
# following.

# Log all output to files

It took me two days to sort out the project in this blog post, and during the process, I learned a whole lot about how Ironic operates by code inspection and debugging.  I couldn’t find much documentation about the deployment process, so I thought I’d share what I learned:

  • Nova contacts Ironic to allocate an Ironic node providing the image to boot
  • Ironic pulls the image from glance and stores it on the local hard disk
  • Ironic boots a virtual machine via SSH with a PXE-enabled SeaBIOS
  • The SeaBIOS code asks Ironic’s TFTP server for a deploy ramdisk and kernel
  • The deployed node starts the deploy kernel and ramdisk
  • The deploy ramdisk does the following:
    1. Starts tgtd to present the root device as an iSCSI disk on the network
    2. Contacts the Ironic ReST API to initiate iSCSI transfer of the image
    3. Waits on port 10000 for a network connection to indicate the iSCSI transfer is complete
    4. Reboots the node once port 10000 has been opened and closed by a process
  • Once the deploy ramdisk contacts Ironic to initiate iSCSI transfer of the image Ironic does the following:
    1. Uses iscsiadm to connect to the iSCSI target on the deploy hardware
    2. Spawns several dd processes to copy the local disk image to the iSCSI target
    3. Once the dd processes exit successfully, contacts port 10000 on the deploy node
  • Ironic changes the PXE boot configuration to point to the user’s actual desired ramdisk and kernel
  • The deploy node reboots into SeaBIOS again
  • The node boots the proper ramdisk and kernel, which load the disk image that was written via iSCSI
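
The “wait on port 10000” handshake in the steps above is a tiny protocol: one side listens, the other connects and disconnects, and the disconnect is the signal.  The Python sketch below shows the idea on localhost.  It is not Ironic’s code; the function names are hypothetical, and it binds to an arbitrary free port rather than 10000 so that it can run anywhere.

```python
import socket
import threading


def wait_for_done(listener, done):
    """Ramdisk side: block until a peer opens and then closes a connection."""
    conn, _ = listener.accept()
    conn.recv(1)   # returns b"" once the peer closes without sending data
    conn.close()
    done.set()     # the real deploy ramdisk would reboot the node here


def signal_done(host, port):
    """Conductor side: open and immediately close a connection."""
    socket.create_connection((host, port), timeout=5).close()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 picks any free port; the post describes port 10000
listener.listen(1)
port = listener.getsockname()[1]

done = threading.Event()
waiter = threading.Thread(target=wait_for_done, args=(listener, done))
waiter.start()

signal_done("127.0.0.1", port)   # stands in for "the dd processes exited successfully"
waiter.join(timeout=5)
listener.close()
print(done.is_set())
```

No payload is exchanged; the connection itself carries the one bit of information that the image has been written.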

Fedora Atomic does not ship images that are suitable for use with the Ironic model.  Specifically what is needed is a LiveOS image, a ramdisk, and a kernel.  The LiveOS image that Fedora Cloud does ship is not the Atomic version.  Clearly it is early days for Atomic and I expect these requirements will be met as time passes.

But I wanted to deploy Atomic now on Ironic, so I sorted out making a PXE-bootable Atomic Live OS image.

First a bit about how the Atomic Cloud Image is structured:

[sdake@bigiron Downloads]$ guestfish

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell

><fs> add-ro Fedora-Cloud-Atomic-20141203-21.x86_64.qcow2
><fs> run
><fs> list-filesystems
/dev/sda1: ext4
/dev/atomicos/root: xfs

The Atomic cloud image has /dev/sda1 containing the contents of the /boot directory.  The /dev/sda2 partition contains an LVM physical volume.  There is a logical volume called atomicos/root which contains the root filesystem.

Building the Fedora Atomic images for Ironic is as simple as extracting the ramdisk and kernel from /dev/sda1 and extracting /dev/sda2 into an image for Ironic to dd to the iSCSI target.  One complication is that the fstab must have the /boot entry removed.  Determining how to do this was a bit of a challenge, but I wrote a script to automate the Ironic image generation process.
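
The fstab edit is the one fiddly text-processing step.  My script is not reproduced here, so the Python sketch below is only an illustration of dropping the /boot entry from fstab contents; the function name and the sample fstab are hypothetical and are not copied from the Atomic image.

```python
def remove_boot_entry(fstab_text):
    """Return fstab_text without any entry whose mount point is /boot.

    Comments and blank lines are preserved as-is.
    """
    kept = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = bool(fields) and not fields[0].startswith("#")
        if is_entry and len(fields) >= 2 and fields[1] == "/boot":
            continue  # second field of an fstab entry is the mount point
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "# hypothetical fstab, not copied from the Atomic image\n"
    "/dev/mapper/atomicos-root /     xfs  defaults 0 0\n"
    "UUID=0000-0000            /boot ext4 defaults 1 2\n"
)
cleaned = remove_boot_entry(sample)
print(cleaned, end="")
```

Matching on the mount-point field rather than searching for the string /boot anywhere in the line avoids accidentally removing entries such as /boot/efi or comments that merely mention /boot.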

The first step is to test that Ironic actually installs via devstack using the above localrc:

[sdake@bigiron devstack]$ ./
bunch of output from devstack omitted
Keystone is serving at
Examples on using novaclient command line is in
The default users are: admin and demo
The password: 123456
This is your host ip:

Next, take a look at the default image list which should look something like:

[sdake@bigiron devstack]$ source ./openrc admin admin
[sdake@bigiron devstack]$ glance image-list
| Name                            | Disk Format | Container Format | Size      |
| cirros-0.3.2-x86_64-disk        | qcow2       | bare             | 13167616  |
| cirros-0.3.2-x86_64-uec         | ami         | ami              | 25165824  |
| cirros-0.3.2-x86_64-uec-kernel  | aki         | aki              | 4969360   |
| cirros-0.3.2-x86_64-uec-ramdisk | ari         | ari              | 3723817   |
| Fedora-x86_64-20-20140618-sda   | qcow2       | bare             | 209649664 |
| ir-deploy-pxe_ssh.initramfs     | ari         | ari              | 95220206  |
| ir-deploy-pxe_ssh.kernel        | aki         | aki              | 5808960   |

In this case, we want to boot the UEC image. Ironic expects the kernel_id and ramdisk_id properties to be attached to the image; these are the UUIDs of cirros-0.3.2-x86_64-uec-kernel and cirros-0.3.2-x86_64-uec-ramdisk respectively.

Running image-show, we can see these properties:

[sdake@bigiron devstack]$ glance image-show cirros-0.3.2-x86_64-uec 
| Property              | Value                                |
| Property 'kernel_id'  | c11bd198-227f-4156-9195-40b16278b65c |
| Property 'ramdisk_id' | 5e6839ef-daeb-4a1c-be36-3906ed4d7bd7 |
| checksum              | 4eada48c2843d2a262c814ddc92ecf2c     |
| container_format      | ami                                  |
| created_at            | 2014-12-09T14:56:05                  |
| deleted               | False                                |
| disk_format           | ami                                  |
| id                    | 259ca231-66ad-439d-900b-3dc9e9408a0c |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | cirros-0.3.2-x86_64-uec              |
| owner                 | 4b798efdcd5142509fe87b12d89d5949     |
| protected             | False                                |
| size                  | 25165824                             |
| status                | active                               |
| updated_at            | 2014-12-09T14:56:06                  |
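
Checking these properties by eye gets tedious when you rebuild images often. As a minimal sketch, the table printed by image-show can be parsed with a few lines of Python. The helper name parse_glance_table is hypothetical (it is not part of the glance client), and the sample is a shortened copy of the output above.

```python
def parse_glance_table(text):
    """Parse the two-column '| Property | Value |' table printed by the glance CLI."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prompts, borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything malformed
        key, value = cells
        # Custom image properties are printed as: Property 'kernel_id'
        if key.startswith("Property '") and key.endswith("'"):
            key = key[len("Property '"):-1]
        props[key] = value
    return props


# Shortened copy of the image-show output above.
sample = """\
| Property              | Value                                |
| Property 'kernel_id'  | c11bd198-227f-4156-9195-40b16278b65c |
| Property 'ramdisk_id' | 5e6839ef-daeb-4a1c-be36-3906ed4d7bd7 |
| container_format      | ami                                  |
| name                  | cirros-0.3.2-x86_64-uec              |
"""

image = parse_glance_table(sample)
missing = [k for k in ("kernel_id", "ramdisk_id") if k not in image]
print(image["kernel_id"], image["ramdisk_id"], missing)
```

An empty missing list means both properties Ironic needs are attached to the image.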

Now that we have validated the cirros image is available, the next step is to launch an instance from it as the demo user:

[sdake@bigiron devstack]$ source ./openrc demo demo
[sdake@bigiron devstack]$ nova keypair-add --pub-key ~/.ssh/ steak
[sdake@bigiron devstack]$ nova boot --flavor baremetal --image cirros-0.3.2-x86_64-uec --key-name steak cirros_on_ironic
[sdake@bigiron devstack]$ nova list
| ID                                   | Name             | Status | Task State | Power State | Networks         |
| 9e64804d-264d-40d2-88f4-e858efe69557 | cirros_on_ironic | ACTIVE | -          | Running     | private= |
[sdake@bigiron devstack]$ ssh cirros@
$ uname -a
Linux cirros-on-ironic 3.2.0-60-virtual #91-Ubuntu SMP Wed Feb 19 04:13:28 UTC 2014 x86_64 GNU/Linux

If this part works, that means you have a working Ironic devstack setup. The next step is to get the Atomic images and convert them for use with Ironic.

[sdake@bigiron fedora-atomic-to-liveos-pxe]$ ./
Mounting boot and root filesystems.
Done mounting boot and root filesystems.
Removing boot from /etc/fstab.
Done removing boot from /etc/fstab.
Extracting kernel to fedora-atomic-kernel
Extracting ramdisk to fedora-atomic-ramdisk
Unmounting boot and root.
Creating a RAW image from QCOW2 image.
Extracting base image to fedora-atomic-base.
cut: invalid byte, character or field list
Try 'cut --help' for more information.
sfdisk: Disk fedora-atomic.raw: cannot get geometry
sfdisk: Disk fedora-atomic.raw: cannot get geometry
12171264+0 records in
12171264+0 records out
6231687168 bytes (6.2 GB) copied, 29.3357 s, 212 MB/s
Removing raw file.

The sfdisk: cannot get geometry warnings can be ignored.

After completion you should have fedora-atomic-kernel, fedora-atomic-ramdisk, and fedora-atomic-base files. Next we register these with glance:

[sdake@bigiron fedora-atomic-to-liveos-pxe]$ ls -l fedora-*
-rw-rw-r-- 1 sdake sdake 6231687168 Dec  9 08:59 fedora-atomic-base
-rwxr-xr-x 1 root  root     5751144 Dec  9 08:59 fedora-atomic-kernel
-rw-r--r-- 1 root  root    27320079 Dec  9 08:59 fedora-atomic-ramdisk
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic-kernel --container-format aki --disk-format aki --is-public True --file fedora-atomic-kernel
| Property         | Value                                |
| checksum         | 220c2e9d97c3f775effd2190199aa457     |
| container_format | aki                                  |
| created_at       | 2014-12-09T16:47:12                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | b8e08b02-5eac-467d-80e1-6c8138d0bf57 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | fedora-atomic-kernel                 |
| owner            | a28b73a4f29044f184b854ffb7532ceb     |
| protected        | False                                |
| size             | 5751144                              |
| status           | active                               |
| updated_at       | 2014-12-09T16:47:12                  |
| virtual_size     | None                                 |
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic-ramdisk --container-format ari --is-public True --disk-format ari --file fedora-atomic-ramdisk
| Property         | Value                                |
| checksum         | 9ed72ddc0411e2f30d5bbe6b5c2c4047     |
| container_format | ari                                  |
| created_at       | 2014-12-09T16:48:31                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | a62f6f32-ed66-4b18-8625-52d7262523f6 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | fedora-atomic-ramdisk                |
| owner            | a28b73a4f29044f184b854ffb7532ceb     |
| protected        | False                                |
| size             | 27320079                             |
| status           | active                               |
| updated_at       | 2014-12-09T16:48:31                  |
| virtual_size     | None                                 |
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic --container-format ami --disk-format ami --is-public True --property ramdisk_id=b2f60f33-9c8e-4905-a64b-90997d3dcb92 --property kernel_id=0e687b76-31d0-4351-a92a-a2d348482d42 --file fedora-atomic-base
| Property              | Value                                |
| Property 'kernel_id'  | 0e687b76-31d0-4351-a92a-a2d348482d42 |
| Property 'ramdisk_id' | b2f60f33-9c8e-4905-a64b-90997d3dcb92 |
| checksum              | 6a25f8bf17a94a6682d73b7de0a13013     |
| container_format      | ami                                  |
| created_at            | 2014-12-09T16:52:45                  |
| deleted               | False                                |
| deleted_at            | None                                 |
| disk_format           | ami                                  |
| id                    | d4ec78d7-445a-473d-9b7d-a1a6408aeed2 |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | fedora-atomic                        |
| owner                 | a28b73a4f29044f184b854ffb7532ceb     |
| protected             | False                                |
| size                  | 6231687168                           |
| status                | active                               |
| updated_at            | 2014-12-09T16:53:16                  |
| virtual_size          | None                                 |
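
The three registrations follow one pattern: the kernel and ramdisk are uploaded first, and the base image then points at them through its kernel_id and ramdisk_id properties. A rough sketch of that pattern in Python is shown here. It only builds the command lines using the flags seen above; image_create_argv is a hypothetical helper, and KERNEL_UUID and RAMDISK_UUID are placeholders for the ids that glance prints for the first two uploads.

```python
import shlex


def image_create_argv(name, fmt, path, properties=None):
    """Build a 'glance image-create' command line like the ones used above."""
    argv = ["glance", "image-create", f"--name={name}",
            "--container-format", fmt, "--disk-format", fmt,
            "--is-public", "True"]
    for key, value in (properties or {}).items():
        argv += ["--property", f"{key}={value}"]
    argv += ["--file", path]
    return argv


commands = [
    image_create_argv("fedora-atomic-kernel", "aki", "fedora-atomic-kernel"),
    image_create_argv("fedora-atomic-ramdisk", "ari", "fedora-atomic-ramdisk"),
    # Placeholders: substitute the ids printed by the two uploads above.
    image_create_argv("fedora-atomic", "ami", "fedora-atomic-base",
                      {"ramdisk_id": "RAMDISK_UUID", "kernel_id": "KERNEL_UUID"}),
]
for argv in commands:
    print(shlex.join(argv))
```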

Next we configure Ironic’s PXE boot options and restart the Ironic conductor in devstack. To restart the conductor, attach to the devstack screen session with screen -r, find the conductor window, press CTRL-C to stop it, then press the up arrow and ENTER to run it again. The restarted conductor picks up the new configuration.

/etc/ironic/ironic.conf should be changed to have this config option:

pxe_append_params = nofb nomodeset vga=normal console=ttyS0 no_timer_check root=/dev/mapper/atomicos-root ostree=/ostree/boot.0/fedora-atomic/a002a2c2e44240db614e09e82c7822322253bfcaad0226f3ff9befb9f96d315f/0
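
The root= and ostree= arguments are the Atomic-specific parts of this line, and a typo in either is easy to miss. As a small sketch, the option can be split into individual kernel arguments and checked before restarting the conductor. The parse_kernel_args helper is hypothetical and not part of Ironic.

```python
def parse_kernel_args(option_line):
    """Split an 'option = arg arg key=value ...' line into a dict of kernel arguments."""
    _name, _, value = option_line.partition("=")
    params = {}
    for arg in value.split():
        key, sep, val = arg.partition("=")
        params[key] = val if sep else True  # bare flags such as nofb become True
    return params


line = ("pxe_append_params = nofb nomodeset vga=normal console=ttyS0 "
        "no_timer_check root=/dev/mapper/atomicos-root "
        "ostree=/ostree/boot.0/fedora-atomic/"
        "a002a2c2e44240db614e09e82c7822322253bfcaad0226f3ff9befb9f96d315f/0")

params = parse_kernel_args(line)
for required in ("root", "ostree"):
    print(required, "=", params.get(required, "MISSING"))
```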

Next we launch the fedora-atomic image using Nova’s baremetal flavor:

[sdake@bigiron ~]$ source /home/sdake/repos/devstack/openrc demo demo
[sdake@bigiron Downloads]$ nova boot --flavor baremetal --image fedora-atomic --key-name steak fedora_atomic_on_ironic
[sdake@bigiron Downloads]$ nova list
| ID                                   | Name                    | Status | Task State | Power State | Networks |
| e7f56931-307d-45a7-a232-c2fa70898cae | fedora-atomic_on_ironic | BUILD  | spawning   | NOSTATE     |          |

Finally, log in to the Atomic host:

[sdake@bigiron ironic]$ nova list
| ID                                   | Name                    | Status | Task State | Power State | Networks         |
| d061c0ef-f8b7-4fff-845b-8272a7654f70 | fedora-atomic_on_ironic | ACTIVE | -          | Running     | private= |
[sdake@bigiron ironic]$ ssh fedora@
[fedora@fedora-atomic-on-ironic ~]$ uname -a
Linux fedora-atomic-on-ironic.novalocal 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I found working out how to create the images from the Fedora Atomic Cloud images a bit tedious. The diskimage-builder tool would likely make this easier if it supported rpm-ostree and Atomic.

Ironic needs some work to allow the PXE options to override the “root” initrd parameter. Ideally, a glance image property could be specified to override and extend the boot options. I’ve filed an Ironic blueprint for such an improvement.

Turbocharge DevStack with RAID 0 SSDs

Turbocharging DevStack

I wanted to turbocharge my development cycle of OpenStack running on Fedora 18 so I could be waiting on my brain rather than waiting on my workstation.  I decided to purchase two modern solid state drives (SSDs) and run them in RAID 0.  I chose two Intel S3500 160 GB enterprise-grade SSDs.  My second choice was the Samsung 840 Pro, which may have been a bit faster, but perhaps not as reliable.

Since OpenStack and DevStack mostly use /var and /opt for their work, I decided to move only /var and /opt to the SSDs.  RAID 0 has lower availability than a single disk, because losing either drive loses the whole array.  Keeping my home directory off the array means that if an SSD fails, I am less likely to lose work in progress.
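
To put a number on the availability trade-off, here is a small worked example. It assumes the two drives fail independently, and the 2% annual failure probability is a made-up illustrative figure, not a specification of any drive mentioned here.

```python
def raid0_failure_probability(p_disk, n_disks):
    """Probability that a RAID 0 array loses data, assuming independent disk failures.

    The array survives only if every member disk survives.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks


p = 0.02  # hypothetical annual failure probability of one SSD
array = raid0_failure_probability(p, 2)
print(f"single disk:   {p:.4f}")
print(f"2-disk RAID 0: {array:.4f}")
```

With two disks the risk is a little under double that of a single disk, which is why only data that is easy to regenerate lives on the array.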

The Baseline HP Z820

For a baseline, my system is a Hewlett Packard Z820 workstation (model #B2C08UT#ABA) that I purchased from Provantage in January 2013.  Most of the computer is a beast, sporting an 8-core Intel Xeon E5-2670 @ 2.60GHz with Hyper-Threading for 16 logical CPUs, an Intel C602 chipset, and 16 GB of quad-channel DDR3 ECC unbuffered RAM.

The memory is fast as shown with ramspeed:

[sdake@bigiron ramspeed-2.6.0]$ ./ramspeed -b 3 -m 4096
RAMspeed (Linux) v2.6.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode

INTEGER   Copy:      11549.61 MB/s
INTEGER   Scale:     11550.59 MB/s
INTEGER   Add:       11885.79 MB/s
INTEGER   Triad:     11834.27 MB/s
INTEGER   AVERAGE:   11705.06 MB/s

Unfortunately, the disk is a pokey 1TB 7200 RPM model.  The hdparm tool shows buffered reads of only 118MB/sec.

[sdake@bigiron ~]$ sudo hdparm -tT /dev/sda
Timing cached reads: 20590 MB in 2.00 seconds = 10308.76 MB/sec
Timing buffered disk reads: 358 MB in 3.02 seconds = 118.69 MB/sec

The Gnome 3 Disk Image Benchmarking tool shows a lower average of 82MB per second, although this is also passing through the LVM driver:


Warning: I didn’t run this benchmark with write enabled, as it would have destroyed the data on my disk.

Running takes 6 minutes:

[sdake@bigiron devstack]$ ./
Using mysql database backend
Installing package prerequisites...[|[/]^C[sdake@bigiron devstack]$ 
[sdake@bigiron devstack]$ ./
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[/]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at
Examples on using novaclient command line is in
The default users are: admin and demo
The password: 123456
This is your host ip:
done completed in 368 seconds

I timed a heat stack-create operation at about 34 seconds.  In a typical day I may create 50 or more stacks, so the time really adds up.

Turbo-charged DevStack

After installing the two SSDs, I decided to use LVM RAID 0 striping.  Linux Magazine indicates mdadm is faster, but I prefer a single management solution for my disks.
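
Striping helps sequential throughput because consecutive chunks of the logical volume alternate between the disks, so a large read keeps both SSDs busy. A minimal sketch of the address mapping follows. The 64 KiB stripe size is an assumed illustrative value, not the setting of this volume, and the function only illustrates the idea rather than how LVM implements it.

```python
def locate(offset, n_disks=2, stripe_size=64 * 1024):
    """Map a logical byte offset to (disk index, byte offset on that disk) for RAID 0."""
    stripe, within = divmod(offset, stripe_size)
    disk = stripe % n_disks  # stripes rotate across the disks
    disk_offset = (stripe // n_disks) * stripe_size + within
    return disk, disk_offset


# Four consecutive stripes land on disk 0, 1, 0, 1.
for off in (0, 64 * 1024, 128 * 1024, 192 * 1024):
    print(off, locate(off))
```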

The hdparm tool shows a beastly 1GB/sec throughput on buffered reads:

[sdake@bigiron ~]$ sudo hdparm -tT /dev/raid0_vg/ssd_opt

Timing cached reads: 21512 MB in 2.00 seconds = 10771.51 MB/sec
Timing buffered disk reads: 3050 MB in 3.00 seconds = 1016.47 MB/sec
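
To compare the two hdparm runs, the reported rates can be pulled out of the output and divided. This is a rough sketch: the regular expression is written against the lines as quoted in this post, and buffered_rate is a hypothetical helper name.

```python
import re

# Matches the "Timing ..." lines as they appear in the hdparm -tT output above.
TIMING = re.compile(
    r"Timing (?P<kind>[\w ]+):\s+(?P<mb>\d+) MB in\s+(?P<secs>[\d.]+) seconds"
    r" =\s+(?P<rate>[\d.]+) MB/sec"
)


def buffered_rate(output):
    """Return the reported MB/sec from the 'buffered disk reads' line."""
    for match in TIMING.finditer(output):
        if match.group("kind") == "buffered disk reads":
            return float(match.group("rate"))
    raise ValueError("no buffered disk read timing found")


hdd = """Timing cached reads: 20590 MB in 2.00 seconds = 10308.76 MB/sec
Timing buffered disk reads: 358 MB in 3.02 seconds = 118.69 MB/sec"""
ssd = """Timing cached reads: 21512 MB in 2.00 seconds = 10771.51 MB/sec
Timing buffered disk reads: 3050 MB in 3.00 seconds = 1016.47 MB/sec"""

ratio = buffered_rate(ssd) / buffered_rate(hdd)
print(f"buffered read speedup: {ratio:.1f}x")
```

On these numbers the striped SSDs read roughly 8.6 times faster than the original hard disk, while the cached read figures barely change because they measure memory rather than the disks.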

I also ran the Gnome 3 disk benchmarking tool, this time in write mode.  It showed an average 930MB/sec read and 370MB/sec write throughput:


I ran in a little under 3 minutes:

[sdake@bigiron devstack]$ ./
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[|]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at
Examples on using novaclient command line is in
The default users are: admin and demo
The password: 123456
This is your host ip:
done completed in 166 seconds

I timed a heat stack-create at 6 seconds.  Compared to the non-SSD 34 seconds, RAID 0 SSDs rock!  The overall system seems much faster, and the benchmarks confirm it.
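
The gains are easy to quantify from the timings reported in this post. The short calculation below treats "50 or more" stacks a day as exactly 50, which is a simplifying assumption.

```python
# Timings reported in this post, in seconds.
devstack_hdd, devstack_ssd = 368, 166
stack_create_hdd, stack_create_ssd = 34, 6
stacks_per_day = 50  # "50 or more" in the text; treated here as exactly 50

devstack_speedup = devstack_hdd / devstack_ssd
stack_create_speedup = stack_create_hdd / stack_create_ssd
minutes_saved = stacks_per_day * (stack_create_hdd - stack_create_ssd) / 60

print(f"DevStack run:      {devstack_speedup:.1f}x faster")
print(f"heat stack-create: {stack_create_speedup:.1f}x faster")
print(f"time saved per day on stack creation: {minutes_saved:.1f} minutes")
```

That works out to a DevStack run a bit more than twice as fast, stack creation more than five times as fast, and over 23 minutes a day saved on stack creation alone.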

The Heat API – A template based orchestration framework

Over the last year, Angus Salkeld and I have been developing an IaaS high availability service called Pacemaker Cloud.  We learned that the problem we were really solving was orchestration.  Another dev group inside Red Hat was also looking at this problem from the launching side.  We decided to take two weeks off from our existing work and see if we could join together to create, from scratch, a proof of concept implementation of AWS CloudFormation for OpenStack.  The result of that work was a proof of concept project which provided launching of a WordPress template, as had been done in our previous project.

The developers decided to take another couple of weeks to determine whether we could build a more functional system that would handle composite virtual machines.  Today, we released that version, our second iteration of the Heat API.  Since we now have many more developers and a project that exceeds the functionality of Pacemaker Cloud, the Heat development community has decided to cease work on our previous orchestration projects and focus our efforts on Heat.

A bit about Heat:  The Heat API implements the AWS CloudFormation API.  This API provides a REST interface for creating composite VMs called Stacks from template files.  The goal of the software is to be able to accurately launch AWS CloudFormation Stacks on OpenStack.  We will also enable good quality high availability based upon the technologies we created in Pacemaker Cloud, including escalation.
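
To give a feel for what a template file contains, here is a minimal sketch of a CloudFormation-style template built as a Python dictionary and serialized to JSON. The top-level keys follow the AWS CloudFormation template format. The resource name WebServer, the image name, and the instance type are placeholders, and whether this Heat release accepts this exact template is an assumption.

```python
import json

# A hypothetical single-instance stack in AWS CloudFormation template format.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Hypothetical single-instance stack",
    "Parameters": {
        "KeyName": {
            "Type": "String",
            "Description": "Name of an existing key pair",
        },
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "example-image",   # placeholder image name
                "InstanceType": "m1.small",   # placeholder flavor
                "KeyName": {"Ref": "KeyName"},
            },
        },
    },
}

document = json.dumps(template, indent=2)
print(document)
```

A stack is then one launched instance of such a template, with the Parameters section filled in at creation time.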

Given that C was a poor choice of implementation language for making REST-based cloud services, Heat is implemented in Python, which is fantastic for REST services.  The Heat API also follows OpenStack design principles.  Our initial design after our POC shows the basics of our architecture, and our quickstart guide can be used with our second iteration release.

A mailing list is available for developer and user discussion.  We track milestones and issues using GitHub’s issue tracker.  Things are moving fast, so come join our project on GitHub or chat with the devs on #heat on freenode!