Isn’t it Atomic on OpenStack Ironic, don’t you think?

OpenStack Ironic is a bare-metal-as-a-service deployment tool.  Fedora Atomic is a micro-OS (µOS) consisting of a very minimal Linux installation plus Kubernetes and Docker.  Kubernetes is an endpoint manager and container scheduler, while Docker is a container manager.  The basic premise of running Fedora Atomic on Ironic is to provide a lightweight launching mechanism for OpenStack.

The first step in launching Atomic is to make Ironic operational.  I used devstack for my deployment.  The Ironic developer documentation is actually quite good for a recently integrated OpenStack project.  I followed the instructions for devstack, using the pxe+ssh driver rather than agent+ssh.  The pxe+ssh driver virtualizes bare-metal deployment for testing purposes, so only one machine is needed.  That machine should have at least 16GB of RAM, and I find even 16GB a bit tight.

I found it necessary to hack devstack a bit to get Ironic to operate.  The root cause of the issue is that libvirt can’t write the console log to the home directory specified in the localrc.  To work around the problem I hacked devstack to write the log files to /tmp; I am sure there is a more elegant solution.

The diff of my devstack hack is:

[sdake@bigiron devstack]$ git diff
diff --git a/tools/ironic/scripts/create-node b/tools/ironic/scripts/create-node
index 25b53d4..5ba88ce 100755
--- a/tools/ironic/scripts/create-node
+++ b/tools/ironic/scripts/create-node
@@ -54,7 +54,7 @@ if [ -f /etc/debian_version ]; then
fi
if [ -n "$LOGDIR" ] ; then
- VM_LOGGING="--console-log $LOGDIR/${NAME}_console.log"
+ VM_LOGGING="--console-log /tmp/${NAME}_console.log"
else
VM_LOGGING=""
fi

My devstack localrc contains:

SCREEN_LOGDIR=$DEST/logs
DATABASE_PASSWORD=123456
RABBIT_PASSWORD=123456
SERVICE_TOKEN=123456
SERVICE_PASSWORD=123456
ADMIN_PASSWORD=123456

disable_service horizon
disable_service rabbit
disable_service quantum
enable_service qpid
enable_service magnum

# Enable Ironic API and Ironic Conductor
enable_service ironic
enable_service ir-api
enable_service ir-cond

# Enable Neutron which is required by Ironic and disable nova-network.
disable_service n-net
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-l3
enable_service q-meta
enable_service neutron

# Create 5 virtual machines to pose as Ironic's baremetal nodes.
IRONIC_VM_COUNT=5
IRONIC_VM_SSH_PORT=22
IRONIC_BAREMETAL_BASIC_OPS=True

# The parameters below represent the minimum possible values to create
# functional nodes.
IRONIC_VM_SPECS_RAM=1024
IRONIC_VM_SPECS_DISK=20

# Size of the ephemeral partition in GB. Use 0 for no ephemeral partition.
IRONIC_VM_EPHEMERAL_DISK=0
VIRT_DRIVER=ironic

# By default, DevStack creates a 10.0.0.0/24 network for instances.
# If this overlaps with the hosts network, you may adjust with the
# following.
NETWORK_GATEWAY=10.1.0.1
FIXED_RANGE=10.1.0.0/24
FIXED_NETWORK_SIZE=256

# Log all output to files
LOGFILE=$HOME/devstack.log
SCREEN_LOGDIR=$HOME/logs

It took me two days to sort out the setup described in this blog post, and along the way I learned a whole lot about how Ironic operates through code inspection and debugging.  I couldn’t find much documentation about the deployment process, so here is how it works, step by step:

  • Nova contacts Ironic to allocate an Ironic node, providing the image to boot
  • Ironic pulls the image from glance and stores it on the local hard disk
  • Ironic powers on the node (a virtual machine, via SSH in the pxe+ssh driver), which boots a PXE-enabled SeaBIOS
  • SeaBIOS asks Ironic’s TFTP server for a deploy kernel and ramdisk
  • The node being deployed boots the deploy kernel and ramdisk
  • The deploy ramdisk does the following:
    1. Starts tgtd to present the root device as an iSCSI disk on the network
    2. Contacts the Ironic REST API to initiate the iSCSI transfer of the image
    3. Waits on port 10000 for a network connection to indicate the iSCSI transfer is complete
    4. Reboots the node once port 10000 has been opened and closed by a process
  • Once the deploy ramdisk initiates the iSCSI transfer, Ironic does the following (a rough command-level sketch of this exchange follows the list):
    1. Uses iscsiadm to connect to the iSCSI target on the deploy hardware
    2. Spawns several dd processes to copy the local disk image to the iSCSI target
    3. Once the dd processes exit successfully, contacts port 10000 on the deploy node
  • Ironic changes the PXE boot configuration to point to the user’s actual desired ramdisk and kernel
  • The deploy node reboots into SeaBIOS again
  • The node boots the proper ramdisk and kernel, which use the disk image written via iSCSI as the root filesystem
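
To make that hand-off concrete, here is a rough, command-level sketch of what the conductor side of the exchange boils down to.  This is not Ironic’s actual code; the target IQN, IP address, and image path are made up for illustration:

# log in to the iSCSI target exported by the deploy ramdisk (IQN and address are illustrative)
iscsiadm -m discovery -t sendtargets -p 10.1.0.4:3260
iscsiadm -m node -T iqn.2008-10.org.openstack:deploy -p 10.1.0.4:3260 --login

# copy the glance image onto the exported root device
dd if=/path/to/glance/image.raw \
   of=/dev/disk/by-path/ip-10.1.0.4:3260-iscsi-iqn.2008-10.org.openstack:deploy-lun-1 \
   bs=1M oflag=direct

# log out, then poke port 10000 so the ramdisk knows the copy finished and reboots
iscsiadm -m node -T iqn.2008-10.org.openstack:deploy -p 10.1.0.4:3260 --logout
echo done | nc 10.1.0.4 10000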

Fedora Atomic does not ship images suitable for use with the Ironic model.  Specifically, what is needed are a LiveOS image, a ramdisk, and a kernel.  The LiveOS image that Fedora Cloud does ship is not the Atomic version.  Clearly it is early days for Atomic, and I expect these requirements will be met as time passes.

But I wanted to deploy Atomic now on Ironic, so I sorted out making a PXE-bootable Atomic Live OS image.

First a bit about how the Atomic Cloud Image is structured:

[sdake@bigiron Downloads]$ guestfish

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> add-ro Fedora-Cloud-Atomic-20141203-21.x86_64.qcow2
><fs> run
><fs> list-filesystems
/dev/sda1: ext4
/dev/atomicos/root: xfs

In the Atomic cloud image, /dev/sda1 holds the contents of the /boot directory, while /dev/sda2 is an LVM physical volume.  It contains a logical volume called atomicos/root, which holds the root filesystem.

Building the Fedora Atomic images for Ironic is as simple as extracting the ramdisk and kernel from /dev/sda1 and extracting /dev/sda2 into an image for Ironic to dd to the iSCSI target.  One complication is that the /boot entry must be removed from the image’s fstab.  Working out how to do all of this took some effort, so I wrote a script to automate the Ironic image generation process.
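
The script lives in my fedora-atomic-to-liveos-pxe repository; the rough shape of what it does, sketched here with libguestfs tools and illustrative partition offsets rather than the exact commands, is:

# convert the cloud image to raw so partitions can be carved out with dd
qemu-img convert -O raw Fedora-Cloud-Atomic-20141203-21.x86_64.qcow2 fedora-atomic.raw

# pull the kernel and initramfs out of the /boot partition (ostree nests them in a subdirectory)
virt-copy-out -a fedora-atomic.raw /boot boot-contents/
find boot-contents -name 'vmlinuz-*' -exec cp {} fedora-atomic-kernel \;
find boot-contents -name 'initramfs-*' -exec cp {} fedora-atomic-ramdisk \;

# remove the /boot entry from fstab, since the deployed image has no separate /boot
virt-edit -a fedora-atomic.raw /etc/fstab -e 's,^.*/boot.*$,,'

# extract the second partition (the LVM PV holding atomicos/root) as the base image;
# the skip value is illustrative -- the real offsets come from "sfdisk -d fedora-atomic.raw"
dd if=fedora-atomic.raw of=fedora-atomic-base bs=512 skip=411648 count=12171264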

The first step is to test that Ironic actually installs via devstack using the above localrc:

[sdake@bigiron devstack]$ ./stack.sh
bunch of output from devstack omitted
Keystone is serving at http://192.168.1.124:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.124

Next, take a look at the default image list which should look something like:

[sdake@bigiron devstack]$ source ./openrc admin admin
[sdake@bigiron devstack]$ glance image-list
+---------------------------------+-------------+------------------+-----------+
| Name                            | Disk Format | Container Format | Size      |
+---------------------------------+-------------+------------------+-----------+
| cirros-0.3.2-x86_64-disk        | qcow2       | bare             | 13167616  |
| cirros-0.3.2-x86_64-uec         | ami         | ami              | 25165824  |
| cirros-0.3.2-x86_64-uec-kernel  | aki         | aki              | 4969360   |
| cirros-0.3.2-x86_64-uec-ramdisk | ari         | ari              | 3723817   |
| Fedora-x86_64-20-20140618-sda   | qcow2       | bare             | 209649664 |
| ir-deploy-pxe_ssh.initramfs     | ari         | ari              | 95220206  |
| ir-deploy-pxe_ssh.kernel        | aki         | aki              | 5808960   |
+---------------------------------+-------------+------------------+-----------+

In this case, we want to boot the UEC image.  Ironic expects the image to carry ramdisk_id and kernel_id properties, which are the UUIDs of cirros-0.3.2-x86_64-uec-ramdisk and cirros-0.3.2-x86_64-uec-kernel respectively.
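
Ironic’s PXE deploy driver reads those properties from glance, so if you ever need to attach them to an image by hand, something along these lines (glance v1 CLI; the UUIDs are placeholders) should do it:

glance image-update --property kernel_id=<uuid of the kernel image> \
                    --property ramdisk_id=<uuid of the ramdisk image> \
                    cirros-0.3.2-x86_64-uec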

Running image-show, we can see these properties:

[sdake@bigiron devstack]$ glance image-show cirros-0.3.2-x86_64-uec 
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| Property 'kernel_id'  | c11bd198-227f-4156-9195-40b16278b65c |
| Property 'ramdisk_id' | 5e6839ef-daeb-4a1c-be36-3906ed4d7bd7 |
| checksum              | 4eada48c2843d2a262c814ddc92ecf2c     |
| container_format      | ami                                  |
| created_at            | 2014-12-09T14:56:05                  |
| deleted               | False                                |
| disk_format           | ami                                  |
| id                    | 259ca231-66ad-439d-900b-3dc9e9408a0c |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | cirros-0.3.2-x86_64-uec              |
| owner                 | 4b798efdcd5142509fe87b12d89d5949     |
| protected             | False                                |
| size                  | 25165824                             |
| status                | active                               |
| updated_at            | 2014-12-09T14:56:06                  |
+-----------------------+--------------------------------------+

Now that we have validated the cirros image is available, the next step is to launch an instance as the demo user:

[sdake@bigiron devstack]$ source ./openrc demo demo
[sdake@bigiron devstack]$ nova keypair-add --pub-key ~/.ssh/id_rsa.pub steak
[sdake@bigiron devstack]$ nova boot --flavor baremetal --image cirros-0.3.2-x86_64-uec --key-name steak cirros_on_ironic
[sdake@bigiron devstack]$ nova list
+--------------------------------------+------------------+--------+------------+-------------+------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks         |
+--------------------------------------+------------------+--------+------------+-------------+------------------+
| 9e64804d-264d-40d2-88f4-e858efe69557 | cirros_on_ironic | ACTIVE | -          | Running     | private=10.1.0.4 |
+--------------------------------------+------------------+--------+------------+-------------+------------------+
[sdake@bigiron devstack]$ ssh cirros@10.1.0.4
$ uname -a
Linux cirros-on-ironic 3.2.0-60-virtual #91-Ubuntu SMP Wed Feb 19 04:13:28 UTC 2014 x86_64 GNU/Linux

If this part works, that means you have a working Ironic devstack setup. The next step is to get the Atomic images and convert them for use with Ironic.

[sdake@bigiron fedora-atomic-to-liveos-pxe]$ ./convert.sh
Mounting boot and root filesystems.
Done mounting boot and root filesystems.
Removing boot from /etc/fstab.
Done removing boot from /etc/fstab.
Extracting kernel to fedora-atomic-kernel
Extracting ramdisk to fedora-atomic-ramdisk
Unmounting boot and root.
Creating a RAW image from QCOW2 image.
Extracting base image to fedora-atomic-base.
cut: invalid byte, character or field list
Try 'cut --help' for more information.
sfdisk: Disk fedora-atomic.raw: cannot get geometry
sfdisk: Disk fedora-atomic.raw: cannot get geometry
12171264+0 records in
12171264+0 records out
6231687168 bytes (6.2 GB) copied, 29.3357 s, 212 MB/s
Removing raw file.

The sfdisk: cannot get geometry warnings can be ignored.

After completion you should have fedora-atomic-kernel, fedora-atomic-ramdisk, and fedora-atomic-base files. Next we register these with glance:

[sdake@bigiron fedora-atomic-to-liveos-pxe]$ ls -l fedora-*
-rw-rw-r-- 1 sdake sdake 6231687168 Dec  9 08:59 fedora-atomic-base
-rwxr-xr-x 1 root  root     5751144 Dec  9 08:59 fedora-atomic-kernel
-rw-r--r-- 1 root  root    27320079 Dec  9 08:59 fedora-atomic-ramdisk
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic-kernel --container-format aki --disk-format aki --is-public True --file fedora-atomic-kernel
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 220c2e9d97c3f775effd2190199aa457     |
| container_format | aki                                  |
| created_at       | 2014-12-09T16:47:12                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | b8e08b02-5eac-467d-80e1-6c8138d0bf57 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | fedora-atomic-kernel                 |
| owner            | a28b73a4f29044f184b854ffb7532ceb     |
| protected        | False                                |
| size             | 5751144                              |
| status           | active                               |
| updated_at       | 2014-12-09T16:47:12                  |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic-ramdisk --container-format ari --is-public True --disk-format ari --file fedora-atomic-ramdisk
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 9ed72ddc0411e2f30d5bbe6b5c2c4047     |
| container_format | ari                                  |
| created_at       | 2014-12-09T16:48:31                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | a62f6f32-ed66-4b18-8625-52d7262523f6 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | fedora-atomic-ramdisk                |
| owner            | a28b73a4f29044f184b854ffb7532ceb     |
| protected        | False                                |
| size             | 27320079                             |
| status           | active                               |
| updated_at       | 2014-12-09T16:48:31                  |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
[sdake@bigiron fedora-atomic-to-liveos-pxe]$ glance image-create --name=fedora-atomic --container-format ami --disk-format ami --is-public True --property ramdisk_id=b2f60f33-9c8e-4905-a64b-90997d3dcb92 --property kernel_id=0e687b76-31d0-4351-a92a-a2d348482d42 --file fedora-atomic-base
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| Property 'kernel_id'  | 0e687b76-31d0-4351-a92a-a2d348482d42 |
| Property 'ramdisk_id' | b2f60f33-9c8e-4905-a64b-90997d3dcb92 |
| checksum              | 6a25f8bf17a94a6682d73b7de0a13013     |
| container_format      | ami                                  |
| created_at            | 2014-12-09T16:52:45                  |
| deleted               | False                                |
| deleted_at            | None                                 |
| disk_format           | ami                                  |
| id                    | d4ec78d7-445a-473d-9b7d-a1a6408aeed2 |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | fedora-atomic                        |
| owner                 | a28b73a4f29044f184b854ffb7532ceb     |
| protected             | False                                |
| size                  | 6231687168                           |
| status                | active                               |
| updated_at            | 2014-12-09T16:53:16                  |
| virtual_size          | None                                 |
+-----------------------+--------------------------------------+

Next we configure Ironic’s PXE boot options and restart the Ironic conductor in devstack.  To restart the conductor, attach to the devstack session with screen -r, switch to the ir-cond window, press CTRL-C, then up arrow and ENTER.  This relaunches the conductor with the new configuration.

/etc/ironic/ironic.conf should be changed to have this config option:

pxe_append_params = nofb nomodeset vga=normal console=ttyS0 no_timer_check rd.lvm.lv=atomicos/root root=/dev/mapper/atomicos-root ostree=/ostree/boot.0/fedora-atomic/a002a2c2e44240db614e09e82c7822322253bfcaad0226f3ff9befb9f96d315f/0

Next we launch the fedora-atomic image using Nova’s baremetal flavor:

[sdake@bigiron ~]$ source /home/sdake/repos/devstack/openrc demo demo
[sdake@bigiron Downloads]$ nova boot --flavor baremetal --image fedora-atomic --key-name steak fedora_atomic_on_ironic
[sdake@bigiron Downloads]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+----------+
| ID                                   | Name                    | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------+--------+------------+-------------+----------+
| e7f56931-307d-45a7-a232-c2fa70898cae | fedora-atomic_on_ironic | BUILD  | spawning   | NOSTATE     |          |
+--------------------------------------+-------------------------+--------+------------+-------------+----------+

Finally login to the Atomic Host:

[sdake@bigiron ironic]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks         |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------+
| d061c0ef-f8b7-4fff-845b-8272a7654f70 | fedora-atomic_on_ironic | ACTIVE | -          | Running     | private=10.1.0.5 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------+
[sdake@bigiron ironic]$ ssh fedora@10.1.0.5
[fedora@fedora-atomic-on-ironic ~]$ uname -a
Linux fedora-atomic-on-ironic.novalocal 3.17.4-301.fc21.x86_64 #1 
SMP Thu Nov 27 19:09:10 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I found working out how to create Ironic-ready images from the Fedora Atomic cloud image a bit tedious. The diskimage-builder tool would likely make this easier if it supported rpm-ostree and Atomic.

Ironic also needs some work to allow the PXE options to override the “root” kernel parameter. Ideally a glance image property could be specified to override and extend the boot options. I’ve filed an Ironic blueprint for such an improvement.

Turbocharge DevStack with RAID 0 SSDs

Turbocharging DevStack

I wanted to turbocharge my development cycle of OpenStack running on Fedora 18 so I could be waiting on my brain rather than waiting on my workstation.  I decided to purchase two modern solid state drives (SSDs) and run them in RAID 0.  I chose two 160GB Intel S3500 enterprise-grade SSDs; my second choice was the Samsung 840 Pro, which may have been a bit faster but perhaps not as reliable.

Since OpenStack and DevStack do most of their work in /var and /opt, I decided to move only those two filesystems onto the SSDs.  RAID 0 offers lower availability, so if an SSD fails I am less likely to lose my home directory, which may contain work in progress.

The Baseline HP Z820

For a baseline, my system is a Hewlett-Packard Z820 workstation (model #B2C08UT#ABA) that I purchased from Provantage in January 2013.  Most of the machine is a beast: an 8-core Intel Xeon E5-2670 @ 2.60GHz with Hyper-Threading for 16 logical CPUs, an Intel C602 chipset, and 16GB of quad-channel DDR3 ECC unbuffered RAM.

The memory is fast as shown with ramspeed:

[sdake@bigiron ramspeed-2.6.0]$ ./ramspeed -b 3 -m 4096
RAMspeed (Linux) v2.6.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode

INTEGER   Copy:      11549.61 MB/s
INTEGER   Scale:     11550.59 MB/s
INTEGER   Add:       11885.79 MB/s
INTEGER   Triad:     11834.27 MB/s
---
INTEGER   AVERAGE:   11705.06 MB/s

Unfortunately the disk is a pokey 1TB 7200 RPM model.  The hdparm tool shows just 118MB/sec of buffered read throughput.

[sdake@bigiron ~]$ sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 20590 MB in 2.00 seconds = 10308.76 MB/sec
Timing buffered disk reads: 358 MB in 3.02 seconds = 118.69 MB/sec

The GNOME 3 Disks benchmarking tool shows a lower average of 82MB/sec, although this measurement also passes through the LVM layer:

[Screenshot: GNOME Disks benchmark of the spinning disk, ~82MB/sec average read]

Warning: I didn’t run this benchmark with write enabled, as it would have destroyed the data on my disk.

Running stack.sh takes about 6 minutes:

[sdake@bigiron devstack]$ ./stack.sh
Using mysql database backend
Installing package prerequisites...[|[/]^C[sdake@bigiron devstack]$ 
[sdake@bigiron devstack]$ ./stack.sh
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[/]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at http://192.168.1.20:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.20
done
stack.sh completed in 368 seconds

I timed a heat stack-create operation at about 34 seconds.  In a typical day I may create 50 or more stacks, so the time really adds up.

Turbo-charged DevStack

After installing the two SSDs, I decided to use LVM RAID 0 striping.  Linux Magazine indicates mdadm is faster, but I prefer a single management solution for my disks.
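
For reference, the setup amounts to something like the following.  The volume group and ssd_opt names match what hdparm reports below; the device names, sizes, and ssd_var are assumptions on my part:

# create a volume group spanning both SSDs (device names are illustrative)
pvcreate /dev/sdb /dev/sdc
vgcreate raid0_vg /dev/sdb /dev/sdc

# -i 2 stripes each LV across both PVs; -I 64 selects a 64KB stripe size
lvcreate -i 2 -I 64 -L 60G -n ssd_opt raid0_vg
lvcreate -i 2 -I 64 -L 60G -n ssd_var raid0_vg

# pick your favorite filesystem, then mount the volumes over /opt and /var
mkfs.ext4 /dev/raid0_vg/ssd_opt
mkfs.ext4 /dev/raid0_vg/ssd_var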

The hdparm tool shows a beastly 1GB/sec of read throughput:

[sdake@bigiron ~]$ sudo hdparm -tT /dev/raid0_vg/ssd_opt

/dev/raid0_vg/ssd_opt:
Timing cached reads: 21512 MB in 2.00 seconds = 10771.51 MB/sec
Timing buffered disk reads: 3050 MB in 3.00 seconds = 1016.47 MB/sec

I also ran the GNOME 3 Disks benchmarking tool, this time with write testing enabled.  It showed an average of 930MB/sec read and 370MB/sec write throughput:

[Screenshot: GNOME Disks benchmark of the SSD RAID 0 volume, ~930MB/sec read, ~370MB/sec write]

This time stack.sh completed in a little under 3 minutes:

[sdake@bigiron devstack]$ ./stack.sh
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[|]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at http://192.168.1.20:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.20
done
stack.sh completed in 166 seconds

I timed a heat stack-create at 6 seconds.  Compared to the 34 seconds on the spinning disk, RAID 0 SSDs rock!  The overall system feels much faster, and the benchmarks back that up.

How we use CloudInit in OpenStack Heat

Over the past year, many people have asked me how exactly the Heat developers use CloudInit in OpenStack Heat.  Since CloudInit is the default virtual machine bootstrapping system on Debian, Fedora, Red Hat Enterprise Linux, Ubuntu, and likely more distros, we decided to start with CloudInit as our base bootstrapping system.  I’ll present a code walk-through of how we use CloudInit inside OpenStack Heat.

Reading the CloudInit documentation is helpful, but it lacks programming examples of how to develop software to inject data into virtual machines using CloudInit.   The OpenStack Heat project implements injection in Python for CloudInit-enabled virtual machines.  Injection occurs by passing information to the virtual machine that is decoded by CloudInit.

IaaS platforms require a method for users to pass data into the virtual machine.  OpenStack provides a metadata server which is co-located with the rest of the OpenStack infrastructure.  When the virtual machine boots, it can make an HTTP request to a specific URI to retrieve the user data passed to the instance during instance creation.
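
You can see this for yourself from inside any booted instance; OpenStack’s metadata service answers on the EC2-compatible well-known address:

# fetch the user data and metadata handed to this instance at creation time
curl http://169.254.169.254/latest/user-data
curl http://169.254.169.254/latest/meta-data/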

CloudInit’s job is to contact the metadata server and bootstrap the virtual machine with desired configurations.  In OpenStack Heat, we do this with three specific files.

The first file is our CloudInit configuration file:

runcmd:
- setenforce 0 > /dev/null 2>&1 || true
user: ec2-user

cloud_config_modules:
- locale
- set_hostname
- ssh
- timezone
- update_etc_hosts
- update_hostname
- runcmd

# Capture all subprocess output into a logfile
# Useful for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

This file directs CloudInit to turn off SELinux, install SSH keys for the ec2-user account, set up the locale, hostname, SSH, and timezone, update /etc/hosts with correct information, and log the output of all cloud-init operations to /var/log/cloud-init-output.log.
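
To experiment with a cloud-config like this outside of Heat, you can pass it directly as user data when booting an instance; the flavor and image names here are just examples:

# boot an instance with a hand-written cloud-config as its user data
nova boot --flavor m1.small --image fedora-20 \
     --user-data my-cloud-config.yaml my-cloudinit-test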

There are many cloud config modules which provide different functionality.  Unfortunately they are not well documented, so the source must be read to understand their behavior.  For a list of cloud config modules, check the upstream repo.

Another file required by OpenStack Heat’s support for CloudInit is a part handler:

#part-handler
import datetime
import errno
import os


def list_types():
    # tell cloud-init which MIME subtypes this handler wants to receive
    return(["text/x-cfninitdata"])


def handle_part(data, ctype, filename, payload):
    if ctype == "__begin__":
        # first call: create the directory the payloads will be written into
        try:
            os.makedirs('/var/lib/heat-cfntools', 0700)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        return

    if ctype == "__end__":
        return

    # log each part as it arrives
    with open('/var/log/part-handler.log', 'a') as log:
        timestamp = datetime.datetime.now()
        log.write('%s filename:%s, ctype:%s\n' % (timestamp, filename, ctype))

    if ctype == 'text/x-cfninitdata':
        with open('/var/lib/heat-cfntools/%s' % filename, 'w') as f:
            f.write(payload)

The part-handler.py file is executed by CloudInit to separate the UserData provided by OpenStack’s metadata server.  CloudInit executes handle_part() for each part of the multipart MIME message that CloudInit does not know how to decode itself.  This is how OpenStack Heat passes unique information to each virtual machine to assist in the orchestration process.  The first ctype is always __begin__, which triggers handle_part() to create the directory /var/lib/heat-cfntools.

OpenStack Heat’s instance launch code uses the MIME subtype x-cfninitdata.  Heat passes several files with this subtype, each of which is decoded and stored in /var/lib/heat-cfntools.

The final file required is a script which runs at first boot:

#!/usr/bin/env python

import datetime
import os
import subprocess
import sys

from distutils.version import LooseVersion
import pkg_resources

path = '/var/lib/heat-cfntools'

def chk_ci_version():
    v = LooseVersion(pkg_resources.get_distribution('cloud-init').version)
    return v >= LooseVersion('0.6.0')

def create_log(path):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0600)
    return os.fdopen(fd, 'w')

def call(args, log):
    log.write('%s\n' % ' '.join(args))
    log.flush()
    p = subprocess.Popen(args, stdout=log, stderr=log)
    p.wait()
    return p.returncode

def main(log):

    if not chk_ci_version():
        # pre 0.6.0 - user data executed via cloudinit, not this helper
        log.write('Unable to log provisioning, need a newer version of'
                  ' cloud-init\n')
        return -1

    userdata_path = os.path.join(path, 'cfn-userdata')
    os.chmod(userdata_path, 0700)

    log.write('Provision began: %s\n' % datetime.datetime.now())
    log.flush()
    returncode = call([userdata_path], log)
    log.write('Provision done: %s\n' % datetime.datetime.now())
    if returncode:
        return returncode

if __name__ == '__main__':
    with create_log('/var/log/heat-provision.log') as log:
        returncode = main(log)
        if returncode:
            log.write('Provision failed')
            sys.exit(returncode)

    userdata_path = os.path.join(path, 'provision-finished')
    with create_log(userdata_path) as log:
        log.write('%s\n' % datetime.datetime.now())

This script executes /var/lib/heat-cfntools/cfn-userdata (the user data supplied in the Heat template) and logs its output to /var/log/heat-provision.log.

These files ship alongside OpenStack Heat’s engine process, which loads them and combines them, together with other Heat-specific configuration blobs, into one multipart MIME message.

OpenStack Heat’s UserData generator:

    def _build_userdata(self, userdata):
        if not self.mime_string:
            # Build mime multipart data blob for cloudinit userdata

            def make_subpart(content, filename, subtype=None):
                if subtype is None:
                    subtype = os.path.splitext(filename)[0]
                msg = MIMEText(content, _subtype=subtype)
                msg.add_header('Content-Disposition', 'attachment',
                               filename=filename)
                return msg

            def read_cloudinit_file(fn):
                return pkgutil.get_data('heat', 'cloudinit/%s' % fn)

            attachments = [(read_cloudinit_file('config'), 'cloud-config'),
                           (read_cloudinit_file('part-handler.py'),
                            'part-handler.py'),
                           (userdata, 'cfn-userdata', 'x-cfninitdata'),
                           (read_cloudinit_file('loguserdata.py'),
                            'loguserdata.py', 'x-shellscript')]

            if 'Metadata' in self.t:
                attachments.append((json.dumps(self.metadata),
                                    'cfn-init-data', 'x-cfninitdata'))

            attachments.append((cfg.CONF.heat_watch_server_url,
                                'cfn-watch-server', 'x-cfninitdata'))

            attachments.append((cfg.CONF.heat_metadata_server_url,
                                'cfn-metadata-server', 'x-cfninitdata'))

            # Create a boto config which the cfntools on the host use to know
            # where the cfn and cw API's are to be accessed
            cfn_url = urlparse(cfg.CONF.heat_metadata_server_url)
            cw_url = urlparse(cfg.CONF.heat_watch_server_url)
            is_secure = cfg.CONF.instance_connection_is_secure
            vcerts = cfg.CONF.instance_connection_https_validate_certificates
            boto_cfg = "\n".join(["[Boto]",
                                  "debug = 0",
                                  "is_secure = %s" % is_secure,
                                  "https_validate_certificates = %s" % vcerts,
                                  "cfn_region_name = heat",
                                  "cfn_region_endpoint = %s" %
                                  cfn_url.hostname,
                                  "cloudwatch_region_name = heat",
                                  "cloudwatch_region_endpoint = %s" %
                                  cw_url.hostname])

            attachments.append((boto_cfg,
                                'cfn-boto-cfg', 'x-cfninitdata'))

            subparts = [make_subpart(*args) for args in attachments]
            mime_blob = MIMEMultipart(_subparts=subparts)

            self.mime_string = mime_blob.as_string()

        return self.mime_string

This code defines two helper functions:

  • make_subpart: Wraps a single attachment (content, filename, and optional subtype) in a MIME subpart
  • read_cloudinit_file: Reads one of the three CloudInit files shown above from the Heat package

The rest of the function generates the UserData OpenStack Heat needs based upon the attachments list.  These attachments are then turned into a multipart MIME message, which is passed along at instance creation:

        server_userdata = self._build_userdata(userdata)
        server = None
        try:
            server = self.nova().servers.create(
                name=self.physical_resource_name(),
                image=image_id,
                flavor=flavor_id,
                key_name=key_name,
                security_groups=security_groups,
                userdata=server_userdata,
                meta=tags,
                scheduler_hints=scheduler_hints,
                nics=nics,
                availability_zone=availability_zone)
        finally:
            # Avoid a race condition where the thread could be cancelled
            # before the ID is stored
            if server is not None:
                self.resource_id_set(server.id)

This snippet of code creates the UserData and passes it to the nova server create operation.

The flow is then:

  1. Create user data
  2. Heat creates nova server instance with user data
  3. Nova creates the instance
  4. CloudInit distro initialization occurs
  5. CloudInit fetches the UserData from the OpenStack metadata server
  6. CloudInit executes part-handler.py with __begin__
  7. CloudInit executes part-handler.py for each x-cfninitdata mime type
  8. part-handler.py writes the contents of each x-cfninitdata mime subpart to /var/lib/heat-cfntools on the instance
  9. CloudInit executes part-handler.py with __end__
  10. CloudInit executes the configuration operations defined by the config file
  11. CloudInit runs the x-shellscript blob which in this case is loguserdata.py
  12. loguserdata.py runs /var/lib/heat-cfntools/cfn-userdata, the initialization script set in the OpenStack Heat template, and logs its output (a quick way to inspect the results on a booted instance follows the list)
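
To see the end result, SSH to a Heat-booted instance and poke around the files and logs mentioned above (substitute the address that nova list reports):

ssh ec2-user@<instance ip>
# the decoded x-cfninitdata parts land here
sudo ls /var/lib/heat-cfntools
# provisioning, part-handler, and cloud-init logs
sudo cat /var/log/heat-provision.log
sudo cat /var/log/part-handler.log
sudo cat /var/log/cloud-init-output.log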

Hopefully this code walk-through helps developers understand how OpenStack Heat integrates with CloudInit, and gives you a head start on using CloudInit from your own Python applications if you roll your own bootstrapping process.

The Heat API – A template-based orchestration framework

Over the last year, Angus Salkeld and I have been developing an IaaS high-availability service called Pacemaker Cloud.  We learned that the problem we were really solving was orchestration.  Another dev group inside Red Hat was also looking at this problem from the launching side.  We decided to take two weeks off from our existing work and see if we could join together to create a proof-of-concept implementation, from scratch, of AWS CloudFormation for OpenStack.  The result was a proof of concept that could launch a WordPress template, as we had done in our previous project.

The developers decided to take another couple of weeks to determine if we could build a more functional system that would handle composite virtual machines.  Today we released that version, our second iteration of the Heat API.  Since we now have many more developers and a project that already exceeds the functionality of Pacemaker Cloud, the Heat development community has decided to cease work on our previous orchestration projects and focus our efforts on Heat.

A bit about Heat: the Heat API implements the AWS CloudFormation API.  This API provides a REST interface for creating composite VMs, called Stacks, from template files.  The goal of the software is to accurately launch AWS CloudFormation Stacks on OpenStack.  We will also enable good-quality high availability based upon the technologies we created in Pacemaker Cloud, including escalation.

Given that C was a poor choice of implementation language for building REST-based cloud services, Heat is implemented in Python, which is fantastic for REST services.  The Heat API also follows OpenStack design principles.  Our initial design, written after the POC, shows the basics of our architecture, and our quickstart guide can be used with our second-iteration release.

A mailing list is available for developer and user discussion.  We track milestones and issues using GitHub’s issue tracker.  Things are moving fast – come join our project on GitHub or chat with the devs in #heat on Freenode!

Corosync 2.0.0 released!

A few short weeks after Corosync 1.0.0 was released, the developers huddled to plan Corosync 2.0.0.  The major conclusion of that meeting was “Corosync as implemented is too complicated”.  We had threads, semaphores, mutexes, an entire protocol, plugins, a bunch of unused services, a backwards compatibility layer, and multiple cryptographic engines.

In our favor, we did have a world-class group communication system implementation (albeit a little complicated) developed by a large community of developers, battle-hardened by thousands of field deployments, and tested by tens of thousands of community members.

As a result of that meeting, we decided to keep the good and throw out the bad, as we did during the transition from openais to corosync.  Gone are threads.  Gone are compatibility layers.  Gone are plugins.  Gone are unsupported encryption engines.  Gone is a bunch of other user-invisible junk that was crudding up the code base.

Shortly after Corosync 2.0.0 development was started, Angus Salkeld had the great idea of taking the infrastructure in corosync (IPC, Logging, Timers, Poll loop, shared memory, etc) and putting that into a new project called libqb.  The objective of this work was obvious:  To create a world-class infrastructure library specifically focused on the needs of cluster developers with a great built-in make-check test suite.

This brought us even closer to our goal of simplification.  As we pushed the infrastructure out of the Corosync core, we could focus more on the protocols and APIs.  You might be surprised to learn that implementing the infrastructure took about as much effort as the rest of the system (the APIs and Totem).

All of this herculean effort wouldn’t have been possible without our developer and user community.  I’d especially like to acknowledge Jan Friesse for his leadership in coordinating the upstream release process and driving the upstream feature set to the 2.0.0 resolution.  Angus Salkeld was invaluable in his huge libqb effort, which was delivered on time and with great quality.  Finally, I want to thank Fabio Di Nitto for beating various parts of the Corosync code base into submission and for his special role in designing the votequorum API.  There are many other contributors, including developers and testers, whom I won’t mention individually but would also like to thank for their improvements to the code base.

Great job devs!!  Now it’s up to the users of Corosync to tell us whether we delivered on the objective we set out with 18 months ago – making Corosync 2.0 faster, simpler, smaller, and most importantly higher quality.

The software can be downloaded from Corosync’s website.  Corosync 2.0, as well as the rest of the improved community-developed cluster stack, will show up in distros as they refresh their stacks.

Announcing Pacemaker Cloud 0.6.0 release

I am super pleased to announce the release of Pacemaker Cloud 0.6.0.

Pádraig Brady will be providing a live demonstration of Pacemaker Cloud integrated with OpenStack at FOSDEM.


What is Pacemaker Cloud?

Pacemaker Cloud is a high-scale, high-availability system for virtual machine and cloud environments.  It uses the techniques of fault detection, fault isolation, recovery, and notification to provide a full high-availability solution tailored to cloud environments.

Pacemaker Cloud combines multiple virtual machines (called assemblies) into one application group (called a deployable).  The deployable is then managed to maintain an active and running state in the face of failures.  Recovery escalation is used to recover from repetitive failures and drive the deployable back to a known good working state.

New in this release:

  • OpenStack integration
  • Division of supported infrastructures into separate packaging
  • Ubuntu assembly support
  • WordPress VM + MySQL deployable demonstration
  • Significantly more readable event notification output
  • Add ssh keys to generated assemblies
  • Recovery escalation
  • Bug fixes
  • Performance enhancements

Where to get the software:

The software is available for download on the project’s website.

Adding second monitoring method to Pacemaker Cloud – sshd

Recently, Angus Salkeld and I decided to start working on a second approach to Pacemaker Cloud monitoring.  Today we monitor with Matahari; we would also like the ability to monitor with OpenSSH’s sshd.  With this model, sshd becomes a second monitoring agent alongside Matahari.  Since sshd is everywhere and everyone is comfortable with the SSH security model, we believe this makes a superb alternative monitoring solution.

To help kick that work off, I’ve started a new branch in our git repository where this code will be located called topic-ssh.

To summarize the work, we are taking the dped binary and creating a second, libssh2-specific binary based on it.  We will also integrate directly with libdeltacloud as part of this work.  The output of this topic branch will be the major work in the 0.7.0 release.

We looked at Python as the language for dped, but testing showed that not to be particularly feasible without drastically complicating our operating model.  With our model of running thousands of dpe processes on one system, one dpe per deployable, we would need Python to have a small footprint.  Testing showed that Python consumes 15 times as much memory per dpe instance as a comparable C binary.

We think there are many opportunities for people without a strong C skill set, but with a strong Python skill set, to contribute tremendously to the project in the CPE component.  We plan to rework the CPE process into a Python implementation.

If you want to get involved in the project today, working on the CPE C++ to python rework would be a great place to start!

Release schedule for Corosync Needle (2.0)

Over the last 18 months, the Corosync development community has been hard at work making Corosync Needle (version 2.0.0) a reality.  This release is an evolutionary step for Corosync, adding several community-requested features, removing the troubling threads and plugins, and tidying up the quorum code base.

I would like to point out the diligent work of Jan Friesse (Honza) in tackling the 15 or so items on our feature backlog.  Angus Salkeld has taken the architectural step of moving the infrastructure of Corosync (IPC, logging, and other infrastructure components) into a separate project (http://www.libqb.org).  Finally, I’d like to highlight the excellent work of Fabio Di Nitto and his cabal in tackling the quorum code base to make it truly usable for bare metal clusters.

The release schedule is as follows:

Alpha		January 17, 2012	version 1.99.0
Beta		January 31, 2012	version 1.99.1
RC1		February 7, 2012	version 1.99.2
RC2		February 14, 2012	version 1.99.3
RC3		February 20, 2012	version 1.99.4
RC4		February 27, 2012	version 1.99.5
RC5		March 6, 2012		version 1.99.6
Release 2.0.0	March 13, 2012		version 2.0.0