OpenStack Virtual Baremetal on a Public Cloud

Background

At long last, OVB has reached the goal we set for it way back when it was more idea than reality. The plan was to come up with a system where we could test baremetal deployment against OpenStack instances in an arbitrary cloud. That would allow something like TripleO CI to scale, for all practical purposes, infinitely. For years this wasn't possible because OVB required some configuration and/or hacks that were not appropriate for production clouds, which limited it to use in specific CI-oriented clouds. Theoretically it has been possible to use OVB on normal clouds for a couple of years now, but in practice public clouds were either running OpenStack versions that were too old or didn't expose the necessary features. Fortunately, this is no longer true.

Enter Vexxhost (I should note that I have no affiliation with Vexxhost). Generally speaking, the OVB requirements are:

  • OpenStack Mitaka or newer
  • Heat
  • Access to the Neutron port-security extension
  • The ability to upload images

That may not be an exhaustive list, but it covers the things that frequently blocked me from running OVB on a cloud. Vexxhost ticks all the boxes.

Details

First, the OVB on Public Cloud demo video. As I mentioned in the video, there are some details that I glossed over in the interest of time (the video is nearly 20 minutes as it is). Here's a more complete explanation of those:

  • Quota. The default quota when you create an account on Vexxhost is somewhat restrictive for OVB use. You can fit an OVB environment into it, but you have to use the absolute minimum sizes for each instance. In some cases this may not work well. As I recall, the main restriction was CPU cores. To address this, just open a support ticket and explain what you're doing. I got my cpu core limit raised to 50, which enabled any nonha configuration I wanted. If you're doing an ha deployment you may need to have some other quota items raised, but you'll have to do the math on that.
  • Performance. Performance was a mixed bag, which is one of the downsides of running in a general purpose cloud. Some of my test runs were as fast as my dedicated local cloud, others were much slower.
  • Flavor disk sizes. As noted in the video, some of the Vexxhost flavors have a disk size of 0. Initially I thought this meant they could only be used with boot from volume, but fortunately it turns out you can still boot instances from ephemeral storage. This is important because on a public cloud you need to be able to nova rebuild the baremetal instances between deployments so they can PXE boot again. Nova can't currently rebuild instances booted from volume, so make sure to avoid doing that with OVB.

    Because of the 0 disk size, the virtual size of the images you boot will determine how much storage your instances get. By default, the CentOS 7 image only has 8 GB. That is not enough for a TripleO undercloud, although it's fine for the BMC. This can be worked around by either using qemu-img to resize a stock CentOS image and upload it to Glance, or by snapshotting an instance booted from a flavor that has a larger disk size. I did the latter because it allowed me to make some customizations (injecting ssh keys, setting a root password, etc.) to my base undercloud image. Either should work though.

    The original OVB ipxe-boot image has a similar problem. There is now an ipxe-boot-41 image with a virtual size of 41 GB that should provide enough storage for most deployment scenarios. If it doesn't, this image can also be resized with qemu-img.

    Another impact of this configuration is that the nodes.json created by build-nodes-json may have an incorrect disk size. It currently can't determine the size of a disk in an instance booted from a flavor with a 0 disk size. There are two options to deal with this. First, nodes.json can be hand-edited before importing to Ironic. Second, introspection can be run on node import, which is what I did in the video. It's a bit cleaner and demonstrates that introspection functionality works as expected. Introspection will look at the instance details from the inside so it can determine the correct disk size.

  • Cost. Obviously public cloud resources are not free. If you created the environment I used in my demo and left it running for a full month, at the current Vexxhost prices it would cost around $270 US. That's just for the instances, there will also be some small cost for image storage and bandwidth, although in my experience those would be negligible compared to the per-instance costs. That's not pocket change, but compared to the cost for purchasing and administering your own cloud it may be worth it.
  • [Edit 2018-5-7] Baremetal rebuilds. This is mentioned in the video, but it's important so I wanted to reiterate it here too. You must rebuild the baremetal instances between deployments in a public cloud environment. There is a rebuild-baremetal script provided that can automate this for you. In the case of the demo environment, it would have been called like this: rebuild-baremetal 2 baremetal demo. Note that this is run using the host cloud credentials, not the undercloud.