Tokyo Summit - Upgrades and Containers

Another summit has come and gone, and I survived two trans-Pacific flights! A mere 16 hours of sleep when I got home is all it took to (mostly) get over the jet lag. :-)

But this isn't a travel blog, so you probably don't care about that. You want to know what's going on with OpenStack, at least from my perspective. If you don't care about that either, the back button is ^thataway^. Or maybe \/thataway\/ if you're on a mobile device.

For TripleO, almost everything I talked about in my Vancouver Wrap-Up has been accomplished. TripleO CI is now fully instack-based, taking better and better advantage of the Puppet community's work, and some initial container implementation work has been done. So we're done, right? Check off OpenStack deployment on the list of solved problems?

Not quite. :-)

Most of our work in the past cycle focused on getting upstream TripleO in a state where we could really start focusing on the tough problems. Those were the main topics of conversation in Tokyo, and I think it's fantastic that we're in a place where the basic framework of TripleO enables us to stop working almost exclusively on low-level plumbing (because what we have works) and start giving more consideration to usability (because what we have is still too hard to use). Which is not to say that making things more usable won't require re-plumbing some of the underlying bits (more on that later), but a lot of improvements can be made at a higher level without many/any deep architectural changes.

Upgrades

This was a huge topic in general at the summit, and particularly important to TripleO, since being able to upgrade a cloud in place is a critical feature. Nobody wants to stand up a second cloud to migrate to a new version of OpenStack.

Step 0 for TripleO to claim full support for in-place upgrades is a CI job exercising that functionality. Up to now, we had been trying to avoid breaking upgrades, but didn't have any real testing in place to verify we hadn't. As Dan Smith noted in one of his talks about upgrades in Nova, if you're not testing this stuff you're going to break it at some point.

Now that we have stable branches for the TripleO projects, we can implement a CI test that does upgrades, hopefully something similar to the Grenade job that runs for devstack. To me, this is the number one short-term priority for TripleO, because the further away we get from the Liberty branches, the more likely we are to unintentionally break something, and that will just further complicate the process of setting up an upgrade job. Fortunately we have some major contributors to TripleO focused on getting this going, so we should be in pretty good shape as long as the rest of us don't approve anything we shouldn't in the meantime.
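To make that concrete, here's a rough sketch of the shape such a job might take. Everything below is hypothetical (the helper script names are invented, and the real job would live in TripleO CI), but the deploy-old, upgrade, verify pattern is the same one Grenade uses:

    import subprocess

    def run(cmd):
        # Echo and run a shell command, failing loudly like a CI job should.
        print('+ ' + cmd)
        subprocess.check_call(cmd, shell=True)

    def upgrade_job():
        # 1. Deploy a cloud from the previous stable branch.
        run('git checkout stable/liberty')
        run('./deploy-overcloud.sh')    # hypothetical deploy script
        run('./run-smoke-tests.sh')     # prove the old cloud works

        # 2. Upgrade that same cloud, in place, to the current branch.
        run('git checkout master')
        run('./upgrade-overcloud.sh')   # hypothetical upgrade script
        run('./run-smoke-tests.sh')     # prove the upgrade didn't break it

    if __name__ == '__main__':
        upgrade_job()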

Containers

In the long run, this will be a huge part of our upgrade strategy, although unfortunately it doesn't help us much in the short term since we still need to be able to upgrade non-containerized services too. While I admit to being skeptical of some of the claims made by proponents of containers, there's no question that they would be immensely helpful for implementing live upgrades in TripleO. Without containerized services, you run the risk of dependency conflicts if, for example, you've upgraded Keystone but not yet Nova. This may be fine as long as you can leave the Nova services running, but the second you need to restart one of them (perhaps to pin an RPC version) you have a problem if Keystone pulled in a newer version of a dependency that isn't compatible with Nova.

Containers allow you to upgrade one service at a time without affecting the other services, which, from what I heard in the operator sessions, is something operators want to be able to do (and are already doing today in some cases). It mitigates the risk of upgrades (changing one piece of a cloud at a time instead of the whole thing at once can help prevent major outages), and in some cases an operator may intentionally want to run a mismatched version of a service. Horizon seemed to be a popular example of this.
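As a minimal sketch of why this is attractive, here's roughly what a per-service upgrade could look like once services are containerized. The image names are invented for illustration (a real deployment would use images built by something like Kolla), and error handling is omitted:

    import subprocess

    # Hypothetical image tags; note nova-api is intentionally left on the
    # old version while keystone and horizon move ahead.
    SERVICES = {
        'keystone': 'example/keystone:2.0',
        'nova-api': 'example/nova-api:1.0',
        'horizon': 'example/horizon:2.0',
    }

    def upgrade_service(name, image):
        # Each service carries its own dependencies inside its container,
        # so replacing one container can't break another service's libraries.
        subprocess.check_call(['docker', 'pull', image])
        subprocess.check_call(['docker', 'rm', '-f', name])  # assumes it exists
        subprocess.check_call(['docker', 'run', '-d', '--name', name, image])

    for name, image in sorted(SERVICES.items()):
        upgrade_service(name, image)

The docker commands themselves aren't the point; the point is that the unit of upgrade becomes a single service rather than an entire host.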

Fortunately, TripleO, along with the Kolla team, is already making progress in this area. We have an implementation of containerized compute, and there is a team working on getting it integrated into TripleO CI so it can be a first-class member of the ecosystem. There's plenty more to do, but at this point I don't see any unsolvable blockers.

GUI

As a long-time command-line user, my first inclination is to scoff at GUIs. After all, we provide you with this nice CLI, so what would you want a GUI for?

Well, it turns out GUIs are pretty, and pretty things demo well. That, if for no other reason (and there are plenty of other reasons too, of course), is sufficient justification for a good GUI. This also goes back to my earlier comment about making TripleO easier to use for ordinary human beings who don't have deep knowledge of multiple deployment tools. True, we had a GUI called tuskar-ui, but for a number of reasons the tuskar and tuskar-ui projects fell behind the functionality of the base tripleoclient CLI, and when we took a step back to look at why, we discovered a big hole in TripleO: it has no REST API. Which brings us to the next topic...

TripleO API

To understand why TripleO has historically not had a well-defined API, we need to revisit some of the original design tenets of the program. At one time, the plan was for the "TripleO API" to simply be the OpenStack APIs that it uses to do deployment. The problem with this philosophy, as we have since discovered, is that there is a fair amount of business logic wrapped around those other APIs. Without a formally defined deployment API, each user interface for TripleO (CLI, GUI, and some downstream things that I'm aware of but pretend not to be) has to reimplement that wrapper logic, and of course these tools are all written in different languages (Python, JavaScript, etc.).

So when we wrote the CLI, in a big hurry due to some unfortunate business realities (pro-tip: don't cross the business and technical streams if you can possibly help it, which you probably can't...), all of that business logic went directly into the CLI and was thus completely unusable for any GUI, or even other CLIs for that matter.

Our planned solution to this bad situation is to pull all of that business logic out of the CLI and drop it behind an API that can be used by anyone. Of course, this is not as simple as it sounds, but we have a plan for moving forward and had a lot of good discussion about this topic in Tokyo.
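To illustrate the shape of the idea (and only the shape; no such API existed at the time of writing, so the endpoint, payload, and function below are all invented), the plan is for the business logic to live in one place, with a thin REST layer over it that any UI can consume:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def deploy_overcloud(plan_name, parameters):
        # The business logic that today lives inside tripleoclient:
        # template processing, parameter validation, the Heat stack
        # create call, and so on. Once it lives here, the CLI and any
        # GUI can both simply call this API instead of reimplementing it.
        return {'plan': plan_name, 'status': 'DEPLOYING'}

    @app.route('/v1/deployments', methods=['POST'])
    def create_deployment():
        body = request.get_json()
        result = deploy_overcloud(body['plan'], body.get('parameters', {}))
        return jsonify(result), 202

    if __name__ == '__main__':
        app.run()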

Split Hardware and Software Stacks

Another, more speculative, topic of discussion was the possibility of moving away from a single Heat stack for deploying with TripleO. In this model, you would have a hardware Heat stack responsible for deploying the baremetal machines hosting your cloud, and a second software Heat stack responsible for configuring OpenStack on that hardware.

There are a few benefits to this. First, we get to move from one giant, monolithic Heat stack that is very hard for new users to understand to two smaller, more focused stacks that would hopefully be easier to learn.

Another big advantage is the possibility of a more pluggable deployment method. Right now Puppet is pretty deeply ingrained in our Heat templates, which presents a big hurdle to deployers who would prefer to use Ansible, Chef, Salt, bash scripts, or any other deployment method you can imagine. With this split-stack strategy, it would theoretically be possible to deploy your hardware stack with TripleO, then do the OpenStack software deployment with your tool of choice. This should be good for everyone involved: TripleO has the opportunity to attract users it otherwise couldn't, existing deployment tools that lack a baremetal deployment method get exactly that, and OpenStack continues to benefit from the close feedback cycle with users/deployers (since we function as both) that TripleO provides.
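As a purely hypothetical sketch of how that might be driven with python-heatclient (the template names are made up, authentication is hand-waved, and none of this exists in TripleO today):

    from heatclient.client import Client

    # Placeholders; in reality these would come from your Keystone session.
    HEAT_URL = 'http://heat.example.com:8004/v1/PROJECT_ID'
    TOKEN = 'an-auth-token'

    heat = Client('1', endpoint=HEAT_URL, token=TOKEN)

    # Stack 1: provision the baremetal machines that will host the cloud.
    hw = heat.stacks.create(stack_name='overcloud-hardware',
                            template=open('hardware.yaml').read())

    # ...wait for overcloud-hardware to reach CREATE_COMPLETE...

    # The hardware stack's outputs describe the provisioned nodes; feed
    # them to the software stack (the parameter plumbing is hand-waved).
    outputs = heat.stacks.get(hw['stack']['id']).outputs

    # Stack 2: configure OpenStack on those nodes. This is the piece that
    # could be swapped out for Ansible, Chef, Salt, etc.
    heat.stacks.create(stack_name='overcloud-software',
                       template=open('software.yaml').read(),
                       parameters={'nodes': str(outputs)})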

Now, a big caveat is that there are no concrete plans around this yet. It's still very much a "what if" exercise, but I think there are some exciting possibilities here so I wanted to write something about it anyway.

Arigato

If you slogged all the way through this wall of text then you have my sincerest thanks for reading. I hope I've done a reasonable job of communicating the progress TripleO has made in this past cycle, and the future work that is planned. As always, if you have any questions or comments feel free to leave them on this post or contact me at the places mentioned on the About page.