This was the first repeat OpenStack Summit location for me. While there have been repeat locations in the past, I wasn't at the first Summit at any of those locations. I think that means I'm getting old. :-)
There was a lot that had changed, and also a lot that stayed the same. The Vancouver Convention Center is still a fantastic venue, with plenty of space for sessions. And although I did attend all of the Oslo sessions, just like last time, we didn't over-schedule Oslo this time so I had a chance to attend some non-Oslo sessions as well. Since I'm also focusing on Designate these days, I made sure to catch all of those sessions, even the one at 5:30 PM on Thursday when everyone was a bit tired and ready to leave. And it was good - there was useful information in that presentation. I felt more productive at this Summit than last time, which is certainly a good thing.
With the intro out of the way, let's get down to the nuts and bolts of the sessions I attended.
Default roles are something many operators have been requesting for quite a while. Unfortunately, early attempts were problematic because the state of policy in OpenStack was not good. Fortunately, since then most (if not all) projects have adopted oslo.policy, which allows proper deprecation of policy rules so we can actually fix some of the issues. Currently there are three proposed default roles: auditor (aka read-only), member, and admin. While two of those already exist, apparently there are many bugs in how they work that should be cleaned up as part of this effort, resulting in three actually usable roles. This will eventually be tested via the Patrole project, but there's no point in writing tests now for broken behavior, so the testing will happen in parallel with the implementation.
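To make the idea concrete, a deployment's policy overrides under the proposed roles might look roughly like this. The rule names follow Keystone's convention, but the specific check strings here are just my illustration of the concept, not the actual proposal:

```yaml
# Hypothetical policy.yaml overrides using the three proposed default roles.
# Read-only operations open to all roles, mutating ones progressively restricted.
"identity:get_user": "role:auditor or role:member or role:admin"
"identity:update_user": "role:member or role:admin"
"identity:create_user": "role:admin"
```

The point of shipping usable defaults like these is that most operators would never need to write such overrides at all.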
Chances are that the default roles won't satisfy everyone, but the hope is that they can address the needs of the majority of operators (I think 80% was mentioned as a target) to reduce the pain of configuring a production OpenStack deployment. I know there were a few operators in the room who didn't feel the proposed roles would help them, and there was some post-session discussion that hopefully surfaced their concerns to the Keystone team who are driving this work.
This is something that has been gaining momentum lately, with some larger teams either migrated or in the process of migrating to Storyboard. It was of particular interest to me as the Oslo PTL because I expect Oslo to be a fairly painful migration due to the sheer number of different Launchpad projects under the Oslo umbrella. However, it sounds like the migration process is becoming more mature as more projects make the move so by the time Oslo does migrate I have hope that it will go smoothly.
One significant request that came out of the session was some sort of redirect system that could point people who click on a bug reference in Gerrit to the appropriate location for projects that have migrated. I believe there was a suggestion that a separate server could be set up that had knowledge of which projects tracked bugs where and then could respond with appropriate redirects. Someone obviously would have to find the time to set that up though.
The current PTI for unit tests says to use stestr as the test runner. However, quite a few projects have not yet migrated, and this session was essentially to discuss why, and how to move forward. One of the big "why"s was that people simply didn't know about the requirement. A message was sent to openstack-dev, but I'm not sure it was clear to everyone that action was required. Also, since most projects' existing test setups still work, there's a natural "if it ain't broke, don't fix it" attitude toward this. Most of the projects that have migrated were done by mtreinish, who has been driving this initiative.
Moving forward, there will be more of an emphasis on the benefits of moving to stestr and some work to provide a guide for how to do so. That should help get everyone into a consistent place on this.
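For what it's worth, the mechanics of the migration are usually small: add a `.stestr.conf` and point the tox test command at `stestr run`. A minimal sketch (the test paths are placeholders for a given project's layout):

```ini
# .stestr.conf - minimal configuration for the stestr test runner
[DEFAULT]
test_path=./myproject/tests
top_dir=./
```

with the tox `commands` entry changed to something like `stestr run {posargs}` in place of the old runner invocation.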
Generally speaking, everyone recognizes that there needs to be a tighter feedback loop between developers and operators so that the people running OpenStack are getting the features and bugfixes they really need. In the past this hasn't always been the case, but things like co-locating the PTG and Ops meetup are intended to help. This was a discussion of how to further improve that relationship.
There was a point raised that developer events in the past had been perceived as (and perhaps were) hostile to operators on older releases. The good news is that this attitude seems to have softened considerably in the past few cycles, with initiatives like fast-forward upgrades and extended maintenance releases acknowledging that in the real world not everyone can do continuous deployment and always be on the latest bleeding edge code.
A lot of the discussion around future changes to improve this had to do with the current split in mailing lists. Creating a separation between users, operators, and developers of OpenStack is obviously not ideal. The main suggestion was to move to a more traditional -discuss and -announce split, with the latter being a low-traffic list just used for major announcements. There was some concern that even though the development mailing list is not quite as active as it was at its peak, there is still an order of magnitude more traffic there than on the other lists and it might become overwhelming if everyone just got dumped into it. Related to this, there was some discussion of moving to mailman3, which provides a forum-like interface to allow casual contributors to join discussions without subscribing to the full firehose of mailing list traffic. There were a few technical concerns with it, but overall it seemed promising.
Also out of this session came a side topic about part-time contributors and what to do in this new OpenStack world where many of the contributors to projects can't devote 100% of their time to a project. As there wasn't time in the session to cover this adequately, a separate hallway track discussion was scheduled, which I will cover later.
Python 2 is going away. Currently OpenStack still runs primarily on Python 2, which means we have some work to do before early 2020 when upstream support for it ends. The first step will be to re-orient our testing to focus on Python 3. In the past, jobs ran on Python 2 by default and Python 3 jobs were the outliers. That has to change.
The current timeline is to have Python 3 tested as the primary target by the end of the Stein cycle, and then to have Python 3 in a state where it can become the only supported Python version by the end of the T cycle, so that Python 2 support can be dropped early in the U cycle. Assuming we hit these targets, they line up well with the upstream Python plans.
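One small, concrete piece of that re-orientation is simply which environments get listed (and therefore tested) first. A sketch of a tox.ini with Python 3 as the primary target (the specific interpreter versions are illustrative):

```ini
[tox]
# Python 3 environments listed first as the primary targets;
# py27 kept only until Python 2 support is dropped.
envlist = py36,py35,pep8,py27
```

The bigger piece is the equivalent flip in the CI job definitions, so that Python 3 jobs are the default and Python 2 jobs are the outliers.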
On Tuesday we held an unofficial Oslo hack session to work through some things on the oslo.config driver patches that are in progress. They are intended to add new functionality to oslo.config which will enable features such as storing config in a key-value store like etcd and moving secret data (passwords, tokens, etc.) to a more secure service accessible via Castellan. The details are documented in the link above, but overall I think we made good progress on the plan and identified some concrete actions needed to move the work forward.
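The general shape being discussed was that oslo.config would learn about pluggable configuration sources, selected from within the main config file itself. A rough sketch of what that could look like (the option, driver, and URI here are illustrative of the design direction, not a final interface):

```ini
[DEFAULT]
# Additional configuration sources to pull option values from
config_source = secrets

[secrets]
# A driver that fetches sensitive option values from a secure service
driver = remote_file
uri = https://config.example.com/secrets.conf
```

The appeal is that services wouldn't need code changes to benefit; the driver layer would resolve values from wherever they actually live.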
This was the first time we've done an Oslo onboarding session, and all things considered I would say it went reasonably well. There was some digression in the discussion, which is understandable in a project as wide-ranging as Oslo. There was also some interest in getting involved though, so I think it was a worthwhile session. Most importantly, it wasn't me talking to an empty room for 40 minutes. :-)
For the most part I just recommend you go watch the video of Graham's presentation. He'll do a better job explaining Designate stuff than I can. However, one thing that was sort of specific to me from this session was that I met a couple of Designate deployers who had written Ansible playbooks to deploy it in a TripleO overcloud post-deployment. Obviously for full integration we want it done as part of the main deployment, but their work could definitely come in handy as we move to use more Ansible in TripleO.
This was a followup to a presentation at the Dublin PTG that explored the use of different messaging technologies in Edge deployments. It wasn't testing full OpenStack edge deployments, but it did simulate the messaging architecture you'd see in such a deployment. The biggest takeaway was that a distributed messaging technology like Qpid Dispatch Router can significantly improve performance in a widely distributed system versus a broker-based system like RabbitMQ.
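From a deployer's perspective, switching between the two models in oslo.messaging is largely a matter of the transport URL, since the AMQP 1.0 driver is what talks to a router mesh. Roughly (hosts and credentials are placeholders):

```ini
[DEFAULT]
# Broker-based messaging via RabbitMQ
transport_url = rabbit://user:secret@rabbit-host:5672/

# Brokerless, distributed messaging via an AMQP 1.0 router mesh
# transport_url = amqp://user:secret@router-host:5672/
```

The interesting part of the results is that this one-line change implies a very different scaling story for widely distributed deployments.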
I don't have too much to say about this beyond what is already in the video linked from the session page. I do really need to stop saying "umm" so much when I speak though. ;-)
Okay, this wasn't OpenStack-related, but if you're into mountain biking then you know BC is a great place to do it, so it was hard to resist getting in a little time on the dirt when it was walking distance from the conference. I managed to escape with only minor scrapes and bruises after my long lunch ride.
As I mentioned earlier, this came up in one of the dev/ops sessions as a pain point. I had the opportunity to sit down with Julia Kreger, Tim Bell, and Thierry Carrez to try to identify some ways we could make it easier for new or occasional contributors to work on OpenStack. This is particularly important in the academic world where research contracts are often for a very specific, fixed period of time. If changes don't make it in during that window, they will tend to be abandoned.
A number of ideas were suggested, and ultimately we decided to focus on what we hoped would be the least controversial option, to avoid the boil-the-ocean problem of attacking everything at once. To that end, we decided to propose a new policy aimed at reducing the amount of nit-picking in the review process. -1's over typos in code comments or the use of passive voice in specs do not meaningfully contribute to the quality of OpenStack software, but they can be disproportionately demotivating to new and experienced developers alike. I know I personally have changed my reviewing style in a big way as a result of my own frustration with being on the receiving end of nit-pick reviews (deepest apologies to everyone I have nit-picked in the past).
This proposal led to a rather long mailing list thread, which I think demonstrates why we decided to stick with one relatively simple change at the beginning. As it was, the discussion veered off into some of the other areas we would like to address eventually but didn't want to get bogged down in right now.
Overall, I have high hopes that this initiative will make OpenStack a more pleasant project to work on while not meaningfully harming the quality of the software.
I must confess I'm not sure I fully understand the proposed way forward here. It seemed to me that there are two conflicting goals: 1) to not break existing users of OpenStack APIs, and 2) to make it easier for users to consume new functionality added in more recent microversions of OpenStack APIs. The concern seemed to be that many users are unaware of, or unable to use, new microversions, so they are missing out on functionality and improved API designs. However, if we raise the default microversion then we open up the possibility of breaking existing users, because new microversions may not be 100% compatible with old ones. As long as microversions are opt-in that's fine, but once you start changing the minimum microversion it becomes a problem.
The proposal was sort of a "big bang" major version bump across OpenStack. Essentially we would pick a cycle and have all of the projects do their desired API cleanup and everyone would tag a new major version of their API at about the same time. I'm still not entirely clear how this solves the problem I mentioned above though. A new default major version still opens up the possibility of breaking users that rely on older behavior, and a new major version that isn't the default still requires users to opt in. Maybe it's just that opting in to a new major version is easier than a new microversion?
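For context on what "opting in" means here: with microversions a client asks for newer behavior per-request via a header, so the default never changes out from under it. A minimal sketch of building such a request (the `OpenStack-API-Version` header is the real standardized one; the helper function is mine):

```python
def microversion_headers(service: str, version: str) -> dict:
    """Build the opt-in header for an OpenStack API microversion.

    Sending OpenStack-API-Version requests behavior from a specific
    microversion; omitting it gets the (old, stable) default.
    """
    return {"OpenStack-API-Version": f"{service} {version}"}

# e.g. explicitly request Nova's 2.60 behavior for one call
headers = microversion_headers("compute", "2.60")
```

The major-version proposal would change what you get when you *don't* send that header, which is exactly where the compatibility question comes in.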
I'm hoping that I missed some fundamental key to how this works, or maybe just missed that some of these tradeoffs are considered acceptable. In any case, it will be interesting to see how this work progresses.
This ended up being another session that focused heavily on how to keep the OpenStack community healthy. The proposal was that there should be a group of people who are solely concerned with maintaining and cleaning up the code. This group would not be involved in new feature work.
Obviously this is a hard sell, as most developers want to do feature work. In addition, if you're not actively working on the code it's harder to stay up to date on where the project is going so you can provide useful architectural reviews. Overall, I did not feel like the idea of dedicated maintainers gained much traction in the room, but there was a lot of good discussion of how to encourage maintenance-type activities from existing and new contributors. The details can be found on the etherpad.
In the Stein Release Goal session...we identified a whole bunch of goals for the T release. Okay, not entirely true, but there were a number of ideas floated that got nacked because they wouldn't be ready for Stein, but might be for T. I'm not going to try to cover them all here, but you can read more on the etherpad.
The other thing that happened in this session was we got rather side-tracked on the topic of how to select goals and what makes a good goal. The point was made that it's good to have community goals with a "wow" factor. These help the marketing teams and attract people to OpenStack, a good thing for everyone. However, the question was raised as to why we aren't selecting only "wow" goals that immediately address operator needs. It's a valid question, but it's not as simple as it appears on the surface.
See, all of the goals ultimately benefit operators. But the strategy so far with community goals has been to select one directly operator-facing goal, and one indirect goal. The latter is often related to cleaning up tech debt in OpenStack. While that may not have the same kind of immediate impact that something like mutable config does, it can have a huge long-term impact on the health and velocity of the project. Sure, splitting out the Tempest plugins for projects didn't have a direct impact on operators in that cycle, but it freed up bandwidth for everyone to be able to land new features faster. We paid down the debt in the short term to enable larger long term gains.
All of which is basically me saying that I like the idea behind our goal selection up to this point. I think one of each is a good balance of both immediate and longer-term impact.
In this session there was also some post-mortem of the previous goals. The WSGI API deployment goal was pointed out as one that did not go so well. Halfway through the cycle there was a massive shift in direction for that goal, which caused a bunch of re-work and bad feelings about it. As a result, there were some recommendations for criteria that goals need to meet going forward, to avoid selecting goals that aren't quite fully baked yet. You can also read more about those on the etherpad.
I mostly attended this because it involves the new oslo.limit library so I thought I should have some idea of what was going on. I'm really glad I did though because it turned out to be an excellent deep dive into how the unified limits API is going to work and how it could address the needs of some of the operators in the room. I came out of the session feeling very good about where quota management in OpenStack is headed.
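The oslo.limit API was still being designed at the time, but the enforcement model discussed is simple to state: limits are registered centrally (in Keystone, under unified limits), usage is reported by the owning service, and enforcement checks a requested delta against the two. A hypothetical pure-Python sketch of that model (all of these names are mine, not the library's):

```python
def enforce(project_id, deltas, get_limit, get_usage):
    """Check whether requested resource deltas fit within the limits.

    get_limit(project_id, resource)  -> allowed maximum (centrally registered)
    get_usage(project_id, resource)  -> current consumption (service-reported)
    Raises RuntimeError if any resource would exceed its limit.
    """
    over = []
    for resource, delta in deltas.items():
        if get_usage(project_id, resource) + delta > get_limit(project_id, resource):
            over.append(resource)
    if over:
        raise RuntimeError(f"over limit for: {', '.join(sorted(over))}")

# Example: a project limited to 10 cores, using 8, asking for 4 more
limits = {("p1", "cores"): 10}
usage = {("p1", "cores"): 8}
try:
    enforce("p1", {"cores": 4},
            lambda p, r: limits[(p, r)],
            lambda p, r: usage[(p, r)])
    result = "ok"
except RuntimeError:
    result = "rejected"
```

The interesting part for operators is that the limit side lives in one place instead of being configured per-service, which is what made the session's deep dive feel promising.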
The very last session slot of the Summit, and as a result there wasn't a ton of audience participation (although there were still a fair number of people in attendance). However, there was quite a bit of useful information presented so I recommend watching the video if you are interested in operating Designate.
I skipped writing a summary for a few of the sessions that I attended, either because I thought they would be covered elsewhere or because they were simply too much to discuss in this already too-long post. I hope what I wrote above was interesting and maybe even a little helpful though.