My TripleO Development Workflow

I have complained extensively over the past couple of years about the over-automation of developer environments in TripleO. But wait, you say, isn't automation a good thing? And yes, it is, but the automation needs to happen in the right places (feel free to append "IMHO" to anything in this post ;-). The problem with a bunch of developer-specific automation is that it hides very real user experience problems because the developers just use their simplified interface and never touch the regular user interface. Essentially it's the opposite of dogfooding, which is something I feel is critical to writing good software. This is also known as the "Works in Devstack" problem for OpenStack as a whole (I will not be tackling that problem here though).

So if I don't like what most people are doing today, what would I prefer? That will be the topic of this post. I'll discuss what I do, and possible areas for improvement.

TLDR: Follow the docs.

Okay, that's a massive oversimplification, but it sums up the broad strokes of my philosophy on development. Developers need to be reading and following the docs, not just writing them (although writing them is also good!). That's what the users will do and if you want developers to improve the user experience the best way to accomplish that is for developers to have the user experience.

Some will argue that this is too difficult. I will counter that if it's too difficult for our developers to use our docs, then how the hell are our users supposed to figure them out? I will grant that our "basic" deployment docs are too complicated and include too many optional or advanced features. Unfortunately it is hard to get traction to clean that up when you're changing something that most developers don't look at. If developers are using the docs then they will be motivated to improve the docs rather than the developer tooling. Everybody wins.

Does this mean that I go through the docs step-by-step every time I do a deployment? Not at all. For one thing, I've been working on TripleO for so long that I could do a basic deployment without any docs whatsoever. But that's sort of irrelevant here since it's not going to be the case for 95+% of people working on TripleO.

What is more useful are the notes that I have about setting up a development environment. This is not an Ansible playbook, it's not a Puppet manifest, it's not even a script. It's a plain text file that documents the commands I have run in the past to set up my environment. It is also something that every user I have ever worked with has for their environment, basically a trimmed version of the docs with any extraneous bits removed and site-specific values included where appropriate. Some users will go even further and write an Ansible playbook or script to automate their deployment. This is fine, but until TripleO starts shipping a playbook as its top-level interface its developers should be using the interface that we do ship (please refer back to that IMHO aside from earlier).

In my case, my notes can be copy-pasted verbatim into a terminal and they will do everything from repo setup to node registration without my intervention. They stop short of an overcloud deployment because in general I end up needing to customize that in some way so there's no point wasting time on a stock deployment. If this sounds an awful lot like developer automation, well, it kind of is. The difference is that I'm still running the exact same commands a user would, and these notes came out of reading the docs extensively over the years. I also have to customize the steps on a regular basis, so in reality I only do these end-to-end simple runs maybe 50% of the time. The rest of the time I go through step-by-step and read the docs to figure out how I need to modify my standard commands to do what I need.

Like a user would. Seeing a pattern?

In addition, this level of automation is a recognition of the fact that developers can't always babysit their deployments. Everyone is busy, and nobody has time to sit around and watch their undercloud deploy for 30 minutes just so they can kick off the image build right after that. Queuing up commands so the deployment is somewhat fire and forget is a necessary concession to developer time constraints. I'm not so militantly opposed to automation that I would claim everyone should spend two hours doing nothing but watch their deployment every time they set up an environment. :-)

I debated whether to include my notes in this post. I don't want people substituting this blog post for the TripleO docs either. Ideally everyone would start at the docs and come up with their own development workflow and notes that reflect the things they do most often. Fortunately my notes are fairly specific to my environment so I'm not too concerned about that happening, and they're also only a basic framework. As I noted above, in many cases I have to customize the commands further so this post can't function as a replacement for the docs anyway.

So without further ado, here is what I do when setting up a basic development environment. I'm including some notes on what each section does so this isn't the copy-pastable version. And obviously this process changes from time to time as the documented install process changes, so YMMV on this working in the future:

  1. First up is virtual environment creation. This will vary depending on what resources are available to the developer. In my case, it is always OVB. This part can be replaced by any tooling that results in a set of VMs configured for TripleO's use, or skipped entirely if you're lucky enough to have real baremetal hardware.

    bin/deploy.py --quintupleo --name test --id test --poll -e env-base.yaml -e environments/all-networks-port-security.yaml
    bin/build-nodes-json.py -e env-test.yaml
    scp nodes.json centos@[undercloud floating ip]:~

  2. SSH to the undercloud. The rest of the steps will be done there.
  3. Clone tripleo-ci to get tripleo.sh and create a wrapper script to call it from the home directory. I use the script much less now than I used to, which is good because it means we've eliminated a lot of the problematic hacks that used to live there. I do still use it for the ping test though:

    sudo yum install -y git
    # There are too many things named tripleo-* as it is, so I give this a different prefix :-)
    git clone https://git.openstack.org/openstack-infra/tripleo-ci git-tripleo-ci
    echo '#!/bin/bash' > tripleo.sh
    echo 'git-tripleo-ci/scripts/tripleo.sh "$@"' >> tripleo.sh
    chmod +x tripleo.sh

  4. Create undercloud.conf. This isn't strictly necessary, but I do it to ensure that we don't let any assumptions about the default network cidr slip in and to ensure that the Undercloud Configuration Wizard is working properly. I also disable some optional stuff by default to speed up the undercloud install.

    curl "http://ucw-bnemec.rhcloud.com/?local_interface=eth1&network_cidr=9.1.1.0%2F24&node_count=10&undercloud_hostname=$(hostname -s).localdomain&local_ip=9.1.1.1%2F24&local_mtu=1500&network_gateway=9.1.1.1&undercloud_public_vip=9.1.1.2&undercloud_admin_vip=9.1.1.3&dhcp_start=9.1.1.4&dhcp_end=9.1.1.23&inspection_start=9.1.1.24&inspection_end=9.1.1.33&undercloud_service_certificate=&generate=Generate+Configuration" | grep -v html\> | sed -e 's/<br>/\n/g' | tee undercloud.conf
    echo "enable_telemetry = false" >> undercloud.conf
    echo "enable_legacy_ceilometer_api = false" >> undercloud.conf
    echo "enable_ui = false" >> undercloud.conf
    echo "enable_validations = false" >> undercloud.conf
    echo "enable_tempest = false" >> undercloud.conf

  5. Stop cloud-init from messing with the hostname on reboot:

    sudo su
    echo "preserve_hostname: true" > /etc/cloud/cloud.cfg.d/99_hostname.cfg
    exit

  6. Set an environment variable to indicate whether this is end-to-end. This one requires some explanation. There are two primary ways I use these commands: Sometimes I copy-paste the whole shebang into the terminal and just let it run sequentially. However, if I'm in a hurry I will split the undercloud install and image build into two separate sessions. The benefit here is that if you have enough memory you can install the undercloud and build images in parallel, which saves some significant time overall. However, images tend to build faster than the undercloud installs, so if you start both at the same time the image build needs to sleep for a while to ensure that the undercloud is ready for the images to be uploaded. This variable is just used to determine whether to run the sleep since we don't want to waste 10 minutes sleeping on a sequential run where we know the undercloud is already up. I could do something smarter, but this works and it's dead simple.

    export allinone=1

  7. Set up TripleO repos. This changed recently to use the new tripleo-repos project for repo management, and there are still a few rough edges that need to be polished. Most notably, there needs to be a consistent location where the tripleo-repos package can be found. In the absence of that, I do some wget magic to find the package in the "current" repo. I also force the use of the rh1 mirror because it's more squid-friendly than the forced HTTPS RDO repos (I will save that rant for another time though). I don't necessarily recommend that everyone do that since the mirror is intended for internal cloud use only and is maintained on a best-effort basis. Since I'm an admin on the cloud I kind of want to know if the mirror is down anyway though.

    sudo yum install -y wget
    wget -r --no-parent -nd -e robots=off -l 1 -A 'python2-tripleo-repos-*' https://trunk.rdoproject.org/centos7/current/
    sudo yum install -y python2-tripleo-repos-*
    sudo tripleo-repos current-tripleo-dev --rdo-mirror http://mirror01.regionone.tripleo-test-cloud-rh1.openstack.org:8080/rdo --centos-mirror http://mirror01.regionone.tripleo-test-cloud-rh1.openstack.org

  8. Install the undercloud. Pretty self-explanatory.

    sudo yum install -y python-tripleoclient
    openstack undercloud install
    . stackrc

  9. Build images. This sets a bunch of variables based on my local environment and tells DIB to use a local copy of the CentOS cloud image that I downloaded. I should note that many of these quite useful variables are not documented in the tripleo-docs, which is probably a thing we should address.

    [ "$allinone" != "1" ] && sleep 600
    curl -O http://11.2.2.3/CentOS-7-x86_64-GenericCloud-1707.qcow2
    export DIB_LOCAL_IMAGE=~/CentOS-7-x86_64-GenericCloud-1707.qcow2
    export DIB_DISTRIBUTION_MIRROR=http://mirror.centos.org/centos
    export DIB_EPEL_MIRROR=http://dl.fedoraproject.org/pub/epel
    export http_proxy=http://roxy:3128
    export no_proxy=9.1.1.1,192.0.2.1,9.1.1.2,192.0.2.2,192.168.0.1,192.168.0.2,192.168.24.1,192.168.24.2
    export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
    openstack overcloud image build
    . stackrc
    openstack overcloud image upload --update-existing
    # The proxy can sometimes cause issues and we're done with it, so clear the variable
    unset http_proxy

  10. Configure the "public" interface on the undercloud. There's no DHCP on that network so we have to do this manually. It also allows forwarding from that network so DNS from the overcloud nodes works when using network-isolation. Strictly speaking this is optional since you don't need any of it when deploying without net-iso, but it doesn't hurt to do it so I always do.

    # Beware, unsafe tmp location.  These are new dev systems so I don't care, but don't copy this pattern.
    cat > /tmp/eth2.cfg <<EOF_CAT
    network_config:
        - type: interface
          name: eth2
          use_dhcp: false
          addresses:
            - ip_netmask: 10.0.0.1/24
            - ip_netmask: 2001:db8:fd00:1000::1/64
    EOF_CAT
    sudo os-net-config -c /tmp/eth2.cfg -v
    sudo iptables -A POSTROUTING -s 10.0.0.0/24 ! -d 10.0.0.0/24 -j MASQUERADE -t nat
    
  11. Import nodes and set them to available. I generally don't do introspection in dev environments. Occasionally I forget to upload nodes.json, hence the file existence check. :-)

    [ -f "nodes.json" ] && openstack overcloud node import --provide nodes.json

  12. Deploy the overcloud. As I mentioned above, this is a separate section of my notes where I document various deploy commands that I've used. The most basic is:

    openstack overcloud deploy --templates --libvirt-type qemu -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml

    Telemetry was causing a lot of problems at one point and I don't need it, so I've taken to disabling it by default to save time.
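
As for the "something smarter" mentioned in step 6, one option is to poll for the undercloud instead of sleeping for a fixed 10 minutes. What follows is only a minimal sketch under some assumptions: `wait_for` is a hypothetical helper, not part of TripleO or tripleo.sh, and the readiness check in the usage comment (the presence of `stackrc` in the home directory) is an assumed and fairly rough proxy for "the undercloud is ready for image upload".

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
    wf_timeout=$1
    wf_interval=$2
    shift 2
    wf_deadline=$(( $(date +%s) + wf_timeout ))
    # Run the command; on failure, either give up or sleep and retry.
    until "$@"; do
        if [ "$(date +%s)" -ge "$wf_deadline" ]; then
            echo "Timed out after ${wf_timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$wf_interval"
    done
    return 0
}

# Possible replacement for the fixed sleep in step 9 (assumption: stackrc
# appearing means the undercloud install has finished):
# [ "$allinone" != "1" ] && wait_for 1800 30 test -f "$HOME/stackrc"
```

The trade-off is the one already noted in step 6: the fixed sleep is dead simple, while polling only helps if the check really does reflect undercloud readiness.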

And that's it. Again, this doesn't come close to covering all of the things I do, but for details on the rest you'll have to read the TripleO Docs.
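
One footnote on the unsafe tmp location in step 10: on anything other than a throwaway dev system, `mktemp` is the usual way to avoid a predictable path in /tmp. Here is a minimal sketch, assuming os-net-config does not care about the file name (the original uses a `.cfg` name, so that seems likely). `write_net_config` is a hypothetical helper name, and the config content is copied from step 10.

```shell
# Hypothetical helper: write the step 10 network config to a private,
# unpredictably named temp file and print the file's path.
write_net_config() {
    cfg=$(mktemp) || return 1
    cat > "$cfg" <<EOF_CAT
network_config:
    - type: interface
      name: eth2
      use_dhcp: false
      addresses:
        - ip_netmask: 10.0.0.1/24
        - ip_netmask: 2001:db8:fd00:1000::1/64
EOF_CAT
    echo "$cfg"
}

# Usage (commented out because it changes the network configuration):
# cfg=$(write_net_config) && sudo os-net-config -c "$cfg" -v && rm -f "$cfg"
```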