Fair warning: This is gonna be looooong. Proceed at your own risk. ;-)
Since I started working with OpenShift on baremetal, one of the things I've wanted to do is deploy OpenShift using OpenStack Virtual Baremetal to provide the host VMs. The usual developer setup is dev-scripts, which uses libvirt to stand up a virtual baremetal environment. This works fine, but it has a few drawbacks, the biggest being that all of the VMs run on the developer's own machine, so each developer needs a fairly beefy box dedicated to the job.
Unfortunately, OpenShift on OVB has some problems too. Most notably, baremetal OpenShift uses a libvirt VM as the bootstrap node. This means you either need to use nested virt (no thank you) or you have to find some way to hook an actual baremetal node into your OpenStack environment.
We recently had a week set aside for developers to work on little pet projects like this, and I was able to get a proof-of-concept of OpenShift on OVB working. How useful is it? I'm not sure, but that's not really the point of a PoC. As you'll see, the environment setup is rather complicated, but it does have the advantage of significantly lowering the requirements for hardware assigned to individual developers. Because the majority of the compute power lives in the OpenStack cloud, a developer might only need a 32 GB machine, possibly even their own laptop, to do dev-scripts development.
Yes, I'm still using dev-scripts for this. Just getting that to work was sufficiently complex that I didn't have time to try running the OpenShift installer standalone. Baby steps. :-)
What I ended up doing was to use the OVB "undercloud" node as a sort of proxy into the virtual networks of the OVB environment. I also created 3 masters and 2 workers (the latter using OVB's role functionality so the workers could use a smaller flavor to save some resources) and OVB provided the necessary IPMI and PXE control over them. Note that I did this on my personal cloud with the Nova PXE boot patch applied. I expect this could be done on a public cloud using the ipxe boot image, but I haven't tried it.
There are three networks that are relevant for an OpenShift Baremetal IPI deployment: provisioning, baremetal, and BMC. Conveniently, these all map nicely to some of the networks used for TripleO, so I didn't have to make any changes to the OVB network templates. The stock undercloud (proxy in my case) already has all of those attached.
Here are the OVB configs I used:
ovb-deploy --quintupleo --name ocp --id openshift --poll -e env-ocp.yaml -e environments/all-networks.yaml --role role-worker.yaml
env-ocp.yaml:
parameter_defaults:
  baremetal_flavor: master
  baremetal_image: centos-stream
  baremetal_prefix: master
  bmc_flavor: bmc
  bmc_image: centos7
  bmc_prefix: bmc
  external_net: external
  key_name: default
  node_count: 3
  private_net: private
  provision_net: provision
  provision_net_shared: False
  public_net: public
  public_net_shared: False
  role: ''
  undercloud_flavor: m1.small
  undercloud_image: centos-stream
  undercloud_name: proxy
role-worker.yaml:
parameter_defaults:
  baremetal_flavor: worker
  baremetal_image: centos-stream
  key_name: default
  node_count: 2
  role: worker
  baremetal_name_template: worker-%index%
resource_registry:
  OS::OVB::BaremetalPorts: templates/baremetal-ports-all.yaml
To give my baremetal node that hosted the bootstrap VM access to the environment, I used two different methods: socat and OpenVPN. In retrospect I probably could have used OpenVPN exclusively and just given the baremetal node a VPN into the BMC network too, but I tackled BMC access first with a simpler socat-based method, and since it worked I didn't bother changing it. The provisioning and baremetal networks each needed their own OpenVPN tunnel, so two tunnels in total.
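To summarize the plumbing up front (the ports and interface names match the configs shown later in this post):
BMC network: socat UDP forwards on the proxy, one listener per node on port 6<last octet of the BMC IP>, forwarding to <BMC IP>:623.
Provisioning network: OpenVPN tap tunnel on port 1194, bridged to the proxy's provisioning NIC via br0; appears as tap0 on the dev-scripts host.
Baremetal network: OpenVPN tap tunnel on port 1195, bridged via br1; appears as tap1 on the dev-scripts host.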
The BMC instances in OVB live on the private network, which has a port on the external router and can therefore be assigned floating IPs. However, a VM can only have one floating IP at a time, and since there are multiple BMCs running, each with its own IP address, I needed some way to redirect traffic anyway. I ended up running socat on the proxy VM, listening on a unique port for each node's BMC and forwarding that traffic to the appropriate private IP. For simplicity, socat listens on port 6<final octet of the BMC IP>; so if the BMC is 12.1.1.188, socat listens on 6188.
This worked fine, although it wasn't perfect (see Known Issues below).
Aside: I initially found a method of doing this with netcat, but it was only able to handle a single IPMI call before the tunnel broke. I'm not sure what was wrong, but I found other posts complaining about the same thing with netcat. Since socat was a bit simpler anyway, I just went with that.
This is the script I used to start socat on the proxy:
#!/bin/bash
set -ex
# Final octets of each node's BMC IP on the private network
for i in 93 229 235 138 188
do
    # Forward UDP on port 6<octet> to that BMC's IPMI port (623)
    socat udp4-listen:6$i,reuseaddr,fork udp4:12.1.1.$i:623 &
done
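Once those are running, an optional sanity check on the proxy is just to confirm the listeners exist:
ss -ulnp | grep socat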
To test IPMI functionality, you can use ipmitool on the dev-scripts host. The -H is the floating IP of the proxy and -p is the port assigned to the node.
ipmitool -I lanplus -H 11.3.3.5 -p 6188 -U admin -P password power status
While I've used OpenVPN before, it was always in "tun" mode. In this case I needed to be able to DHCP and PXE over the tunnel, which meant I needed to use "tap" mode. This didn't turn out to be drastically more complicated, but there was one weird behavior (bug?) that did cause me a lot of angst. We'll get to that in a moment.
First, I followed one of the many OpenVPN setup guides out there and generated all the necessary keys and certificates. I'm not going to go into detail here, but all of the files referenced in my configs do exist with the appropriate content. For the provisioning network, the server config looks like:
port 1194
proto udp
# Note: tap instead of tun
dev tap
# These two are needed to allow some scripting when the connection is brought up
script-security 2
up up.sh
# I'm not sure these are actually accomplishing anything, but I think there are MTU issues with my environment and these were my attempt to fix that.
tun-mtu 1600
fragment 1500
mssfix
# The rest of this is all pretty standard stuff, I believe.
ca ca.crt
cert server.crt
key server.key # This file should be kept secret
dh dh.pem
topology subnet
ifconfig-pool-persist ipp.txt
server-bridge 192.168.24.2 255.255.255.0 192.168.24.50 192.168.24.100
push "route 192.168.24.0 255.255.255.0"
keepalive 10 120
tls-auth ta.key 0 # This file is secret
cipher AES-256-CBC
persist-key
persist-tun
status openvpn-status.log
verb 3
And the corresponding client config:
client
# This all needs to match the server
dev tap
proto udp
tun-mtu 1600
fragment 1500
mssfix
# Standard bits
remote 11.3.3.5 1194
resolv-retry infinite
nobind
user nobody
group nobody
persist-key
persist-tun
ca ca.crt
cert wonderland.crt
key wonderland.key
remote-cert-tls server
tls-auth ta.key 1
cipher AES-256-CBC
verb 3
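With the keys and configs in place, both ends are started with plain openvpn. The config file names below are just placeholders for wherever you saved the above:
# On the proxy (server side)
sudo openvpn --config provisioning-server.conf
# On the dev-scripts host (client side)
sudo openvpn --config provisioning-client.conf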
In order for tap mode in OpenVPN to work, you need to bridge the appropriate interface(s). Since I also work with NMState quite a bit, I used that to create my bridge:
interfaces:
- name: br0
  type: linux-bridge
  state: up
  # I later changed this manually
  mtu: 1450
  ipv4:
    enabled: true
    address:
    # This is the IP OpenStack assigned to the provisioning nic on my proxy. I don't think it matters if it matches, but I figured it wouldn't hurt.
    - ip: "192.168.24.184"
      prefix-length: 24
  bridge:
    port:
    - name: eth1
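Assuming that YAML is saved to a file (the name is arbitrary), it gets applied with nmstatectl:
sudo nmstatectl apply br0.yaml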
Finally, I needed to run a couple of commands when the server brings up the tap interface. This is the contents of the up.sh script:
#!/bin/bash
# Add the tap device OpenVPN just created to the provisioning bridge
/sbin/brctl addif "br0" "$1"
# Match the tunnel MTU configured in the OpenVPN configs
/sbin/ip l set dev "$1" mtu 1600
/sbin/ip l set dev "$1" up
Note that last line. Remember that weird behavior I mentioned? Yeah, turns out the tap device is not brought up by default. Maybe I'm doing something wrong here, but that was surprising to me. Once I added the link up command my tap-based VPN started working.
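Aside: brctl comes from the bridge-utils package, which isn't installed by default on a lot of current distros. If you'd rather avoid it, the iproute2 equivalent of the addif line should be:
/sbin/ip link set dev "$1" master br0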
I also had to start a second OpenVPN instance to handle the "baremetal" interface (as it's known in dev-scripts). That followed the same process as above, but with ports and addresses changed as necessary. For completeness, here are the configs I used:
server:
port 1195
proto udp
dev tap
script-security 2
up up1.sh
tun-mtu 1600
fragment 1500
mssfix
ca ca.crt
cert server.crt
key server.key # This file should be kept secret
dh dh.pem
topology subnet
ifconfig-pool-persist ipp.txt
server-bridge 192.168.111.222 255.255.255.0 192.168.111.100 192.168.111.150
push "route 192.168.111.0 255.255.255.0"
keepalive 10 120
tls-auth ta.key 0 # This file is secret
cipher AES-256-CBC
persist-key
persist-tun
status openvpn-status.log
verb 3
client:
client
dev tap
proto udp
tun-mtu 1600
fragment 1500
mssfix
remote 11.3.3.5 1195
resolv-retry infinite
nobind
user nobody
group nobody
persist-key
persist-tun
ca ca.crt
cert wonderland.crt
key wonderland.key
remote-cert-tls server
tls-auth ta.key 1
cipher AES-256-CBC
verb 3
interfaces:
- name: br1
  type: linux-bridge
  state: up
  mtu: 1450
  ipv4:
    enabled: true
    address:
    # This one does not match what OpenStack assigned, confirming that it didn't matter for the provisioning network either.
    - ip: "192.168.111.222"
      prefix-length: 24
  bridge:
    port:
    - name: eth2
up1.sh:
#!/bin/bash
/sbin/brctl addif "br1" "$1"
/sbin/ip l set dev "$1" mtu 1600
/sbin/ip l set dev "$1" up
There was one more thing I had to do on the client side to make this setup work with dev-scripts. I let dev-scripts manage my bridges for me, and it broke if the tap devices already had IPs assigned to them ahead of time. So, I removed the IPs:
ip a flush dev tap0
ip a flush dev tap1
Speaking of dev-scripts, here are the configuration variables I had to set in order to make dev-scripts work in this environment:
export NODES_PLATFORM="baremetal"
# Note: My nodes file got deleted every time I ran "make clean". Be sure to have a backup.
export NODES_FILE="/home/bnemec/dev-scripts/hosts.json"
# Corresponding to the OpenVPN interfaces
export INT_IF="tap1"
export PRO_IF="tap0"
# In libvirt VMs this would be enp1s0; OpenStack is different even though it is also libvirt-based
export CLUSTER_PRO_IF="ens3"
export IP_STACK="v4"
export BMC_DRIVER="ipmi"
export PROVISIONING_NETWORK="192.168.24.0/24"
# Again, different from vanilla libvirt VMs
export ROOT_DISK_NAME="/dev/vda"
# I deployed with static IP addresses to avoid needing to add DHCP to the baremetal network. By default, OVB disables DHCP because TripleO didn't use it.
export NETWORK_CONFIG_FOLDER=/home/bnemec/dev-scripts/network-config-static/
And here are the contents of my nodes file:
{
  "nodes": [
    {
      "name": "ostest-master-0",
      "driver": "ipmi",
      "resource_class": "baremetal",
      "driver_info": {
        "username": "admin",
        "password": "password",
        "address": "ipmi://11.3.3.5:6235",
        "deploy_kernel": "http://192.168.24.2/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://192.168.24.2/images/ironic-python-agent.initramfs",
        "disable_certificate_verification": false
      },
      "ports": [{
        "address": "fa:16:3e:16:a5:9c",
        "pxe_enabled": true
      }],
      "properties": {
        "local_gb": "20",
        "cpu_arch": "x86_64"
      }
    },
    {
      "name": "ostest-master-1",
      "driver": "ipmi",
      "resource_class": "baremetal",
      "driver_info": {
        "username": "admin",
        "password": "password",
        "address": "ipmi://11.3.3.5:6138",
        "deploy_kernel": "http://192.168.24.2/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://192.168.24.2/images/ironic-python-agent.initramfs"
      },
      "ports": [{
        "address": "fa:16:3e:34:31:af",
        "pxe_enabled": true
      }],
      "properties": {
        "local_gb": "20",
        "cpu_arch": "x86_64"
      }
    },
    {
      "name": "ostest-master-2",
      "driver": "ipmi",
      "resource_class": "baremetal",
      "driver_info": {
        "username": "admin",
        "password": "password",
        "address": "ipmi://11.3.3.5:6188",
        "deploy_kernel": "http://192.168.24.2/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://192.168.24.2/images/ironic-python-agent.initramfs",
        "disable_certificate_verification": true
      },
      "ports": [{
        "address": "fa:16:3e:28:9a:a5",
        "pxe_enabled": true
      }],
      "properties": {
        "local_gb": "20",
        "cpu_arch": "x86_64"
      }
    },
    {
      "name": "ostest-worker-0",
      "driver": "ipmi",
      "resource_class": "baremetal",
      "driver_info": {
        "username": "admin",
        "password": "password",
        "address": "ipmi://11.3.3.5:693",
        "deploy_kernel": "http://192.168.24.2/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://192.168.24.2/images/ironic-python-agent.initramfs"
      },
      "ports": [{
        "address": "fa:16:3e:b7:45:d9",
        "pxe_enabled": true
      }],
      "properties": {
        "local_gb": "20",
        "cpu_arch": "x86_64"
      }
    },
    {
      "name": "ostest-worker-1",
      "driver": "ipmi",
      "resource_class": "baremetal",
      "driver_info": {
        "username": "admin",
        "password": "password",
        "address": "ipmi://11.3.3.5:6229",
        "deploy_kernel": "http://192.168.24.2/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://192.168.24.2/images/ironic-python-agent.initramfs"
      },
      "ports": [{
        "address": "fa:16:3e:2c:a0:fe",
        "pxe_enabled": true
      }],
      "properties": {
        "local_gb": "20",
        "cpu_arch": "x86_64"
      }
    }
  ]
}
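In case it isn't obvious, the :6xxx ports in the ipmi:// addresses are the socat listeners from earlier, and each port's MAC is the one OpenStack assigned to that node's provisioning NIC. If you're rebuilding this file for your own environment, something like the following will list them (network name taken from provision_net in env-ocp.yaml):
openstack port list --network provision -c Name -c "MAC Address"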
Run make and you should end up with a cluster deployed on your OVB instances. Well, almost. See below for a manual workaround I had to do during the deployment to get all the nodes to deploy correctly.
I ran into a few problems and didn't solve them all. Here they are, in case you're interested: