Host Filesystem Impact on Tempest Performance in OpenStack

As I mentioned in a previous post, about a year ago I picked up a 1U server from eBay to use as a local single-node OpenStack environment. In general I was quite happy with it, but at some point I got tired of paying for electricity to run a fairly power-hungry server that sits idle, or close to it, about 95% of the time. The fans also picked up an annoying whine somewhere along the line, so once I discovered that a modern desktop processor would be far more efficient while actually outperforming the old pair of server CPUs, I decided it was time for a new box. This post is the story of my journey to get Tempest running in an acceptable fashion on it, and what I learned along the way.

Before getting started, I should discuss the new environment I'm dealing with here:

  • AMD FX-8320, 8-ish* cores at 3.5 GHz
  • 16 GB (2x8) G.Skill 1866 RAM
  • Multiple cheap 1 TB disk drives in various software RAID configurations (more on this later)
  • Tempest running in a 4 CPU, 8 GB VM managed by OpenStack as installed by RDO Packstack on a Fedora 20 host

* -ish in the interest of not opening that can of worms ;-)

In summary, two big things were hurting the performance of my Tempest runs: nested virt and host filesystem/disk performance. Spoilers, I guess. ;-)

Nested Virt

The nested virt topic is the simple one. I just turned it off and immediately saw a huge increase in performance. One particular scenario test that I was using as a benchmark went from taking around three minutes with nested virt to just one with unaccelerated qemu. This seems a bit counterintuitive, since you would expect the accelerated nested virt to be faster if anything, but I think it demonstrates the immaturity of nested virt at this time. I also had issues with the entire system freezing while running nested instances, which went away once I stopped doing that. This was a little unfortunate, since part of the reason I had gone with an AMD processor was that I understood their nested virt support to be more mature than Intel's and I was hoping to take advantage of that. Not the end of the world, though.
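In case anyone wants to do the same, the change boils down to something like the following. Treat this as a sketch of an RDO-style setup rather than a recipe: openstack-config comes from the openstack-utils package, the exact nova option name varies by release (older ones use libvirt_type under [DEFAULT]), and the nova.conf being edited is whichever one launches the would-be nested guests.

  # See whether the AMD KVM module currently has nested virt enabled (1 = on)
  cat /sys/module/kvm_amd/parameters/nested

  # Turn it off persistently on the host; takes effect after reloading
  # kvm_amd or rebooting
  echo "options kvm_amd nested=0" | sudo tee /etc/modprobe.d/kvm-nested.conf

  # Have nova use plain, unaccelerated qemu instead of kvm
  sudo openstack-config --set /etc/nova/nova.conf libvirt virt_type qemu
  sudo systemctl restart openstack-nova-compute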

Note that I'm told there have been significant improvements to nested virt recently, but given that both of my experiences with it so far have been failures (nested virt on the old 1U Intel server didn't work right either), I can't say I'm in a huge hurry to try it again. Maybe in another year or two.

In any case, that was the easy problem to find and fix. While disabling nested virt improved performance significantly, my full Tempest runs were still unacceptably slow. Hard drive activity was pretty much pegged from the start of a run to the end, and because of that the CPU sat nearly idle much of the time. My mid-range processor might not run with the big boys, but it wasn't to blame here.
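If you want to watch this happen on your own system, iostat from the sysstat package makes the pattern pretty obvious:

  # Extended per-device stats every 5 seconds during a run; %util pinned near
  # 100 on the drives while CPU idle time stays high means the disks are the
  # bottleneck, not the processor
  iostat -x 5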

Host Filesystems

As it turns out, the biggest culprit was XFS. I've used XFS for quite a number of things over the years, and given that VM storage involves a lot of big files (usually touted as a strength of XFS), I thought it was a natural fit for a virtualization host. In some cases that might be true, but for running Tempest it very much was not. Adding RAID at any level, including 0, actually made things worse. My first suspicion was that Fedora was misconfiguring the stripe size settings or something, but that didn't appear to be the case, and I had the same issue with RAID 1, which doesn't have stripes at all. In fact, both RAID 0 and RAID 1 were much slower than a simple single-drive setup with XFS. I did try some filesystem/NCQ/scheduler tweaking, but none of it seemed to make a significant difference.
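For the record, the kind of tweaking I mean looks roughly like this; sdX, md0, and the stripe geometry are just placeholders, so adjust for your own drives:

  # Try a different I/O scheduler on a rotational drive
  echo deadline | sudo tee /sys/block/sdX/queue/scheduler

  # Shrink the NCQ queue depth, effectively disabling command queueing
  echo 1 | sudo tee /sys/block/sdX/device/queue_depth

  # Recreate the filesystem with explicit stripe geometry, e.g. a two-disk
  # RAID 0 with a 512k chunk (this destroys the data on it, obviously)
  sudo mkfs.xfs -f -d su=512k,sw=2 /dev/md0

  # Cut down on metadata writes where the instance disks live (assuming the
  # instances directory is its own mount point)
  sudo mount -o remount,noatime /var/lib/nova/instances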

I'll include all of my test run results below, but you can see the extremely poor performance of XFS in RAID 0 in comparison to ext4 in RAID 0. It also appears my Seagate drives are slower than my Western Digital ones. In the interest of fairness, I should mention that for some things XFS was in fact noticeably faster, in particular doing package installs. It just kind of fell over when doing Tempest runs.

In addition, you will see some SSD results toward the end. I had initially hoped I could get away with simply RAIDing more spinny disks to get acceptable performance, but ultimately I decided to bite the bullet and pick up a 256 GB SSD for the OS and my performance-critical VMs. The rest I now boot from volume, backed by an LVM RAID 5 array made up of the leftover rotational drives. This way I can have instances with fast disk, or instances with practically unlimited space (it's unlikely I'd run through multiple TB of space doing anything I use this system for).
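The volume side of that isn't anything exotic. Roughly speaking it amounts to the following, with placeholder device names; volume_group is the option the old cinder LVM driver uses to find its backing storage.

  # Build a software RAID 5 array out of the leftover rotational drives
  sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

  # Hand the raw md device straight to LVM, with no filesystem on top at all
  sudo pvcreate /dev/md0
  sudo vgcreate cinder-volumes /dev/md0

  # Point cinder's LVM driver at the volume group and restart it
  sudo openstack-config --set /etc/cinder/cinder.conf DEFAULT volume_group cinder-volumes
  sudo systemctl restart openstack-cinder-volume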

As you can see, the SSD results are quite good, although once again ext4 is much faster than XFS. Either Tempest runs are a pathologically bad use case for XFS, or the fact that the instances themselves are all ext4 has some impact (these are qcow2 images, though, so I'm skeptical of that).

Just to add one more data point, I also ran Tempest on a boot-from-volume instance. After what I had seen from my RAID 0 runs, the results were surprising, to say the least. Even with the volumes on a software RAID 5 array, not a configuration renowned for excellent performance, it wasn't far behind the SSD results and was comparable to the earlier RAID 0 runs. My theory is that getting the OS (both operating system and OpenStack) disk activity off the rotational disks let them perform better during the Tempest run. It probably doesn't hurt that the LVM volumes sit on the raw mdraid device, so there's no underlying filesystem involved at all.
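For completeness, booting an instance from one of those volumes with the CLI tools of that era looks something like this (the IDs and names are placeholders):

  # Create a bootable 20 GB volume from a Glance image
  cinder create --image-id <image-uuid> --display-name tempest-volume 20

  # Boot an instance from the volume instead of from local qcow2 storage
  nova boot --flavor m1.large \
    --block-device source=volume,id=<volume-uuid>,dest=volume,bootindex=0 \
    tempest-bfv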

Conclusion

Anyway, I don't know how directly applicable this will be to most people running OpenStack, since it's typically deployed on server-class hardware with real RAID controllers, higher-quality drives, and dedicated compute and database nodes to reduce disk contention. In addition, my results are far from scientific. All of the numbers below are single runs, so it's possible one or more of them were influenced by a transient load on the system, but given how long a full Tempest run takes (especially in the configurations where performance was bad), I wasn't willing to go for a test methodology that would hold up to peer review. I just wanted to find what worked. :-)

However, I did spend quite a bit of time coming up with this information (reinstalling Fedora and RDO and running Tempest for each of these configurations was a multi-hour process), so I thought I'd write it up in case it's useful to someone else.

After months of off-and-on experimentation, I'm quite happy with my new system's performance. It ended up costing a bit more than I planned due to the extra drives and SSD, but the new system uses well under half the power of the old one, runs much, much quieter, and performs better on top of that. I can be happy with that.

Raw Results

Timing of tox -e full runs

RAID 0 WD/Seagate ext4
======
Totals
======
Run: 2359 in 6377.707648 sec.
 - Passed: 2156
 - Skipped: 200
 - Failed: 3

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 0:37:34.151293s
 - Worker 1 (534 tests) => 0:45:04.037967s
 - Worker 2 (602 tests) => 0:38:58.548955s
 - Worker 3 (523 tests) => 0:44:16.987106s

RAID 0 WD/WD ext4
======
Totals
======
Run: 2359 in 5190.060175 sec.
 - Passed: 2157
 - Skipped: 200
 - Failed: 2

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 0:30:20.502750s
 - Worker 1 (534 tests) => 0:36:06.530701s
 - Worker 2 (602 tests) => 0:30:47.064148s
 - Worker 3 (523 tests) => 0:35:38.674856s


RAID 0 WD/WD xfs
======
Totals
======
Run: 2361 in 11881.111973 sec.
 - Passed: 2144
 - Skipped: 200
 - Failed: 17

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 1:13:16.662514s
 - Worker 1 (536 tests) => 1:28:40.118105s
 - Worker 2 (602 tests) => 1:06:12.777629s
 - Worker 3 (523 tests) => 1:21:01.110717s

RAID 0 Seagate/Seagate ext4
======
Totals
======
Run: 2359 in 6316.422507 sec.
 - Passed: 2158
 - Skipped: 200
 - Failed: 1

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 0:39:26.093815s
 - Worker 1 (534 tests) => 0:45:20.647021s
 - Worker 2 (602 tests) => 0:39:05.222058s
 - Worker 3 (523 tests) => 0:42:10.433892s

SSD xfs
======
Totals
======
Run: 2361 in 5973.290727 sec.
 - Passed: 2156
 - Skipped: 200
 - Failed: 5

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 0:33:53.633396s
 - Worker 1 (536 tests) => 0:42:07.423013s
 - Worker 2 (602 tests) => 0:33:22.217886s
 - Worker 3 (523 tests) => 0:39:16.427730s

SSD ext4
======
Totals
======
Run: 2359 in 3600.612752 sec.
 - Passed: 2157
 - Skipped: 200
 - Failed: 2

==============
Worker Balance
==============
 - Worker 0 (700 tests) => 0:20:22.467067s
 - Worker 1 (534 tests) => 0:24:00.071754s
 - Worker 2 (602 tests) => 0:19:43.647173s
 - Worker 3 (523 tests) => 0:25:15.156410s

RAID 5 volume
======
Totals
======
Run: 2236 in 5290.762352 sec.
 - Passed: 2041
 - Skipped: 193
 - Failed: 2

==============
Worker Balance
==============
 - Worker 0 (624 tests) => 0:39:18.886647s
 - Worker 1 (529 tests) => 0:30:19.590476s
 - Worker 2 (571 tests) => 0:24:39.270117s
 - Worker 3 (512 tests) => 0:40:11.858793s