I had an interesting experience this week, where I was faced with the need to restore an entire Active Directory environment from backups that were more than a year old.
The company whose servers I was restoring had been using an older version of Veeam Backup & Replication, which always simplifies matters a lot: the entire thing was delivered to me via sneakernet, on a 2.5″ USB drive containing several restore points for each machine.
The restore was uneventful, as expected, and most machines simply started up in their new home. Unfortunately, one of the Active Directory controllers would bluescreen on boot, with a C00002E2 error message.
After some reading up, I realized the machine had passed the Active Directory tombstone lifetime: as I wrote, the backups were taken over a year ago. Since I had one good domain controller, I figured I would simply cheat with the local time on the failing DC. It would boot successfully into Directory Services Restore Mode, so I could set the local clock there. But anybody with a bit of experience with the VMware line of virtualization products knows that, by default, VMware ESXi synchronizes the guest system clock in a few situations, among them on reboot.
Fortunately, VMware has a knowledge base article covering how to disable all time synchronization between guests and hosts. A total of eight advanced settings must be set to False while the guest is powered off.
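As a rough illustration, here is a small Python sketch that prints the eight entries the way they would appear in a .vmx file. The setting names are the ones I recall from VMware’s article on disabling time synchronization, so treat them as an assumption and check them against the article itself; the helper name `vmx_lines` is made up for this example.

```python
# Setting names as recalled from VMware's KB article on disabling time
# synchronization. This list is an assumption: verify it against the article.
TIME_SYNC_SETTINGS = [
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
]


def vmx_lines(settings=TIME_SYNC_SETTINGS, value="FALSE"):
    """Return one .vmx-style 'key = "value"' line per setting."""
    return [f'{key} = "{value}"' for key in settings]


if __name__ == "__main__":
    # Print the lines so they can be compared with the VM's advanced settings.
    print("\n".join(vmx_lines()))
```

The same key/value pairs can be entered by hand as advanced configuration parameters on the powered-off VM; the script only saves some typing.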
After setting these properties on the machine, I started it back up with the system time set well before the tombstone cutoff date, and let it rest for a while so that all services could realize everything was all right. Then I set the time forward to the current date, waited a bit longer, and restarted the VM. After this, the system started working as intended.
Sometimes it’s nice to mount a separate disk volume inside a directory structure. For a concrete example: At work we have a legacy system that writes copious amounts of data to subfolders of a network share. While vSphere allows for pretty large vdisks, after you pass 8 TB or so, they become cumbersome to manage. By mounting smaller disks directly in this directory structure, each disk can be kept to a manageable size.
First the bad news: the built-in filesystem discovery rules for the Zabbix Windows agent can only automatically enumerate legacy drive letters, so we get to know the status of the root file system, but not of the respective mounted volumes.
The good news, however, is that it’s a piece of cake to make Zabbix understand what you mean if you manually create data collection items for these subdirectories.
The key syntax in Zabbix 3 looks like this:
The only thing to remember is that we’re sending forward slashes in our query to the server agent even though we’re running Windows.
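To make the slash rule concrete, here is a minimal Python sketch that turns a Windows mount path into an item key. It assumes the standard `vfs.fs.size[fs,mode]` agent key is the one being used for the manually created items; the function name and the example path are hypothetical.

```python
def zabbix_fs_key(mount_path, mode="pfree"):
    """Build a vfs.fs.size item key for a volume mounted in a folder."""
    # The agent expects forward slashes, even for Windows paths.
    normalized = mount_path.replace("\\", "/").rstrip("/")
    return f"vfs.fs.size[{normalized},{mode}]"


if __name__ == "__main__":
    # Hypothetical mount point, for illustration only.
    print(zabbix_fs_key(r"D:\Shares\Legacy\Archive01"))
```

Other modes of the same key, such as `total`, `free`, `used` and `pused`, can be passed as the second argument to build the remaining items for each mounted volume.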
As part of this year’s server upgrades, we put together a new vSAN cluster at work. The machines are Lenovo SR650 servers with dual 14-core Xeon Gold 6132 CPUs and 768 GB of RAM. Each server is equipped with two disk groups, each consisting of one 800 GB write-intensive SSD and three 3.84 TB SSDs for use as capacity drives. The servers are connected to the network using two of their four 10GbE interfaces, and to our existing storage solution using dual FC interfaces. The version of VMware vSphere we’re currently running is 6.5 U2.
As part of setting up the solution, we ran benchmarks using VMware’s HCIBench appliance, available as a VMware Fling. HCIBench was configured to clear the read/write cache before testing, but to re-use VMs if possible. The “Easy Run” setting was used since it lets the benchmarking program create a workload based on the individual vSAN environment. The transaction test ran using 20 VMs with 8 data disks each, and the IOPS numbers represent a 100% random, 70% read load with a 4 KB block size.
The first run was pretty much the out-of-the-box configuration: The network between the hosts had not been tweaked at all, and we ran the workers with the stock storage policy, meaning basic data mirroring without striping.
For the second run, we separated vSAN traffic to its own dedicated NIC, and allowed jumbo frames between the hosts.
In the third run we tried to discern what striping virtual disks across capacity drives does to performance by creating a storage policy with a stripe width value of 2, and assigning it to all worker VMs.
Finally, in the fourth run, we turned on Compression and Deduplication on the vSAN and re-ran the same benchmark to see how performance and latency were affected.
(For clarity: We did perform several more benchmark tests to confirm that the values really were representative.)
The raw throughput numbers tell us whether we’re getting data through a connection as fast as possible. As seen by runs 2 and 3 in the graph below, we’re pretty much bouncing against the physical limits of our 12 Gbps SAS controllers and the 10GbE inter-host network. This value isn’t particularly relevant in real life, other than that unexpectedly low numbers tell us we have a problem – see the result from run number 1 for a perfect example of that.
The transaction performance in benchmark form is another one of those numbers that give you an idea of whether something is seriously wrong, but otherwise is a rather hypothetical exercise. Once again we are hitting numbers approaching what the hardware is capable of in the two middle runs.
Finally a number that has a serious bearing on how our storage will feel: How long does it take from issuing a request to the storage system until the system confirms that the task is done? The blue line represents an average for the test period – but remember that this is during extreme load that the vSAN is unlikely to see in actual use. The 95th percentile bar tells us that 95% of storage operations take less time than this to complete.
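To show why the average and the 95th percentile can tell different stories, here is a minimal Python sketch using the nearest-rank definition of a percentile. The latency samples are made up for the example and are not taken from our benchmark runs; HCIBench may well calculate its percentiles differently.

```python
import statistics


def percentile_nearest_rank(samples, pct):
    """Smallest sample that at least pct percent of all samples do not exceed."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(-(-pct * len(ordered) // 100), 1)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in milliseconds: mostly fast, with a few slow outliers.
    latencies_ms = [1.0] * 90 + [2.0] * 5 + [10.0] * 5
    print("average:", statistics.mean(latencies_ms))
    print("95th percentile:", percentile_nearest_rank(latencies_ms, 95))
```

With these invented samples the average comes out at 1.5 ms and the 95th percentile at 2.0 ms, while the slowest five percent of operations take 10 ms. That tail is exactly what the average alone would hide.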
Thoughts on the results
The first run really sticks out, as it should: It’s an exposition of what not to do in production. Storage really should have its own dedicated network. Interestingly, though, from my admittedly limited experience, going up to jumbo frames (MTU=9000) didn’t by itself make a huge difference in performance, but it should result in a bit less strain on the hardware putting network packets together.
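As a back-of-the-envelope illustration of the “less strain” argument, this Python sketch counts how many packets it takes to move one GiB at each MTU. It assumes plain TCP over IPv4 with 40 bytes of headers per packet and no options, which is a simplification of my own and not something we measured.

```python
# Assumption: 20 bytes IPv4 header + 20 bytes TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def packets_needed(payload_bytes, mtu):
    """Number of packets needed to carry payload_bytes at the given MTU."""
    per_packet = mtu - IP_TCP_HEADER_BYTES
    # Integer ceiling division, so a partially filled last packet is counted.
    return -(-payload_bytes // per_packet)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {packets_needed(one_gib, mtu)} packets per GiB")
```

Under these assumptions, jumbo frames need roughly a sixth as many packets for the same amount of data, which means fewer packets for the NICs and hosts to process even when throughput barely changes.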
Curiously enough, I saw no relevant difference between just mirroring and striping + mirroring virtual machine disks once the cluster had settled. The numbers are very close, percentage-wise. This echoes VMware’s own words:
In most real world use cases, we do not see significant performance increases from changing the striping policy. It is available and you should weigh the added complexity against the need before changing it from the default.
Finally, we come to the run I haven’t really commented on yet: how much does performance suffer from the compression + deduplication option available in VMware vSAN? The simplified answer: about 20%, counted both in throughput and in transactional performance, and that doesn’t sound bad at all. But the latency numbers tell a slightly different tale: average latency jumps up by a quarter, and 95th percentile latency by more than half. I see how the benefits of space-saving could make up for the drop in performance in some use cases, but I would be wary of putting a heavily used production database on top of a storage layer that displays this sort of intermittent latency peak.
In summary, vSAN on affordable hardware is slightly slower than a dedicated storage system like our IBM FlashSystem V9000, but that really says more about the wicked speed of the latter than it counts against the former. For most real-world workloads in our environment the difference should be negligible, and well offset by the benefits of a fully software-defined storage layer working hand-in-hand with the virtualization platform.
As usual nowadays, Lenovo SR650 servers come with energy saving presets that may seem “green”, but which kill virtualization performance.
The regular way to get them running the way they should is to enter the UEFI setup at boot, go to UEFI Settings -> System Settings -> Operating Modes and choose “Maximum Performance”. Unfortunately, on these servers, this removes the ability to set VMware EVC: the Enhanced vMotion Compatibility functionality that allows for live migration of virtual servers between hosts of different generations, for example when introducing a new cluster into a datacenter.
It turns out that what’s missing is one specific setting: “MONITOR/MWAIT” must be set to “Enabled”. It should be possible to first choose the “Maximum Performance” scheme, then switch to the “Custom” scheme and only change this single setting in Operating Modes. In addition, we should also go to System Settings -> Devices and I/O Ports, and modify PCI 64-bit Resource Allocation to read “Disabled”. For reference, the complete checklist is available from Lenovo: