Deploying VMware virtual machines using Ansible

I’ve been experimenting with deploying entire environments using Ansible. As usual I had to clear a couple of small hurdles and stumble into a couple of pitfalls before I was comfortable with the solution, so I’m documenting the process here.

I’m thinking of writing a separate post describing my general Ansible workflow in more detail for anybody who wants to know, but this post will cover how I’ve set up management of my vSphere environment from Ansible.

Boring prerequisites

First of all, we should set up a user with the necessary rights in vCenter. The Ansible crew has a good list of the required privileges, reiterated here:

Datastore.AllocateSpace on the destination datastore or datastore folder

Network.Assign on the network to which the virtual machine will be assigned

Resource.AssignVMToPool on the destination host, cluster, or resource pool

VirtualMachine.Config.AddNewDisk on the datacenter or virtual machine folder

VirtualMachine.Config.AddRemoveDevice on the datacenter or virtual machine folder

VirtualMachine.Interact.PowerOn on the datacenter or virtual machine folder

VirtualMachine.Inventory.CreateFromExisting on the datacenter or virtual machine folder

VirtualMachine.Provisioning.Clone on the virtual machine you are cloning

VirtualMachine.Provisioning.Customize on the virtual machine or virtual machine folder if you are customizing the guest operating system

VirtualMachine.Provisioning.DeployTemplate on the template you are using

VirtualMachine.Provisioning.ReadCustSpecs on the root vCenter Server if you are customizing the guest operating system

I also added the VirtualMachine.Config.CPUCount, VirtualMachine.Config.Memory, VirtualMachine.Config.EditDevice, and VirtualMachine.Interact.DeviceConnection rights while I was at it.

These rights were added to a vCenter role, which I then assigned to my domain user MYDOMAIN\ansible on the vCenter Server object itself, propagating to children.

Unfortunately this wasn’t enough to actually deploy VMs from templates: The ansible user needs to be allowed to write to VM folders or Ansible will barf with a permission-related error message. I solved this by creating the VM folder MyProject/WebServers and giving the MYDOMAIN\ansible user Administrator rights in this specific folder.

For Ansible – or rather Python – to communicate with my vCenter server, I had to ensure the necessary modules were installed. I use pip to keep my Ansible-related Python packages reasonably current, so I installed them with:

pip3 install requests PyVmomi
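A quick way to verify that the modules are importable before pointing Ansible at vCenter (just a sanity check, not something Ansible itself requires):

python3 -c "import requests, pyVmomi; print('VMware modules available')"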

Setting up the Ansible environment

The following two lines set up the skeleton directory structure I like to use:

mkdir -p myproject/{roles,inventories/test/{group_vars,host_vars/localhost}} && cd myproject
ansible-galaxy init roles/vm-deployment --offline

To clarify: The test subdirectory name has to do with the environment’s purpose, as in Dev, Test, Staging, Prod, rather than this being an experimental environment.
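For reference, the resulting layout looks roughly like this (abridged – ansible-galaxy also creates defaults, handlers, meta and a few other subdirectories inside the role):

myproject/
  inventories/test/group_vars/
  inventories/test/host_vars/localhost/
  roles/vm-deployment/tasks/
  roles/vm-deployment/templates/
  roles/vm-deployment/vars/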

Inventories

A basic inventory file for Ansible may look like this:

---
all:
  children:
    webservers:
      hosts:
        websrvtest1:
        websrvtest2:
        websrvtestn:

The all group may contain an arbitrary number of hosts and child groups, which in turn may contain an arbitrary number of their own hosts or children. It’s also possible to put group and host variables straight into the inventory file, but I prefer to keep them separated. Note how every group and host name ends with a colon (:); that’s intentional, and the inventory breaks if they don’t.
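To illustrate the nesting, a child group inside webservers could look like this (the group names here are just examples):

---
all:
  children:
    webservers:
      children:
        frontend:
          hosts:
            websrvtest1:
        backend:
          hosts:
            websrvtest2: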

Variables

Variables are key to reusable playbooks. Let’s set some up for this task:

vcenter: "vcenter.mydomain.tld"
vc_user: ansible
vc_pass: "{{ vault_vc_pass }}"
vc_datacenter: MyDatacenter
vc_cluster: VSANclstr
vm_template: w2019coretmpl
vm_folder: /MyProject/Test/WebServers
vm_network: vxw-dvs-161618-virtualwire-14-sid-5013-MyProject-Test
vm_datastore: vsanDatastore
vm_customization_spec: Win_Domain_member_DHCP
deploylist:
- cpmwebsrvtest1
- cpmwebsrvtest2
- cpmwebsrvtestn

Vaults

Note the "{{ vault_vc_pass }}" variable: I’m telling Ansible to look up the variable contents from some other variable. In this case it’s a hint to me that the contents are encrypted in an ansible vault. This way I don’t have to worry a lot that someone would get a hold of my private git repo: If they do I figure I have some time to change my secrets. I’m storing the vault in the same directories where I store my variable files, and a vault is intiated like this:

ansible-vault create inventories/test/host_vars/localhost/vault

I generate and store the vault passphrases in a password manager to simplify collaboration with my teams.

The vault file follows the same form as the vars one, but is encrypted on disk:

vault_vc_pass: password
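The same vault file can later be opened for changes or inspected read-only with the companion commands:

ansible-vault edit inventories/test/host_vars/localhost/vault
ansible-vault view inventories/test/host_vars/localhost/vault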

Ansible tasks

The next step is to create the task file that actually performs the magic here. In this case there’s a single task that’s looped over however many machines (item) there are in my deploylist. There’s a lot more that can be customized with the vmware_guest Ansible module, but in this case my needs are simple: my vCenter customization specification does most of the job.

One thing to look for is the wait_for_customization parameter. This makes sure that Ansible doesn’t proceed to the next task until VMware has finished customizing the VM – in my case renaming the computer and joining it to a domain.

---
- name: Clone template
  vmware_guest:
    validate_certs: False
    hostname: "{{ vcenter }}"
    username: "{{ vc_user }}"
    password: "{{ vc_pass }}"
    datacenter: "{{ vc_datacenter }}"
    cluster: "{{ vc_cluster }}"
    folder: "{{ vm_folder }}"
    template: "{{ vm_template }}" 
    name: "{{ item }}"
    hardware:
      memory_mb: 6144
      num_cpus: 2
      num_cpu_cores_per_socket: 2
    networks:
    - name: "{{ vm_network }}"
    customization_spec: "{{ vm_customization_spec }}"
    wait_for_customization: yes
  with_items: "{{ deploylist }}"

Next we tell the role’s main task file (roles/vm-deployment/tasks/main.yml) to include the task file we just wrote, deploy-vm.yml, which lives in the same directory. This is slightly overkill for a role with just one actual task, but it’s nice to build a habit of keeping things tidy.

---
- include: deploy-vm.yml
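On newer Ansible releases, where the bare include keyword is deprecated, the equivalent main.yml would use import_tasks instead:

---
- import_tasks: deploy-vm.yml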

Getting it all to run

Finally it’s time to create a master playbook to trigger the role (and potentially others):

---
- hosts: localhost 
  any_errors_fatal: true

  roles:
  - vm-deployment

To execute it all, we’ll use the ansible-playbook command:

ansible-playbook deploy-webserver.yml -i inventories/test --ask-vault-pass

After responding with the appropriate vault passphrase, Ansible goes to work, and in a couple of minutes brand new virtual machines are ready to take on new roles.
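For unattended runs, the interactive prompt can be replaced with a vault password file kept somewhere outside the repository (the path here is just an example):

ansible-playbook deploy-webserver.yml -i inventories/test --vault-password-file ~/.vault_pass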

Restoring a really old domain controller from backups

I had an interesting experience this week, where I was faced with the need to restore an entire Active Directory environment from backups that were more than a year old.

The company whose servers I was restoring had been using an older version of Veeam Backup & Replication, which always simplifies matters a lot: the entire thing was delivered to me over sneakernet, on a 2.5″ USB drive containing several restore points for each machine.

The restore was uneventful, as expected, and most machines simply started up in their new home. Unfortunately, one of the Active Directory controllers would bluescreen on boot, with a C00002E2 error message.

After some reading up, I realized the machine had passed the Active Directory tombstone period: as I wrote, the backups were taken more than a year ago. Since I had one good domain controller, I figured I would simply cheat with the local time on the failing DC. It would boot successfully into Directory Services Restore Mode, so I could set the local clock there, but anybody with a bit of experience with VMware’s virtualization products knows that by default, VMware ESXi synchronizes the guest system clock in a few situations – among them on reboot.

Fortunately VMware has a knowledgebase article covering how to disable all synchronization of time between guests and hosts. A total of eight advanced settings must be set to False, with the guest turned off:

tools.syncTime
time.synchronize.continue
time.synchronize.restore
time.synchronize.resume.disk
time.synchronize.shrink
time.synchronize.tools.startup
time.synchronize.tools.enable
time.synchronize.resume.host

The procedure is documented in KB1189.
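The same settings can also be added directly to the VM’s .vmx file while it is powered off; as I understand the KB article, the entries end up looking something like this (treat the article as the authoritative reference for the exact values):

tools.syncTime = "FALSE"
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
time.synchronize.shrink = "FALSE"
time.synchronize.tools.startup = "FALSE"
time.synchronize.tools.enable = "FALSE"
time.synchronize.resume.host = "FALSE"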

After setting these properties on the machine, I started it back up with the system time set well before the tombstone cutoff date, let it rest for a while so all services could realize everything was alright, then set the time forward to the current date, waited a bit longer, and restarted the VM. After this, the system started working as intended.

Monitoring mounted Windows volumes using Zabbix

Sometimes it’s nice to mount a separate disk volume inside a directory structure. For a concrete example: At work we have a legacy system that writes copious amounts of data to subfolders of a network share. While vSphere allows for pretty large vdisks, after you pass 8 TB or so, they become cumbersome to manage. By mounting smaller disks directly in this directory structure, each disk can be kept to a manageable size. 

First the bad news: the built-in filesystem discovery rules for the Zabbix Windows agent can only automatically enumerate legacy drive letters, so we get to know the status of the root file system, but not of the respective mounted volumes.

The good news, however, is that it’s a piece of cake to make Zabbix understand what you mean if you manually create data collection items for these subdirectories.

The key syntax in Zabbix 3 looks like this:

vfs.fs.size[G:/topdir/subdir,pfree]

The only thing to remember is that we use forward slashes in the item key even though the monitored host is running Windows.
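Since the mount point is just part of the path, a set of items for one mounted volume could look like this (the directory names are of course examples):

vfs.fs.size[G:/topdir/subdir,total]
vfs.fs.size[G:/topdir/subdir,used]
vfs.fs.size[G:/topdir/subdir,pfree]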

vSAN Benchmark Analysis

As part of this year’s server upgrades, we put together a new vSAN cluster at work. The machines are Lenovo SR650 servers with dual 14-core Xeon Gold 6132 CPUs and 768 GB of RAM. Each server is equipped with two disk groups, each consisting of one 800 GB write-intensive SSD for cache and three 3.84 TB SSDs as capacity drives. The servers are connected to the network using two of their four 10GbE interfaces, and to our existing storage solution using dual FC interfaces. The version of VMware vSphere we’re currently running is 6.5 U2.

As part of setting up the solution, we ran benchmarks using VMware’s HCIBench appliance, which is available as a VMware Fling. HCIBench was configured to clear the read/write cache before testing, but to re-use VMs where possible. The “Easy Run” setting was used, since it lets the benchmarking tool create a workload based on the individual vSAN environment. The transaction test ran using 20 VMs with 8 data disks each, and the IOPS numbers represent a 100% random, 70% read load at a 4 KB block size.

The first run was pretty much the out-of-the-box configuration: The network between the hosts had not been tweaked at all, and we ran the workers with the stock storage policy, meaning basic data mirroring without striping. 

For the second run, we separated vSAN traffic to its own dedicated NIC, and allowed jumbo frames between the hosts. 

In the third run we tried to discern what striping virtual disks across capacity drives does to performance by creating a storage policy with a stripe width value of 2, and assigning it to all worker VMs.

Finally, in the fourth run, we turned on Compression and Deduplication on the vSAN and re-ran the same benchmark to see how performance and latency were affected.

(For clarity: We did perform several more benchmark tests to confirm that the values really were representative.)

Throughput

The raw throughput numbers tell us whether we’re getting data through a connection as fast as possible. As seen in runs 2 and 3 in the graph below, we’re pretty much bouncing against the physical limits of our 12 Gbps SAS controllers and the 10GbE inter-host network. This value isn’t particularly relevant in real life, other than that unexpectedly low numbers tell us we have a problem – see the result from run number 1 for a perfect example of that.
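As a rough sanity check of where those limits sit: a single 10GbE link tops out at 10 Gbit/s ÷ 8 ≈ 1.25 GB/s (1250 MB/s) before protocol overhead, and a 12 Gbps SAS lane at roughly 1.5 GB/s raw, so per-host throughput figures in that neighbourhood are about as good as this hardware can deliver.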

Throughput in MB/s. Higher is better.

Transaction performance

The transaction performance in benchmark form is another one of those numbers that give you an idea of whether something is seriously wrong, but otherwise is a rather hypothetical exercise. Once again we are hitting numbers approaching what the hardware is capable of in the two middle runs.

Input/Output operations per second. Higher is better.

Latency

Finally a number that has a serious bearing on how our storage will feel: How long does it take from issuing a request to the storage system until the system confirms that the task is done? The blue line represents an average for the test period – but remember that this is during extreme load that the vSAN is unlikely to see in actual use. The 95th percentile bar tells us that 95% of storage operations take less time than this to complete.

Latency values in milliseconds – Lower is better

Thoughts on the results

The first run really sticks out, as it should: it’s an illustration of what not to do in production. Storage really should have its own dedicated network. Interestingly, though, from my admittedly limited experience, enabling jumbo frames (MTU 9000) didn’t by itself make a huge difference in performance, but it should put a bit less strain on the hardware that assembles the network packets.

Curiously enough, I saw no relevant difference between just mirroring and striping + mirroring virtual machine disks once the cluster had settled. The numbers are very close, percentage-wise. This echoes VMware’s own words:

In most real world use cases, we do not see significant performance increases from changing the striping policy. It is available and you should weigh the added complexity against the need before changing it from the default.

https://blogs.vmware.com/virtualblocks/2016/09/19/vsan-stripes/

Finally we come to the run I haven’t really commented on yet: how much does performance suffer from the compression and deduplication option available in VMware vSAN? The simplified answer: about 20%, measured both in throughput and in transactional performance, which doesn’t sound bad at all. But the latency numbers tell a slightly different tale: average latency jumps up by a quarter, and 95th percentile latency by more than half. I can see how the benefits of space saving could make up for the drop in performance in some use cases, but I would be wary of putting a heavily used production database on top of a storage layer that displays this sort of intermittent latency peak.

In summary, vSAN on affordable hardware is slightly slower than a dedicated storage system like our IBM FlashSystem V9000, but that really says more about the wicked speed of the latter than being a negative against the former. For most real-world workloads in our environment the difference should be negligible, and well offset by the benefits of a fully software defined storage layer working hand-in-hand with the virtualization platform.

Configuring Lenovo SR650 nodes for running vSphere

As usual nowadays, Lenovo SR650 servers come with energy saving presets that may seem "green", but which kill virtualization performance.

The regular way to get them running the way they should is to enter the UEFI setup at boot, go to UEFI Settings -> System Settings -> Operating Modes and choose "Maximum Performance". Unfortunately, on these servers, this removes the ability to set VMware EVC: the Enhanced vMotion Compatibility functionality that allows for live migration of virtual servers between hosts of different generations, for example when introducing a new cluster into a datacenter.

It turns out that what’s missing is one specific setting: "MONITOR/MWAIT" must be set to "Enabled". It should be possible to first choose the "Maximum Performance" scheme, then switch to the "Custom" scheme and change only this single setting under Operating Modes. In addition, we should also go to System Settings -> Devices and I/O Ports and set PCI 64-bit Resource Allocation to "Disabled". For reference, the complete checklist is available from Lenovo:

Processors.CStates=Disable
Processors.C1EnhancedMode=Disable
Processors.EnergyEfficientTurbo=Disable
Processors.MONITORMWAIT=Enable
Power.PowerPerformanceBias=Platform Controlled
Power.PlatformControlledType=Maximum Performance
DevicesandIOPorts.PCI64BitResourceAllocation=Disable
DevicesandIOPorts.MMConfigBase=3GB
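If you would rather script the checklist than click through UEFI setup on every node, Lenovo’s OneCLI utility should be able to push the same settings out-of-band. A rough sketch – the XCC address and credentials are placeholders, and I haven’t verified the exact invocation against every OneCLI release:

onecli config set Processors.CStates Disable --bmc USERID:PASSW0RD@10.0.0.10
onecli config set Processors.MONITORMWAIT Enable --bmc USERID:PASSW0RD@10.0.0.10
onecli config set Power.PlatformControlledType "Maximum Performance" --bmc USERID:PASSW0RD@10.0.0.10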

After making these changes, we should be able to both run our workload at maximum performance and enable EVC to migrate workloads between server clusters utilizing CPUs from different generations.