Deploying VMware virtual machines using Ansible

I’ve been experimenting with deploying entire environments using Ansible. As usual I had to pass a couple of small thresholds and stumble into a couple of pitfalls before I was comfortable with solution, and so I’m documenting the process here.

I’m thinking of creating a separate post describing my general Ansible workflow in more detail for anybody who wants to know, but this post will cover how I’ve set up management of my vSphere environment from Ansible. .

Boring prerequisites

First of all, we should set up a user with the necessary rights in the vCenter. The Ansible crew has a good list of the requirements, reiterated here:

Datastore.AllocateSpace on the destination datastore or datastore folder

Network.Assign on the network to which the virtual machine will be assigned

Resource.AssignVMToPool on the destination host, cluster, or resource pool

VirtualMachine.Config.AddNewDisk on the datacenter or virtual machine folder

VirtualMachine.Config.AddRemoveDevice on the datacenter or virtual machine folder

VirtualMachine.Interact.PowerOn on the datacenter or virtual machine folder

VirtualMachine.Inventory.CreateFromExisting on the datacenter or virtual machine folder

VirtualMachine.Provisioning.Clone on the virtual machine you are cloning

VirtualMachine.Provisioning.Customize on the virtual machine or virtual machine folder if you are customizing the guest operating system

VirtualMachine.Provisioning.DeployTemplate on the template you are using

VirtualMachine.Provisioning.ReadCustSpecs on the root vCenter Server if you are customizing the guest operating system

I also added the VirtualMachine.Config.CPUCount, VirtualMachine.Config.Memory, VirtualMachine.Config.EditDevice, and VirtualMachine.Interact.DeviceConnection rights while I was at it.

These rights were added to a VMware Role. I then assigned this role to my domain user MYDOMAIN\ansible for the entire vCenter server with children.

Unfortunately this wasn’t enough to actually deploy VMs from templates: The ansible user needs to be allowed to write to VM folders or Ansible will barf with a permission-related error message. I solved this by creating the VM folder MyProject/WebServers and giving the MYDOMAIN\ansible user Administrator rights in this specific folder.

For Ansible – or rather Python – to communicate with my vCenter server, I had to ensure the necessary modules were installed. I use pip to ensure I have a recent version of Ansible stuff, and so I issued the relevant command:

pip3 install requests PyVmomi

Setting up the Ansible environment

The following two lines set up the skeleton directory structure I like to use:

mkdir -p myproject/{roles,inventories/test/{group_vars,host_vars/localhost}} && cd myproject
ansible-galaxy init roles/vm-deployment --offline

To clarify: The test subdirectory name has to do with the environment’s purpose, as in Dev, Test, Staging, Prod, rather than this being an experimental environment.

Inventories

A basic inventory file for Ansible may look like this:

---
all:
  children:
    webservers:
      hosts:
        websrvtest1:
        websrvtest2:
        websrvtestn:

The all group may contain an arbitrary number of hosts and child groups, which in turn may contain an arbitrary number of their own hosts or children. It’s also possible to put group and host variables straight into the inventory file, but I prefer to keep them separated. Note how every line ends with a colon (:). That’s on purpose and stuff breaks if they don’t.

Variables

Variables are key to reusable playbooks. Let’s set some up for this task:

vcenter: "vcenter.mydomain.tld"
vc_user: ansible
vc_pass: "{{ vault_vc_pass }}"
vc_datacenter: MyDatacenter
vc_cluster: VSANclstr
vm_template: w2019coretmpl
vm_folder: /MyProject/Test/WebServers
vm_network: vxw-dvs-161618-virtualwire-14-sid-5013-MyProject-Test
vm_datastore: vsanDatastore
vm_customization_spec: Win_Domain_member_DHCP
deploylist:
- cpmwebsrvtest1
- cpmwebsrvtest2
- cpmwebsrvtestn

Vaults

Note the "{{ vault_vc_pass }}" variable: I’m telling Ansible to look up the variable contents from some other variable. In this case it’s a hint to me that the contents are encrypted in an ansible vault. This way I don’t have to worry a lot that someone would get a hold of my private git repo: If they do I figure I have some time to change my secrets. I’m storing the vault in the same directories where I store my variable files, and a vault is intiated like this:

ansible-vault create inventories/test/host_vars/localhost/vault

I generate and store the vault passphrases in a password manager to simplify collaboration with my teams.

The vault file follows the same form as the vars one, but is encrypted on disk:

vault_vc_pass: password

Ansible tasks

The next step is to create a playbook that actually performs the magic here. In this case there’s a single step that’s looped for whatever number of machines (item) that are in my deploylist. There’s a lot more that can be customized with the vmware_guest Ansible module, but in this case my needs are simple: My vCenter customization specification does most of the job.

One thing to look for is the wait_for_customization parameter. This makes sure that Ansible doesn’t proceed to the next task until VMware has finished customizing the VM – in my case renaming the computer and joining it to a domain.

---
- name: Clone template
  vmware_guest:
    validate_certs: False
    hostname: "{{ vcenter }}"
    username: "{{ vc_user }}"
    password: "{{ vc_pass }}"
    datacenter: "{{ vc_datacenter }}"
    cluster: "{{ vc_cluster }}"
    folder: "{{ vm_folder }}"
    template: "{{ vm_template }}" 
    name: "{{ item }}"
    hardware:
      memory_mb: 6144
      num_cpus: 2
      num_cpu_cores_per_socket: 2
    networks:
    - name: "{{ vm_network }}"
    customization_spec: "{{ vm_customization_spec }}"
    wait_for_customization: yes
  with_items: "{{ deploylist }}"

Next we tell the role to invoke our playbook. This is slightly overkill for a role with just one actual task, but it’s nice to build a habit of keeping things tidy.

---
- include: deploy-vm.yml

Getting it all to run

Finally it’s time to create a master playbook to trigger the role (and potentially others):

---
- hosts: localhost 
  any_errors_fatal: true

  roles:
  - vm-deployment

To execute it all, we’ll use the ansible-playbook command:

ansible-playbook deploy-webserver.yml -i inventories/test --ask-vault-pass

After responding with the appropriate vault passphrase, Ansible goes to work, and in a couple of minutes a brand new virtual machine is ready to take on new roles.

Managing Windows servers with Ansible

Although I to a large degree get to play with the fun stuff at work, much of our environment still consists of Windows servers, and that will not be changing for a long time. As I’ve mentioned in earlier posts, I try to script my way around singular Windows servers using Powershell whenever it makes sense, but when a set of changes needs to be performed across groups of servers – especially if it’s something recurring – my tool of choice really is Ansible.

The Ansible management server (which has to be running a Unix-like system) needs to be able to communicate securely with the Windows hosts. WinRM, which is the framework used under the hood, allows for a number of protocols for user authentication and transfer of commands. I personally like to have my communications TLS secured, and so I’ve opted for using CredSSP which defaults to an HTTPS-based communications channel.

A huge gotcha: I tried running the tasks below from a Ubuntu 16.04 LTS server, and there was nothing I could do to get the Python 2.7-dependent Ansible version to correctly verify a TLS certificate from our internal CA. When I switched to running Ansible through Python 3, the exact same config worked flawlessly. The original code has been updated to reflect this state of things.

Enable CredSSP WinRM communications in Windows

Our production domain has a local Certificate Authority, which simplifies some operations. All domain members request their computer certificates from this CA, and the resulting certs have subject lines matching their hostname. The following PowerShell script will allow us to utilize the existing certificates to secure WinRM communications, along with enabling the necessary listener and firewall rules.

$hostname=hostname
# Get the thumbprint of the latest valid machine certificate
$cert=Get-ChildItem -Path cert:\LocalMachine\My -Recurse|? { ($_.Subject -match $hostname) -and ($_.NotAfter -gt $today.date) } | sort { $_.NotAfter } | select -last 1
# Enable Windows Remote Management over CredSSP
Enable-WSManCredSSP -Role Server -Force
# Set up an HTTPS listener with the machine certificate’s thumbprint
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $cert.Thumbprint -Force
# Allow WinRM HTTPS traffic through the firewall
New-NetFirewallRule -DisplayName 'Windows Remote Management (HTTPS-In)' -Name 'Windows Remote Management (HTTPS-In)' -Direction Inbound -Protocol TCP -LocalPort 5986 -RemoteAddress LocalSubnet

Depending on your desired security level you may want to change the RemoteAddress property of the firewall rule to only allow management traffic from a single host or similar. It is a bad idea to allow remote management from untrusted networks!

Enable CredSSP WinRM communications from Ansible

To enable Ansible to use CredSSP on an Ubuntu server, we’ll install a couple of packages:

sudo apt install libssl-dev
pip3 install pyOpenSSL
pip3 install pywinrm[credssp]

We then need to ensure that the Ansible server trusts the certificates of any Windows servers:

sudo chown root our-ca.crt
sudo chmod 744 our-ca.crt
sudo mv our-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

And finally we’ll tell Ansible how to connect to our Windows servers – including where to find the CA-file – by adding the following to the group_vars for the server group:

ansible_user: "username@domain.tld"
ansible_password: "YourExcellentPasswordHere"
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: credssp
ansible_winrm_ca_trust_path: /etc/ssl/certs

Naturally, if we’re storing credentials in a file, it should be protected as an Ansible vault.

Finally we can try our config out. Note, as mentioned in the beginning of this article, that I had to resort to running Ansible through Python3 to correctly validate my CA cert. It’s time to get with the times, folks.. 🙂

python3 $(which ansible) windowsserver.domain.tld --ask-vault-pass -m win_ping
Vault password: 
windowsserver.domain.tld | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

To ensure that playbooks targeting Windows servers run using Python3, add the following to the Windows server group_vars:

ansible_python_interpreter: /usr/bin/python3  

Happy server management!