Managing Windows servers with Ansible

Although I get to play with the fun stuff at work to a large degree, much of our environment still consists of Windows servers, and that will not be changing for a long time. As I’ve mentioned in earlier posts, I try to script my way around individual Windows servers using PowerShell whenever it makes sense, but when a set of changes needs to be performed across groups of servers – especially if it’s something recurring – my tool of choice really is Ansible.

The Ansible management server (which has to be running a Unix-like system) needs to be able to communicate securely with the Windows hosts. WinRM, the framework used under the hood, supports a number of protocols for user authentication and command transfer. I personally like my communications TLS-secured, so I’ve opted for CredSSP, which defaults to an HTTPS-based communications channel.

A huge gotcha: I tried running the tasks below from an Ubuntu 16.04 LTS server, and there was nothing I could do to get the Python 2.7-based Ansible installation to correctly verify a self-signed TLS certificate. When I switched to running Ansible through Python 3, the exact same config worked flawlessly. The code below has been updated to reflect this state of things.

Enable CredSSP WinRM communications in Windows

Our production domain has a local Certificate Authority, which simplifies some operations. All domain members request their computer certificates from this CA, and the resulting certs have subject lines matching their hostnames. The following PowerShell script lets us use the existing certificates to secure WinRM communications, and also enables the necessary listener and firewall rule.

$hostname=hostname
# Get the machine certificate matching the hostname (if there is more than one, pick the one that expires last)
$cert=Get-ChildItem -Path cert:\LocalMachine\My | ? { $_.Subject -match $hostname } | Sort-Object -Property NotAfter -Descending | Select-Object -First 1
# Enable Windows Remote Management over CredSSP
Enable-WSManCredSSP -Role Server -Force
# Set up an HTTPS listener with the machine certificate’s thumbprint
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $cert.Thumbprint -Force
# Allow WinRM HTTPS traffic through the firewall
New-NetFirewallRule -DisplayName 'Windows Remote Management (HTTPS-In)' -Name 'Windows Remote Management (HTTPS-In)' -Direction Inbound -Protocol TCP -LocalPort 5986 -RemoteAddress LocalSubnet

Depending on your desired security level you may want to change the RemoteAddress property of the firewall rule to only allow management traffic from a single host or similar. It is a bad idea to allow remote management from untrusted networks!
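
For example, if management traffic should only ever come from a single Ansible host, something like the following tightens the rule created above (the address is just a placeholder):

# Restrict the WinRM HTTPS rule to a single management host (192.0.2.10 is a placeholder address)
Set-NetFirewallRule -Name 'Windows Remote Management (HTTPS-In)' -RemoteAddress 192.0.2.10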

Enable CredSSP WinRM communications from Ansible

To enable Ansible to use CredSSP on an Ubuntu server, we’ll install a couple of packages:

sudo apt install libssl-dev
pip3 install pyOpenSSL
pip3 install pywinrm[credssp]

We then need to ensure that the Ansible server trusts the certificates of any Windows servers:

sudo chown root our-ca.crt
sudo chmod 744 our-ca.crt
sudo mv our-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
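
To sanity-check the chain of trust before involving Ansible, a quick test with openssl can be used; the hostname below is the example host used later in this post, and a working setup should report “Verify return code: 0 (ok)”:

# The hostname is the example host used later in this post
openssl s_client -connect windowsserver.domain.tld:5986 -CApath /etc/ssl/certs </dev/null | grep 'Verify return code'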

And finally we’ll tell Ansible how to connect to our Windows servers – including where to find the CA-file – by adding the following to the group_vars for the server group:

ansible_user: "username@domain.tld"
ansible_password: "YourExcellentPasswordHere"
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: credssp
ansible_winrm_ca_trust_path: /etc/ssl/certs

Naturally, if we’re storing credentials in a file, it should be protected as an Ansible vault.
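
A minimal sketch of doing just that:

# The path is just an example; point it at the group_vars file that holds the credentials
ansible-vault encrypt group_vars/windows.yml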

Finally we can try our config out. Note, as mentioned in the beginning of this article, that I had to resort to running Ansible through Python 3 to correctly validate my CA cert. It’s time to get with the times, folks. 🙂

python3 $(which ansible) windowsserver.domain.tld --ask-vault-pass -m win_ping
Vault password: 
windowsserver.domain.tld | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

To ensure that playbooks targeting Windows servers run using Python 3, add the following to the Windows server group_vars:

ansible_python_interpreter: /usr/bin/python3  

Happy server management!

Simple DMARC report parsing and visualizing toolkit

Just a short post to recommend techsneeze’s tools for downloading, parsing, and displaying DMARC reports. I’m not exactly a Perl expert, so it took me a few minutes to install the necessary modules to get the scripts working, but after that I’m a happy camper.

On that note: “I was today years old when I realized the usefulness of apt-file in Debian-based distros.”

The web reporting tool should not be exposed outside of a secured network, but at first glance it seems to do exactly what it sets out to do: visualizing SPF and DKIM failures.

File system rights on mounted drives in Windows

As I repeatedly state, the same object-oriented design that makes PowerShell potentially powerful for complex tasks also makes it ridiculously verbose for simple ones. Today’s post is a perfect example.

Consider a volume mounted to an NTFS mountpoint in a directory. Since this is an obvious afterthought in the file system design, setting access rights on the mountpoint directory won’t do you any good if you expect these rights to propagate down through the mounted file system. While the reason may be obvious once you think about the limitations in the design, it certainly breaks the principle of least astonishment. The correct way to set permissions on such a volume is to configure the proper ACL on the partition object itself.

In the legacy Computer Management MMC-based interface, this was simply a matter of right-clicking in the Disk Management module to change the drive properties, and then setting the correct values in the Security tab. In PowerShell, however, this isn’t a simple command, but a script with three main components:

  • Populate an ACL object with the partition object’s current security settings
  • Modify the properties of the ACL object
  • Commit the contents of the ACL object back into the partition object

Here’s how it’s done:

First we need to find the volume identifier. For this we can use get-partition | fl, optionally modified with a where, or ?, query if we know additional details that can help narrow the search. What we’re looking for is a volume GUID path like the following example, listed in the AccessPaths property:

\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}
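
If we already know the current drive letter, say G:, the search can be narrowed with a quick sketch like the following; the volume GUID path shows up in the AccessPaths list:

# Sketch: list the access paths (including the volume GUID path) for the partition currently mounted as G:
(Get-Partition | Where-Object { $_.DriveLetter -eq 'G' }).AccessPaths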

Armed with that we can now fill an object with the ACL for our volume:

$acl = [io.directory]::GetAccessControl("\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}\")

We then create a new access control entry (ACE):

$newace = New-Object -TypeName System.Security.AccessControl.FileSystemAccessRule `
 -ArgumentList "DOMAIN\testuser", "ReadAndExecute, Traverse", "ContainerInherit, ObjectInherit", "None", "Allow"

The reason we must enter the arguments in this exact order is the definition of the constructor for the access control entry object. There’s really no way of figuring this out from within the interactive scripting environment; you just have to have a bunch of patience and read dry documentation, or learn from code snippets found through searching the web.

The next step is to load our new ACE into the ACL object:

$acl.SetAccessRule($newace)

What if we want to remove rights – for example the usually present Everyone entry? In that case we need to find every ACE referencing that user or group in our ACL, and remove it:

$acl.access | ?{$_.IdentityReference.Value -eq "Everyone"} | ForEach-Object { $acl.RemoveAccessRule($_)}

If we’ve done this job interactively, we can take a final look at our ACL to confirm it still looks sane by running $acl | fl.

Finally we’ll commit the ACL into the file system again:

[io.directory]::SetAccessControl("\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}\",$acl)

And there we go: We’ve basically had to write an entire little program to get it done, and the poor inventors of the KISS principle and of the principle of least astonishment are slowly rotating like rotisserie chickens in their graves, but we’ve managed to set permissions on a mounted NTFS volume through PowerShell.

NTFS mount points via PowerShell

As I mentioned in an earlier post, it’s sometimes useful to mount an additional drive in a directory on an existing drive, Unix-style, rather than presenting it with its own traditional Windows-style drive letter.

Here’s how we do it in PowerShell:

If the volume is already mounted to a drive letter, we need to find the disk number and partition number of the letter:

Get-Partition | select DriveLetter, DiskNumber, PartitionNumber | ft

DriveLetter DiskNumber PartitionNumber
----------- ---------- ---------------
                     0               1
          C          0               2
                     1               1
          E          1               2
                     2               1
          F          2               2
                     3               1
          G          3               2

In this example, we see that volume G corresponds to DiskNumber 3, PartitionNumber 2.

Let’s say we want to mount that disk under E:\SharedFiles\Mountpoint. First we need to make sure the directory exists. Then we’ll run the following commands:

Add-PartitionAccessPath -DiskNumber 3 -PartitionNumber 2 -AccessPath 'E:\SharedFiles\Mountpoint\'
Remove-PartitionAccessPath -DiskNumber 3 -PartitionNumber 2 -AccessPath 'G:\'
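
To verify the result, we can check that the new access path is listed on the partition and that G:\ is gone:

# Should list E:\SharedFiles\Mountpoint\ and the volume GUID path, but no drive letter
Get-Partition -DiskNumber 3 -PartitionNumber 2 | Select-Object -ExpandProperty AccessPaths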

Summary

As usual, PowerShell is kind of “wordy”, but we do get things done.

Creating a working Ubuntu 18.04 VMware template

Long story short: I use VMware and I use Ubuntu. With Ubuntu 16.04 everything worked nicely out of the box. With Ubuntu 18.04 it doesn’t. I finally got tired of manually setting my hostname and network settings every time I need a new server, and decided to fix my template once and for all.

Networking

The first thing that doesn’t work – as mentioned in an earlier post – is deploy-time configuration of the network based on vCenter templates.

For some weird reason, Ubuntu has chosen to entirely replace the old ifupdown system for configuring the network with a combination of Cloud-init and Netplan. If we choose to download the installation image with the traditional installer, at least we don’t get cloud-init, but Netplan remains.

False start

According to the Netplan FAQ, we can install Ubuntu Server without using Netplan by pressing F6 followed by ‘e’ in the installer boot menu, and adding netcfg/do_not_use_netplan=true to the preseed command line.

Unfortunately this leaves us with a disconnected machine after the first boot: It turns out Ubuntu isn’t smart enough to actually install ifupdown if Netplan is deselected – at least not with the current installer, 18.04.1.

The working way

The solution to the problem above is still (as of February 2019) to perform a clean install with Netplan, and then manually remove open-vm-tools and replace it with VMware’s official tools, since open-vm-tools still doesn’t support Ubuntu’s weirdness even 10 months after 18.04 was released.

…However…

The default DHCP behavior in Ubuntu 18.04 is nothing short of idiotic for use in VMware templates: Despite newly deployed machines naturally getting new MAC addresses, they insist on asking to be handed the same IP address as their template, and they don’t care whether that lease is already taken; clones will happily keep stealing the IP address from each other.

Fortunately, according to this post over at superuser.com, there’s a way to fix this. The root cause appears to be that the DHCP client identifier defaults to a DUID derived from /etc/machine-id, which cloned VMs inherit from the template. The fix is to edit /etc/netplan/01-netcfg.yaml and tell Netplan to use the MAC address as the DHCP identifier instead, like this:

      dhcp4: yes
      dhcp-identifier: mac
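
For context, a complete /etc/netplan/01-netcfg.yaml might then look roughly like the sketch below; the interface name (ens192) and the networkd renderer are assumptions, so adjust them to whatever the template actually uses:

# Sketch of /etc/netplan/01-netcfg.yaml; 'ens192' and the renderer are assumptions
network:
  version: 2
  renderer: networkd
  ethernets:
    ens192:
      dhcp4: yes
      dhcp-identifier: mac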

After this, new machines deployed from the template should behave slightly more sanely.

Painfully long Grub menu timeout

Grub’s boot menu has a default timeout of 30 seconds in Ubuntu 18.04. The relevant setting should be modifiable in /etc/default/grub. Only it isn’t: the default value for GRUB_TIMEOUT there is 2 seconds, which Grub doesn’t adhere to at all. The long wait instead comes from Ubuntu’s recordfail handling, which has a timeout of its own, so the “fix” (logical, right? No, not at all) is to add the following line to /etc/default/grub:

GRUB_RECORDFAIL_TIMEOUT=2

Re-run update-grub with superuser rights, and reboot the computer to confirm it worked as intended.
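
In shell form, the whole change boils down to something like this (assuming the line isn’t already present in the file):

# Append the recordfail timeout and regenerate the Grub config
echo 'GRUB_RECORDFAIL_TIMEOUT=2' | sudo tee -a /etc/default/grub
sudo update-grub
sudo reboot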

End result

With the changes detailed above, and after installing Python to allow Ansible to perform its magic on VMs deployed from this template, I have finally reached feature parity with my Ubuntu 16.04 template.

Rescuing vVol-based virtual machines

Background

As mentioned in a previous post, I had a really bad experience with vVols presented from IBM storage. Anyhow, the affected machines must be migrated to other storage, and having read up on how vVols work, that’s a scary prospect.

The good thing: Thanks to Veeam, I have excellent backups.

The bad thing: Since backups depend on the system’s ability to take snapshots, I only have backups up until the point my vVols failed. Troubleshooting, identifying the underlying issue, having VMware look at the systems and point at IBM, and finally realizing IBM won’t touch my issue unless I sign a year’s worth of software support agreements took several days, during which I had no new backups for the affected VMs.

Fortunately, most of the systems I had hosted on the failed storage volumes were either more or less static, or stored data on machines on regular LUNs or vSAN.

The three methods

Veeam restore

Templates and turned-off machines were marked as Inaccessible in the vCenter console. Since they had definitely seen no changes since the vVol storage broke down, I simply restored them to other datastores from the latest available backup.

VMware Converter

I attempted to use VMware Converter Standalone to migrate an Ubuntu VM, but for some reason the converted machine kept kernel panicking at boot. I suspect it may have something to do with the fact that Converter demands that the paravirtual SCSI controller be replaced with the emulated LSI one. I have yet to try with a Windows server, but my initial tests made me decide to only use Converter as a fallback option.

Cold migration

This is one method I was surprised worked, and which simplified things a lot. It turns out that – at least with the specific malfunction I experienced – turning off a VM that has been running doesn’t actually make it inaccessible to vCenter. And since a turned off VM doesn’t require the creation of snapshots to allow migration, moving it to accessible storage was a breeze. This is what I ended up doing with most of the machines.

Summary

It turns out that at least for my purposes, the vVols system decided to “fail safe”, relatively speaking, allowing for cold migration of all machines that had been running when the management layer failed. I had a bit of a scare when the cold migration of a huge server failed due to a corrupt snapshot, but a subsequent retry where I moved the machine to a faster datastore succeeded, meaning I did not have to worry about restoring data from other copies of the machine.

Why I’m moving away from vVols on IBM SVC storage

Virtual Volumes, or vVols, sound like a pretty nice idea: We present a pool of storage to vCenter, which in turn gets control of storage events within that pool via something called VASA providers. Benefits of this include the following:

  • vVols allow for policy-based storage assignment.
  • We get to use an “inverted” snapshotting method, where snapshot deletions (i.e. commits of snapshotted data), which are by far the most common operation, are almost instantaneous, at the cost of more expensive rollbacks.
  • vCenter gets access to internal procedures in the storage solution instead of having to issue regular SCSI commands to the controllers.

As presented by VMware, the solution should be pretty robust: The VASA providers present an out-of-band configuration interface to vCenter, while the actual data channel is completely independent of them. As recommended by VMware, the VASA providers should also be stateless, meaning that in case they are lost entirely, recovery should only be a matter of deploying new ones, which read metadata about the storage from the storage itself and present it back to vCenter.

So what’s the drawback?

If your VASA providers are offline, you can’t make changes to vVol storage, and any vVol-based VMs that aren’t actively running become unavailable. Not being able to make changes to vVol storage is a pretty big deal, because guess what: snapshots are a vVol storage change. And snapshots are pretty much a requirement for VM backups, which for any production environment are a daily recurring task.

I’ve been presenting vVols from our V9000 and V7000 storage solutions via IBM Spectrum Control Base Edition for quite some time now, and have really liked it. Except when it stopped working. Because it did. Several times. Firmware update on the SAN? Spectrum Control stopped working. HA failover between Spectrum Control nodes? Not reliable. Updates to the operating system on a Spectrum Control node? At least once I couldn’t get the node back online, and had to restore a VM backup. And right now I’m having an issue where some necessary metadata string apparently contains untranslatable unicode characters because someone – possibly even me – used the Swedish letters å, ä, and ö somewhere without thinking.

I’ve opened a case with IBM support to get things running again, and as soon as I have, I’m migrating everything off of my vVols on SVC, and replacing those storage pools with regular LUNs. From now on I’m sticking to vSAN when I want the benefits of modern object storage for my virtualization environment.

PowerShell for Unix nerds

(This post was inspired by a question on ServerFault)

Windows has had an increasingly useful scripting language since 2006 in PowerShell. Since Microsoft apparently fell in love with backend developers a while back, they’ve even ported the core of it to GNU/Linux and macOS. This is actually a big deal for us who prefer our workstations to run Unix but have Windows servers to manage on a regular basis.

Coming from a background in Unix shell scripting, how do we approach the PowerShell mindset? Theoretically it’s simple to say that Unix shells are string-based while PowerShell is object oriented, but what does that mean in practice? Let me try to present a concrete example to illustrate the difference in philosophy between the two worlds.

We will parse some system logs on an Ubuntu server and on a Windows server respectively to get a feel for each system.

Task 1, Ubuntu

The first task we shall accomplish is to find events that reoccur between 04:00 and 04:30 every morning.

In Ubuntu, logs are regular text files. Each line clearly consists of predefined fields delimited by space characters. Each line starts with a timestamp with the date followed by the time in hh:mm:ss format. We can find anything that happens during the hour “04” of any day in our retention period with a naïve grep for ” 04:”:

zgrep " 04:" /var/log/syslog*

(Note that I use zgrep to also analyze the archived, rotated log files.)

On a busy server, this particular search results in twice as much data to sift through as we originally wanted. Let’s complement our commands with some simple regular expressions to filter the results:

zgrep " 04:[0-2][0-9]:[0-5][0-9]" /var/log/syslog*

Mission accomplished: We’re seeing all system log events between 04:00:00 and 04:29:59 for each day stored in our log retention period. To clarify the command: each bracket expression represents one character position in our search string and defines the valid characters for that position.

Bonus knowledge:
When using grep’s Perl-compatible mode (grep -P), [0-9] can be substituted with \d, which translates into “any digit”. I used the longer, portable form here for clarity.

Task 2, Ubuntu

Now let’s identify the process that triggered each event. We’ll look at a line from the output of the last command to get a feeling for how to parse it:

/var/log/syslog.7.gz:Jan 23 04:17:36 lbmail1 haproxy[12916]: xx.xxx.xx.xxx:39922 [23/Jan/2019:04:08:36.405] ft_rest_tls~

This can be translated into a general form:

<filename>:<MMM DD hh:mm:ss> <hostname> <procname[procID]>: <message>

Let’s say we want to filter the output from the previous command and only see the process information and message. Since everything is a string, we’ll pipe grep to a string manipulation command. This particular job looks like a good use case for GNU cut. With this command we need to define a delimiter, which we know is a space character, and then we need to count spaces in our log file format to see that we’re interested in what corresponds to ”fields” number 5 and 6. The message part of each line, of course, may contain spaces, so once we reach that field we’ll want to show the entire rest of the line. The required command looks like this:

zgrep " 04:[0-2][0-9]:[0-5][0-9]" /var/log/syslog* | cut -d ' ' -f 5,6-

Now let’s do the same in Windows:

Task 1, Windows

Again our task is to find events between 04:00 and 04:30 on any day. As opposed to our Ubuntu server, Windows treats each line in our log as an object, and each field as a property of that object. This means that we will get no results at best and unpredictable results at worst if we treat our log as a searchable mass of text.
Two examples that won’t work:

Wrong answer 1

get-EventLog -LogName System -After 04:00 -Before 04:30

This looks nice, but it implicitly only gives us log events between the given times this day.

Wrong answer 2

get-EventLog -LogName System | Select-String -Pattern "04:[0-2][0-9]:[0-5][0-9]"

Windows can use regular expressions just fine in this context, so that’s not a problem. What’s wrong here is that we’re searching the actual object instance for the pattern; not the contents of the object’s properties.

Right answer

If we remember that Powershell works with objects rather than plain text, the conclusion is that we should be able to query for properties within each line object. Enter the “where” or “?” command:

Get-EventLog -LogName System | ?{$_.TimeGenerated -match "04:[0-2][0-9]:[0-5][0-9]"}

What did we do here? The first few characters after the pipe can be read as “for each incoming object, check whether its TimeGenerated property matches…”.

One of the things we “just have to know” to understand what happened here is that the column name “Time” in the output of the Get-EventLog command doesn’t represent the actual name of the property. Looking at the output of get-eventlog | fl shows us that there’s one property called TimeWritten and one called TimeGenerated. We’re naturally looking for the latter.
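
Since TimeGenerated is a DateTime object rather than a string, an arguably more robust alternative (a sketch, not what the examples above use) is to compare its numeric properties directly instead of string-matching:

# Sketch: filter on the DateTime properties instead of their string representation
Get-EventLog -LogName System | ?{ $_.TimeGenerated.Hour -eq 4 -and $_.TimeGenerated.Minute -lt 30 }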

This was it for the first task. Now let’s see how we pick up the process and message information in PowerShell.

Task 2, Windows

By looking at the headers from the previous command, we see that we’re probably interested in the Source and Message columns. Let’s try to extract those:

Get-EventLog -LogName System | ?{$_.TimeGenerated -match "04:[0-2][0-9]:[0-5][0-9]"} | ft Source, Message

The only addition here is that we pipe the matching objects to the Format-Table cmdlet (ft) and tell it to display only the contents of the Source and Message properties of each passed object.
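
If the goal is further processing rather than on-screen display, Select-Object is arguably the better choice, since it keeps real objects in the pipeline instead of formatted output:

# Same filter as above, but keeping objects instead of formatted output
Get-EventLog -LogName System | ?{$_.TimeGenerated -match "04:[0-2][0-9]:[0-5][0-9]"} | Select-Object Source, Message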

Summary

PowerShell is different from traditional Unix shells, and by trying to accomplish a specific task in both we’ve gained some understanding in how they differ:

  • When piping commands together in Unix, we’re sending one command’s string output to be parsed by the next command.
  • When piping cmdlets together in PowerShell, we’re instead sending entire objects with properties and all to the next cmdlet.

Anyone who has tried object-oriented programming understands how the latter is potentially powerful, just as anyone who has “gotten” Unix understands how the former is potentially powerful. I would argue that it’s easier for a non-developer to learn Unix than to learn PowerShell, that Unix allows for a more concise syntax than PowerShell, and that Unix shells execute commands faster than PowerShell in many common cases. However, I’m glad that there’s actually a useful, first-party scripting language available in Windows.

Getting things done in PowerShell is mainly a matter of turning your thinking around and working with objects and their properties (whose values may, but needn’t necessarily, be strings) rather than with strings directly.

Exporting mail to .PST files using PowerShell

We have one Exchange mailbox that has seen exponential growth with no good workaround to be had. I started worrying when we passed 100 GB for the single box, and by 160 GB users started getting properly annoyed by the performance when browsing the box. I spoke to the manager for the department and suggested exporting the data to a number of archive files that could be manually accessed by select users.

The basic command

As usual, for this kind of task we want to reach for our CLI. The command to copy data to a .PST is New-MailboxExportRequest. A typical use case for a small box could look something like this:

New-MailboxExportRequest -Mailbox "MailboxName" -Name JobIdentifier -FilePath \\servername\path\filename.pst

Two notes:
1) The Name of the job is how we may reference it later on. It doesn’t have to make sense, but in case we create multiple jobs our lives get easier if they have sane names.
2) The FilePath argument must point at a UNC share where the domain group Exchange Trusted Subsystem has Modify rights. Avoid writing to a system or database volume directly on the mail server, or bad things may happen when the volume fills up.

Advanced parameters

We will undoubtedly want to create more advanced queries, though. In the case of my giant mailbox, it makes sense to split the contents chronologically. The ContentFilter parameter can be as advanced as we need it to be:

New-MailboxExportRequest -Mailbox "MailboxName" -Name MailboxAugToSep2016dump -IsArchive -FilePath \\servername\path\filename.pst `
 -ContentFilter {((Sent -gt '2016-08-01') -and (Sent -lt '2016-10-01')) -or ((Received -gt '2016-08-01') -and (Received -lt '2016-10-01'))}

To break the command down:
1) We want to filter for both the Sent and the Received properties of our mails, since we want to catch not only mails that were received during the period, but also outbound mail from the time.
2) Since we use the “greater than” and “less than” operators, it’s good to know how they behave with dates: Written this way, what we’re actually entering is “the date, at 00:00”. In other words, -gt will pick up the entire day of the date entered, but to catch the entire last day of a month with -lt, we must enter the first day of the following month. In this case we’ll dump everything up until 00:00 on October 1st, which is exactly what we want.
3) In this case I added the -IsArchive switch. This tells the command to look in the online archive belonging to the mailbox instead of in the actual mailbox.

Checking job status

We can check if an export request is queued, in progress, completed, or failed, simply by running Get-MailboxExportRequest. But as usual we can get some truly useful information by stringing some commands together. Why did a job fail? Did the supposedly “completed” job parse our command line the way we expected it to?

Get-MailboxExportRequest -status Completed | Get-MailboxExportRequestStatistics -IncludeReport | fl > c:\report.txt

Here we first ask for a list of Completed jobs – but we might as well ask for failed jobs. Then we dump a pretty verbose list of the report(s) to a regular text file for easy reading.
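
For instance, the equivalent report for failed jobs:

# Same report as above, but for failed jobs (the output path is just an example)
Get-MailboxExportRequest -Status Failed | Get-MailboxExportRequestStatistics -IncludeReport | fl > c:\failed-report.txt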

Clearing jobs

Once we’re confident we know what we need to know about our completed or failed jobs, we can use the Remove-MailboxExportRequest cmdlet to clear our job list. Combine it with Get-MailboxExportRequest to clear many at a time. For example:

Get-MailboxExportRequest -status Completed | Remove-MailboxExportRequest

This will simply remove all jobs with the status Completed.

References

New-MailboxExportRequest
Get-MailboxExportRequestStatistics


Workaround for broken connection management in Exchange

For legacy reasons (don’t even ask…) we still have an old NLB-based Exchange 2010 mail server farm, with a CASArray consisting of two servers, in front of a DAG cluster at work.

The interesting thing, of course, is that when one of the CASes fails, Outlook clients don’t automatically start using the other CAS as you’d expect in a sane system. And which Outlook clients stopped working seemed to be somewhat arbitrary.

A couple of minutes with my preferred search engine gave me the tools to show what’s wrong:

Get-Mailboxdatabase | ft Identity, RpcClientAccessServer

Identity RpcClientAccessServer
-------- ---------------------
Mailbox DB05 CAS1.tld
Mailbox DB03 CAS2.tld
...

The above example output shows that each database has a preferred CAS, which explains the apparent arbitrariness: only clients whose mailbox database pointed at the failed CAS lost their connection.

The funny thing is that even after an hour and a half, and way after NLB Manager stopped presenting the second CAS in its GUI, Exchange hadn’t understood that one of the members of the CASArray was down. The workaround is to manually tell each mailbox database to use the healthy CAS:

Set-MailboxDatabase "Mailbox DB03" -RPCClientAccessServer CAS1.tld

Get-Mailboxdatabase | ft Identity, RpcClientAccessServer


Identity RpcClientAccessServer
-------- ---------------------
Mailbox DB05 CAS1.tld
Mailbox DB03 CAS1.tld
...
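
If every database needs to be repointed at the surviving CAS, the two cmdlets can be strung together; a sketch, to be used with care:

# Sketch: repoint every mailbox database at the healthy CAS in one go
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer CAS1.tld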

Fortunately it looks as though modern Exchange solutions with real load balancers in front of them don’t experience this issue.