VMware Storage Providers and Certificate issues

While trying to test out vVols in our vSphere 6.5 environment, presented via IBM Spectrum Control Base 3.2 from a Storwize V9000 SAN, I ran into a small issue that took me a while to figure out:

I installed Spectrum Control Base 3.2 and presented its web services via a FQDN.
To avoid the nagging of modern browsers, I used a regular wildcard certificate valid for the domain I chose to use.
After the initial setup, when I tried to add SCB as a storage provider in VMware, I got the following error message: “A problem was encountered while provisioning a VMware Certificate Authority (VMCA) signed certificate for the provider.”
A web search showed me that this was a pretty common problem with several VASA providers, but none of the suggested solutions applied to our environment. After half an hour of skimming forums and documentation I found the following quote in an ancient support document from VMware:
Note: VMware does not support the use of wildcard certificates.

So: I generated a self-signed certificate in the Spectrum Control Base server webUI, and the problem disappeared.

Lesson of the day: don’t use wildcard certificates in a VMware service context.

Musings on the Apple Watch ’Series 0’ in 2018

I bought my Apple Watch pretty much exactly two years ago, after getting a good deal on someone’s returned Christmas gift: a Space Grey 42 mm Aluminum watch.

The “killer feature” for me was the ability to see and even receive phone calls without having the phone directly at hand. The iPhone 5 I had at the time never supported more than 3G networks in Sweden, and 3G coverage is bad where I live. This meant that I had a small number of spots with guaranteed coverage at home, while at the same time I was on call at work every few weeks. With the Apple Watch, I could leave my phone in a spot with known good reception and walk around the house without fear of accidentally missing a call.

So how does it fare two years and three operating system updates later?
Functionally, I don’t have a lot to complain about. I think I’m noticing some degradation in battery life, but it’s not concrete enough for me to judge whether it’s an issue with the new OS or actual battery wear.
What has begun bothering me, though, is the noticeably lower responsiveness of the watch with the latest operating system. Asking Siri, the virtual assistant, something has become a frustrating exercise: first attempting to wake her by voice, then giving up and invoking the function using the side button, and then waiting forever for a response.

I’ve grown too fond of having this little computer on my wrist not to upgrade in the future, but given that watchOS seems to suffer from a very similar yearly increase in non-optional bloat as iOS, a new watch will likely not have a usable lifetime of more than 3-4 years, which is a shame, really. I’ve been lusting after a Space Black stainless steel Apple Watch, but realistically I won’t spend chronograph money on a disposable doohickey that will likely have lost a significant part of its usefulness in 36 months.

Test whether a git pull is needed from within a batch script

Just a quick hack I did to avoid having to sync a couple of scripts unnecessarily when deploying my load balancers. The underlying idea is stolen from a post by Neil Mayhew on Stack Overflow.

Shell session script:
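Something along these lines does the job; the repository path is an example, and the check assumes an upstream tracking branch is configured:

    #!/bin/sh
    # check-git-updates.sh - pull only when the upstream has new commits.
    # The repository path below is just an example.
    cd /opt/deploy/lb-scripts || exit 1
    git fetch origin

    LOCAL=$(git rev-parse HEAD)
    REMOTE=$(git rev-parse '@{u}')

    if [ "$LOCAL" != "$REMOTE" ]; then
        git pull
        echo "changed"
    else
        echo "unchanged"
    fi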

Ansible playbook:
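And a corresponding playbook excerpt, where the script path, repository location and host group are again examples:

    - hosts: loadbalancers
      become: true
      tasks:
        - name: Update the local copy of the script repository
          command: /opt/deploy/check-git-updates.sh
          delegate_to: localhost
          run_once: true
          become: false
          register: gitcheck
          changed_when: "'changed' in gitcheck.stdout"

        - name: Sync the scripts to the load balancers only when needed
          copy:
            src: /opt/deploy/lb-scripts/
            dest: /opt/scripts/
            owner: root
            group: root
            mode: '0755'
          when: gitcheck.changed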

Environment variables for services in a systemd-based system

My current config deployment automation project has required me to set up a dev/staging environment for my load balancers, since I don’t want to break stuff by deploying untested configurations.
This environment is functionally identical to a single load balancer and can be used along with a hosts file on a client to not only develop configurations and make sure they are syntactically correct, but also to fully test the functionality of the load balancer rules.

As part of this, I naturally need to change the listener addresses in the dev/staging HAProxy environment compared to the production environment that I keep version controlled in my git repository.
My first instinct was to use a script to pull the latest versions, modify the necessary lines using sed, and copy the config to the correct location. I didn’t really like this concept, since it would by definition mean that the configs weren’t fully identical between the production environment and git repo on the one hand and the dev/staging environment on the other.
If I used environment variables instead, the version-controlled configuration could be kept fully identical across all instances.

The first mistake I made took me a while to grasp. HAProxy parsed an obviously fully valid configuration, but intermittently presented a certificate I didn’t expect, and web services intermittently failed to reply to requests.
It turns out Linux daemons don’t inherit even system-wide environment variables.
So how do we check what environment variables a service does see?
First get the PID(s) of the service:
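Either of the usual approaches works:

    $ systemctl status haproxy
    $ pidof haproxy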

In the case of my HAProxy installation, I got a list of three processes, so I chose the last one and checked out its environment:
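The environment of a running process can be read from /proc; the PID here is a placeholder:

    $ sudo cat /proc/2201/environ | tr '\0' '\n'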

This results in a list of its current environment variables, and naturally the ones I thought I’d added were nowhere to be seen.

So why did HAProxy start without complaining? Since the environment variables weren’t defined in this context, they simply expanded to empty strings, and so HAProxy figured I wanted to listen on the assigned ports on all interfaces.

How do we assign environment variables to a service in a modern, systemd-based Linux, then?
At a command prompt, run systemctl edit servicename. This starts your default editor. A valid config looks like this:
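In my HAProxy case it boils down to a couple of Environment lines; the variable names and addresses here are of course just examples:

    [Service]
    Environment="FRONTEND_ADDR=192.0.2.10"
    Environment="STATS_ADDR=192.0.2.10"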

On Ubuntu, this file is stored as /etc/systemd/system/servicename.service.d/override.conf, but naturally it can be renamed to something more descriptive. The systemctl edit command doesn’t do anything magical; it’s just a shortcut.
After the file is in place, run systemctl daemon-reload to make the configuration active, and then the HAProxy service needs to be restarted, not only reloaded, for the changes to apply.

Of course, I want this config too to be deployable through Ansible.
The relevant lines from my playbook:
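Boiled down, they look something like this; file names and paths are examples:

    - name: Deploy the systemd override file for HAProxy
      copy:
        src: filesystem/etc/systemd/system/haproxy.service.d/override.conf
        dest: /etc/systemd/system/haproxy.service.d/override.conf
        owner: root
        group: root
        mode: '0644'
      register: haproxy_override

    - name: Reload systemd unit files
      command: systemctl daemon-reload
      when: haproxy_override.changed

    - name: Restart HAProxy so the new environment takes effect
      service:
        name: haproxy
        state: restarted
      when: haproxy_override.changed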

Key lines:
The register line stores the result of a task in a variable. Thanks to that, I can use the when keyword to only reload the systemd configuration and restart the service when anything has actually changed.

Summary

Linux daemons don’t automatically inherit environment variables.
In systemd-based distros (which today means pretty much anyone with corporate backing), environment variables can be added using the Environment keyword in the Service section of a file in /etc/systemd/system/servicename.service.d/.

Continuous Deployment of Load Balancer Configurations

I thought I’d describe some optimizations I’ve made to my load balancers at work, both for the benefit of my future self and in case someone else might benefit from some of my ideas.

Background

The load balancers are based on four software packages that integrate to create a powerful whole:
Keepalived provides a common set of virtual IP addresses and ensures that failover to the Backup server happens if the Master should stop responding.
HAProxy does most of the actual load balancing and mangles network traffic when required.
SNMPD throws SNMP trap events from keepalived whenever a failover occurs.
The Zabbix Agent enumerates current configuration and system state for detailed system monitoring.

Now, all of these components get the occasional configuration change, except for HAProxy, which pretty much sees changes on at least a weekly basis.
The procedure for updating the configuration must cover the following steps:

  1. Run a pre-check to confirm that both load balancers in the pair work; we don’t want to initiate an automated update that could kill off service availability completely.
    On the Backup load balancer node:
  2. Back up the current configuration.
  3. Deploy the new configuration.
  4. Reload services.
  5. Run a post-op check on the Backup node to confirm that the new config hasn’t broken anything important.
  6. Fail over operations from the Master load balancer node to the Backup node and repeat steps 2-5 on the Master node.
  7. Perform a final check on the load balanced services to confirm functionality hasn’t been lost.

From experience, this procedure is tedious to say the least. In addition there’s always the risk of introducing a change to an active load balancer and forgetting to deploy the same change to the backup one; something that may not become obvious until after the next major configuration update when the last change disappears and functionality breaks.

These are just the most obvious arguments for an automated and version controlled deployment procedure. So how do we go about that?

Version control

In my case, I use Git connected to a GitLab server for version control, and Ansible for automation.

Configuration changes are prepared in a development environment, from which the relevant files are committed to a git repository.

Other components in the load balancer config, such as Lua scripts or tools made by our developers, are stored in other repositories and can be pulled by git before a new deployment.

Ansible structure

For each load balancer pair, I’ve built a directory structure containing a playbook directory for the Ansible YAML scripts, and a filesystem directory that mirrors the movable parts of the load balancer, with the relevant files living under an etc directory tree.
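Roughly like this, with the names being arbitrary examples:

    lb-pair-example/
        playbook/
            deploy-config.yml
            check-services.yml
        filesystem/
            etc/
                haproxy/
                    haproxy.cfg
                keepalived/
                    keepalived-master.conf
                    keepalived-backup.conf
                systemd/
                    system/
                        haproxy.service.d/
                            override.conf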

Automation

Deployment is initiated by a shell script that git-pulls the latest versions of our dependencies and then ensures that the Ansible playbooks can work on remote computers by wrapping them in an ssh-agent environment.
The Ansible playbooks themselves are executed from within a session script called by ssh-agent.
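A stripped-down sketch of the idea; repository paths, key names and file names are examples:

    #!/bin/sh
    # deploy.sh - pull dependencies, then run the playbooks inside an ssh-agent session
    git -C /opt/deploy/lua-scripts pull
    git -C /opt/deploy/lb-tools pull
    exec ssh-agent ./run-playbooks.sh

And the session script it calls:

    #!/bin/sh
    # run-playbooks.sh - executed under ssh-agent by deploy.sh
    ssh-add "$HOME/.ssh/id_ed25519"
    ansible-playbook -i inventory playbook/deploy-config.yml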

Ansible-specific tips

The key to ensuring that the production environment doesn’t break lies in the header of the playbook:
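In essence, with the host group name being an example:

    - hosts: loadbalancers
      serial: 1                # work on one server at a time
      any_errors_fatal: true   # abort the entire run on the first failed task
      become: true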

The serial keyword makes the script work on one server at a time rather than executing in parallel.
The any_errors_fatal parameter is combined with relevant service checks interspersed among the deployment tasks to ensure that the script fails fast and loudly if a backend web service stops responding while deployment is underway, so that we don’t break both servers in a pair. Note that this requires some thought on the part of the person running the scripts: they need to fix the problem before re-running the script, or fecal matter will hit the fan quickly enough.

The most basic of tests just ensures I can reach the statistics page of my load balancer:
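For example, assuming the statistics page listens on port 9000:

    - name: Confirm that the HAProxy statistics page responds
      uri:
        url: "http://{{ inventory_hostname }}:9000/stats"
        status_code: 200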

A typical file copying task:
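Something along these lines; the source path mirrors the filesystem directory in the repository, and the registered variable is used further down to decide whether a reload is needed:

    - name: Deploy the HAProxy configuration
      copy:
        src: ../filesystem/etc/haproxy/haproxy.cfg
        dest: /etc/haproxy/haproxy.cfg
        owner: root
        group: root
        mode: '0644'
      register: haproxy_cfg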

As a side note: Since I don’t want the script to have to care about which server is which, I’ve created one config file for the keepalived master instance and one for the slave. On the actual servers, a symlink points to the correct configuration for the instance.

By reloading the HAProxy service rather than restarting it, existing sessions are not lost even though the configuration gets updated. As a bonus, in the Ansible service module, the reloaded state request also starts the service if it wasn’t already running.
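The corresponding task can be as simple as this, keyed off the variable registered by the copy task above:

    - name: Reload HAProxy to activate the new configuration
      service:
        name: haproxy
        state: reloaded
      when: haproxy_cfg.changed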

With well under a day’s worth of work, I’ve introduced a repeatable workflow for the deployment process, one that mitigates some of the risks involved in letting humans tamper with production systems.

Load balancing a Citrix StoreFront

This is well-documented by Citrix too; I just thought I’d write a few lines on a gotcha I happened upon:
Load balancing the Citrix StoreFront has two prerequisites:

  1. The StoreFront servers must see the load balancer VIP when looking up the service FQDN.
  2. The load balancers must be configured with session affinity, since it doesn’t seem as though StoreFront keeps track of user sessions between StoreFront servers.

In addition, Citrix recommends running a load balancing scheme based on least connections.

In HAProxy, a valid backend config looks something like this:
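Something like the following; server names, addresses and the stick-table sizing are placeholders, and source IP affinity is one straightforward way to meet the session affinity requirement:

    backend bk_storefront
        mode http
        balance leastconn
        stick-table type ip size 200k expire 30m
        stick on src
        server storefront1 192.0.2.21:443 check ssl verify none
        server storefront2 192.0.2.22:443 check ssl verify none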

Default behavior should not be stupid

I ran into a problem that truly surprised me yesterday:
One of our remote desktop users accidentally deleted an entire folder of customer data. He was seriously angry, and rightfully so: he had pressed the Delete key on the keyboard at the wrong moment, after which the folder and its entire substructure were removed from disk without any confirmation.
As it turns out, this is default behavior in Windows, from Windows 8 on the client side and Windows Server 2012 on the server side. On local volumes on a regular PC, what usually happens is that the chosen documents are moved to the Recycle Bin without warning, which isn’t too bad unless it happens to the kind of user who doesn’t always know what they’re doing.
The problem is that the same thing happens on network drives, regardless of whether they have a Recycle Bin or Shadow Copies enabled.

We do have working backups, so this only resulted in a few minutes of frustration for the user, but I can’t believe that this new default behavior passed testing at Microsoft, and I have no idea how they justified the change.

The old and sane behavior, with a nice and friendly dialog telling you that you’re about to lose a month’s worth of work and asking if you really want that, can be restored via group policy:
User Configuration -> Administrative Templates -> Windows Components -> Windows Explorer / File Explorer -> Display confirmation dialog when deleting files.
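If you would rather set it directly per user, the policy reportedly maps to the ConfirmFileDelete registry value; verify this against your own environment before rolling it out:

    # Create the policy key if needed and enable the delete confirmation dialog
    New-Item -Path 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer' -Force | Out-Null
    New-ItemProperty -Path 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer' `
        -Name 'ConfirmFileDelete' -PropertyType DWord -Value 1 -Force | Out-Null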

Exchange – another lesson learned

This is why we test things before going live:
After migrating a test mailbox from the old Exchange environment, it could receive mail just fine, and sending mail out of the organization worked flawlessly too. Unfortunately, any mail sent from this account to recipients within the old Exchange environment got stuck in the mail queue.

Logically, as usual, the fix was to complement the default receive connectors on the old servers with the explicit addresses of the new Exchange servers, even though those naturally fell well within the 0.0.0.0-255.255.255.255 range the connectors already accepted. Way to go, Microsoft!
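In the Exchange Management Shell, appending an address to an existing connector looks roughly like this; the connector identity and IP address are placeholders:

    $connector = Get-ReceiveConnector "OLDEXCH01\Default OLDEXCH01"
    Set-ReceiveConnector -Identity $connector.Identity `
        -RemoteIPRanges ($connector.RemoteIPRanges + "192.0.2.31")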

Load Balancing Exchange 2016 behind HAProxy

I recently started the upgrade to Exchange 2016 at work. A huge benefit over Exchange 2010 is that REST-based client connections are truly stateless. In effect this means that if a server goes down, clients shouldn’t really notice any issues as long as something redirects them to a working server. In my system, this something is HAProxy.

The guys at HAProxy have their own excellent walkthroughs for setting up their load balancer for Exchange 2013, which can pretty much be lifted verbatim to Exchange 2016, but I want to add a few key points to think about:

Service health checks

Each web service exposes a virtual file, HealthCheck.htm, that reports its state. Let HAProxy use the contents of this file for the server health check. That way it’ll know to redirect clients if one of the services is down, even though the Exchange server in question may still be listening on port 443.

Example config:
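A sketch of the idea; server names and addresses are placeholders:

    backend bk_exchange_owa
        mode http
        balance roundrobin
        option httpchk GET /owa/HealthCheck.htm
        http-check expect status 200
        server exch2016-1 192.0.2.41:443 check ssl verify none
        server exch2016-2 192.0.2.42:443 check ssl verify none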

This example shows a test of the Outlook Web Access service state. Naturally the config can be set to test each of the REST services each Exchange server presents.

Exchange server default firewall rules

Our design puts the load balancer in a DMZ outside of our server networks. Clients connecting through the load balancer will be dropped by Windows firewall rules generated by Exchange; specifically, the edge traversal rules for the POP3 and IMAP protocols. Make sure you allow edge traversal for these protocols, and let the network firewall take care of limiting external client connections to them. Also note that there are multiple firewall rules for IMAP and POP3 traffic; only the ones concerned with client traffic are relevant for this change. There’s no point in kicking open holes in your firewall for no good reason.

Exchange and Outlook suck at IMAP

We use IMAP for an internal order management system. Outlook and Exchange aren’t the best tools for this protocol, but unfortunately we have to live with them due to sins committed long ago. I spent quite some time troubleshooting our IMAP connections:
No matter how I configured Outlook, I couldn’t get it to open an IMAP connection to the Exchange servers. Error messages varied depending on the client settings, but essentially I couldn’t log on, couldn’t establish a secure connection, or couldn’t synchronize my folders.

I would get the regular banner when telnetting from the client machine, so I knew traffic was getting through all the way from Exchange via the load balancer.
Mozilla Thunderbird could connect perfectly well and sync accounts, both using STARTTLS on port 143 and over a TLS-encrypted connection on port 993. After mulling it over, I turned on debug logging in Outlook and quickly saw that the client was trying and failing to perform an NTLM logon to Exchange. Using the error messages as search terms, I found others who had experienced the same issue. Their solution had been to turn off NTLM authentication for the IMAP protocol on the Exchange server. This appears to be a regression in Exchange Server 2016 of a bug previously seen in Exchange 2013.
The command in the Exchange Management Shell:
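If memory serves, the relevant setting is EnableGSSAPIAndNTLMAuth, followed by a restart of the IMAP services:

    # Run on each affected Exchange server
    Set-ImapSettings -EnableGSSAPIAndNTLMAuth $false
    Restart-Service MSExchangeIMAP4, MSExchangeIMAP4BE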

After this, Outlook is still incapable of logging on over TLS on port 993, but at least it consistently manages to run STARTTLS over port 143, which is good enough for my use case.

All in all, the most complicated part here wasn’t making HAProxy do its magic, but getting Exchange and Outlook to do what they should.

Apple AirPods first impressions

I’ve had the Apple AirPods for a few days now, and thought I’d record a few of my thoughts on them.

EDIT 2017-07-17: I’ve added an update to the end of this article.
EDIT 2017-07-20: A second update has been added after going through the rounds with Apple’s support.

First of all, the sound quality: Wow.

What I liked about the regular EarPods was that they let other sounds through, making wearing them in populated areas non-suicidal in comparison to wearing in-ear headphones with a better seal: I regularly shake my head at pedestrians and cyclists wearing in-ears and obviously having great faith in surrounding traffic as they step right out into zebra crossings or high-traffic streets. Unfortunately, in the case of the original wired EarPods, this added situational awareness came at the cost of radically reduced “oomph” in the music: bass and dynamic range seemed to suffer in anything but rather quiet environments.

While the AirPods have a pretty much identical fit, letting similar amounts of ambient sound through, the design team has managed to give them the additional range and power to sound pretty much as good as I imagine such small drivers can sound in an open design. That said, some magic is very hard to pull off: you won’t be well off using these without additional hearing protection when mowing the lawn or angle-grinding metal bars.

Technically, I second what most other people seem to be saying about the combination of Bluetooth and the proprietary W1 chip: switching sound output to the AirPods once the initial pairing with a device has been made seems to work flawlessly, but both on my Mac and on my iPad, it took a few tries to see the AirPods in the first place. Under the hood, information about your pair of AirPods is shared across your Apple devices using iCloud, and obviously this information needs to be updated in some way. On the Mac, restarting the computer seems to have done the trick. This is obviously an area where Apple has some work to do to smooth out the experience in the future.

One thing to watch out for: when enabling the AirPods while playing with GarageBand, you get a warning about the ’Pods introducing latency. Sure enough: playing with the on-screen keyboard, I probably got somewhere between 1/4 and 1/2 second of latency instead of the immediate response I’m used to from Apple’s music tools, so if music production is something you do on your Apple devices, make sure to keep a wired pair of headphones or in-ears around.

All in all: are the AirPods worth their price? It depends. Can you spare a bunch of money for a smoother and nicer experience than the cheapest available product that solves your problem gives you? If you’re an Apple user, the answer to that question is probably yes. Personally, now that I have them, I don’t really think about the money I saved up to spend on them. For now I’m extremely happy with them.

UPDATE:
I’ve encountered two major annoyances in how the AirPods work with my Mac (a late 2013 15″ MacBook Pro):
Apparently, when anything in macOS uses the microphones in the AirPods, they switch to phone call mode, lowering sound quality and making all sounds slightly distorted and lo-fi. This can be temporarily mitigated by switching sound input for the system or for the specific application to another device, like the internal mic, but this of course isn’t a viable long-term solution.

The other problem on the Mac is recurring sound interruptions and glitches during music playback. When switching to the internal speakers or wired headphones, no such glitches can be heard, so it definitely has to do with the AirPods or their Bluetooth connection.

Frankly, I’m disappointed that the AirPods were released with such glitches not worked out; then again, Apple did have trouble getting them to market on time in the first place. I will speak to Apple’s support to try to get some more information. It may be a problem with the Bluetooth protocol itself as implemented on the Mac or in macOS, and in that case there may not be a lot Apple can do.

In view of this, I have to change my recommendation:
At this point in time (mid-July 2017), do not purchase the AirPods expecting to use them for good-quality music playback and convenient voice calls in macOS. For use with iOS devices, however, they remain an excellent choice.

UPDATE 2:
Apple’s support gave me a technical explanation for the lo-fi sound quality when the microphone is used in macOS.

The facts

When the AirPods are only used for listening, the source device can send relatively high-quality AAC-encoded sound to them. When the microphones are used (that is, when a return channel is active), the Bluetooth standard specifies a lower-quality protocol, resulting in noticeably lower dynamic range and sound quality.

The problem exists on iOS devices too, but it’s simply less likely that one would be listening to music and simultaneously using the microphone in that system.

My speculations

It looks to me as though my iOS devices (9.7″ iPad Pro and iPhone 6s) support a newer version of the Bluetooth hands-free profile than macOS does on my 2013 15″ MacBook Pro, since call sound quality is radically better on the former than on the latter. This may be due to the Bluetooth chip used in my computer, or due to software limitations in the operating system. If the former, which I suspect, the issue won’t get fixed on my current computer. If the latter, a later patch may be able to mitigate but not fully solve the issue.

A problem with the age of the Bluetooth chip and its available codecs may also explain the stuttering in macOS.

Conclusion

As I wrote in Update 1 to this post, my recommendations are as follows:
Beware of purchasing a pair of AirPods if you intend to use them primarily with a Mac. They’re probably no worse than other Bluetooth headsets for this purpose; rather, the same problems you’ll find with any other Bluetooth headset exist with these. If music or voice call quality is an issue, a wired headset is still the way to go on the computer side of things.

For iOS devices and the Apple Watch, however, a pair of AirPods is probably one of the best upgrades to your experience you can get if you want to go wireless.