DNS/DHCP issues in modern Windows versions

Static IP addresses are a solid way to configure machines if you have few enough of them to manage manually. But the more you want to be able to change things on the fly, the more limiting such a configuration scheme becomes.

Unfortunately I’ve had severe problems getting servers with DHCP leases (or even DHCP reservations) to keep their names registered in DNS over time. Suddenly, after a reboot, a machine would seemingly drop off the network even though it had the same IP address as before. Rebooting it again or manually re-registering its DNS record would solve the problem, but that was a workaround, not a fix for the underlying issue.

I found a discussion that gave a few pointers on how to get these things working in Windows, and I’ve shamelessly ripped the relevant points to present them here:

Step one: Define a user in whose context the DHCP server will run

Simply add a domain user with no special rights and give it a properly strong password. Then open the DHCP management console, right-click the protocol you want to change (IPv4 or IPv6), select Properties, and go to the Advanced tab. Click Credentials and enter the details for the account.
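
If you prefer PowerShell, the same change should be doable with the DhcpServer module on Server 2012 or later; the account and server names below are placeholders:

    # Prompt for the password of the low-privilege account (placeholder name)
    $cred = Get-Credential -UserName 'EXAMPLE\svc-dhcp-dns' -Message 'DHCP dynamic DNS account'

    # Store the credentials the DHCP server will use for dynamic DNS registrations
    Set-DhcpServerDnsCredential -Credential $cred -ComputerName 'dhcp01.example.com'

    # Verify what the server will use from now on
    Get-DhcpServerDnsCredential -ComputerName 'dhcp01.example.com'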

Step two: Tell DHCP to always attempt to update DNS records

In the same properties window, select the DNS tab. Ensure the following choices are ticked:

Enable DNS Dynamic Updates(…) -> Always dynamically update DNS records
Dynamically update DNS records for DHCP clients that do not request updates
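
The same two settings should also be reachable from PowerShell, roughly like this for IPv4 (server name is a placeholder):

    # "Always dynamically update DNS records", and update records even for clients
    # that do not request updates themselves
    Set-DhcpServerv4DnsSetting -ComputerName 'dhcp01.example.com' `
        -DynamicUpdates Always `
        -UpdateDnsRRForOlderClients $true

    # Check the result
    Get-DhcpServerv4DnsSetting -ComputerName 'dhcp01.example.com'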

Step three: Ensure DHCP server AD group membership

The DHCP server(s) should be members of the DnsUpdateProxy group, and no other user or computer accounts may exist in this group.
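
A quick way to check and fix the membership with the ActiveDirectory module (the computer account name is a placeholder):

    # List current members so any stray accounts can be spotted and removed
    Get-ADGroupMember -Identity 'DnsUpdateProxy'

    # Add the DHCP server's computer account (note the trailing $ on computer accounts)
    Add-ADGroupMember -Identity 'DnsUpdateProxy' -Members 'DHCP01$'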

Other tips

Make sure DHCP leases are longer than 24 hours, or bad things are likely to happen; the concrete example given was that Microsoft KMS servers work on a 24-hour update cycle.
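
Setting a longer lease on a scope from PowerShell could look something like this (scope ID and server name are placeholders):

    # Eight-day leases, comfortably above the 24 hour mark
    Set-DhcpServerv4Scope -ComputerName 'dhcp01.example.com' -ScopeId 192.168.10.0 `
        -LeaseDuration (New-TimeSpan -Days 8)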

 

FTP server on IIS

I recently had cause to set up an FTP server for internal use on a Windows server, and I bumped into an issue that took me a little while to figure out, since I had never done it before:
I wanted to give a domain user account an isolated directory, and started out by creating the user directory inside the inetpub\ftproot folder. However, when I tried to log on with a client, it would give me the error message “530 User cannot log in, home directory inaccessible”.

It turns out the answer was simply to create a subdirectory under ftproot named after the domain, and then move the user directory inside that one.
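
In other words, the layout that finally worked looked roughly like this (the domain and user names are made up):

    # FTP user isolation for domain users expects ftproot\<domain>\<username>
    New-Item -ItemType Directory -Path 'C:\inetpub\ftproot\EXAMPLE\jdoe' -Force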

Default behavior should not be stupid

I ran into a problem that truly surprised me yesterday:
One of our remote desktop users accidentally deleted an entire folder of customer data. He was seriously angry, and rightfully so: he had pressed the Delete key at the wrong moment, after which the folder and its entire substructure were removed from disk without any confirmation.
As it turns out, this is default behavior in Windows, from Windows 8 on the client side and Windows Server 2012 on the server side. On local volumes on a regular PC, what usually happens is that the selected documents are moved to the Recycle Bin without warning, which isn’t all too bad unless it happens to the kind of user who doesn’t always know what they’re doing.
The problem is that the same thing happens on network drives, regardless of whether they have the Recycle Bin or Shadow Copies activated.

We do have working backups, so this only resulted in a few minutes of frustration for the user, but I can’t believe that this new default behavior passed testing at Microsoft, and I have no idea how they justified the change.

The old and sane behavior, with a nice and friendly dialog telling you that you’re about to lose a month’s worth of work and asking whether you really want that, can be restored via Group Policy:
User Configuration -> Administrative Templates -> Windows Components -> Windows Explorer / File Explorer -> Display confirmation dialog when deleting files.
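
For a single machine where group policy isn’t an option, the same toggle should correspond to the ConfirmFileDelete registry value under the Explorer policies key; something like this per user (treat the value name as an assumption and verify it against the ADMX files):

    # Create the policies key if it does not exist yet
    $key = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer'
    New-Item -Path $key -Force | Out-Null

    # ConfirmFileDelete = 1 should re-enable the delete confirmation dialog (assumption)
    Set-ItemProperty -Path $key -Name 'ConfirmFileDelete' -Value 1 -Type DWord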

Exchange – another lesson learned

This is why we test things before going live:
After migrating a test box from the old Exchange environment, it could receive mail just fine, and sending mail out of the organization worked flawlessly too. Unfortunately any mail sent from this account to recipients within the old Exchange environment got stuck in the mail queue.

Logically, as usual, the fix was to complement the default receive connectors on the old servers with the explicit addresses of the new Exchange servers, even though they naturally were well within the 0.0.0.0-255.255.255.255 range already. Way to go, Microsoft!
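
On each old server, the change boiled down to something like this (the connector identity and addresses are placeholders):

    # Append the new Exchange servers' addresses to the existing remote IP ranges
    $rc = Get-ReceiveConnector 'OLDEXCH01\Default OLDEXCH01'
    $ranges = @($rc.RemoteIPRanges) + '10.0.20.11', '10.0.20.12'
    Set-ReceiveConnector -Identity $rc.Identity -RemoteIPRanges $ranges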

Load Balancing Exchange 2016 behind HAProxy

I recently started the upgrade to Exchange 2016 at work. A huge benefit over Exchange 2010 is that REST-based client connections are truly stateless. In effect this means that if a server goes down, clients shouldn’t really notice any issues as long as something redirects them to a working server. In my system, this something is HAProxy.

The guys at HAProxy have their own excellent walkthroughs for setting up their load balancer for Exchange 2013, which can pretty much be lifted verbatim for Exchange 2016, but I want to add a few key points to think about:

Service health checks

Each web service exposes a virtual file, HealthCheck.htm, that reports its state. Let HAProxy use the contents of this file for the server health check. That way it’ll know to redirect clients if one of the services is down, even though the Exchange server in question may still be listening on port 443.

Example config:
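
Something along these lines for Outlook Web Access (a sketch in the spirit of HAProxy’s own Exchange walkthrough; the backend name, server names and addresses are placeholders, and the frontend part is left out):

    backend bk_exchange_owa
        mode http
        balance roundrobin
        # Ask each server for the OWA health check file and expect a 200 in return
        option httpchk GET /owa/HealthCheck.htm
        http-check expect status 200
        server exch01 10.0.20.11:443 check ssl verify none
        server exch02 10.0.20.12:443 check ssl verify none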

This example shows a test of the Outlook Web Access service state. Naturally the config can be set to test each of the REST services each Exchange server presents.

Exchange server default firewall rules

Our design puts the load balancer in a DMZ outside of our server networks. Clients connecting through the load balancer will be dropped by the Windows firewall rules that Exchange generates; specifically the edge traversal settings of the rules for the POP3 and IMAP protocols. Make sure you allow edge traversal for these protocols, and let the network firewall take care of limiting external client connections to them. Also note that there are multiple firewall rules for IMAP and POP3 traffic; only the ones concerned with client traffic are relevant for this change. There’s no point in kicking open holes in your firewall for no good reason.
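
A sketch of what the change looks like with the NetSecurity cmdlets; the display names are assumptions, so check what Exchange actually created on your servers before touching anything:

    # List the inbound IMAP/POP3 rules and their current edge traversal setting
    Get-NetFirewallRule -DisplayName '*IMAP4*', '*POP3*' |
        Where-Object { $_.Direction -eq 'Inbound' } |
        Select-Object DisplayName, Enabled, EdgeTraversalPolicy

    # Allow edge traversal on the client-facing rules identified above (example rule name)
    Set-NetFirewallRule -DisplayName 'MSExchangeIMAP4 (TCP-In)' -EdgeTraversalPolicy Allow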

Exchange and Outlook suck at IMAP

We use IMAP for an internal order management system. Outlook and Exchange aren’t the best tools for this protocol, but unfortunately we have to live with them due to sins committed long ago. I spent quite some time troubleshooting our IMAP connections:
No matter how I configured Outlook I couldn’t get it to open an IMAP connection to the Exchange servers. Error messages varied depending on the client settings, but essentially I couldn’t log on, couldn’t establish a secure connection, or couldn’t synchronize my folders.

I would get the regular banner when telnetting from the client machine, so I knew traffic was getting through all the way from Exchange via the load balancer.
Mozilla Thunderbird could connect perfectly well and sync accounts, both using STARTTLS on port 143 and over a TLS encrypted connection on port 993. After mulling it over, I turned on debug logging in Outlook and quickly saw that the client was trying and failing to perform an NTLM logon to Exchange. Using the error messages as search terms, I found others who had experienced the same issue. Their solution had been to turn off NTLM authentication for the IMAP protocol on the Exchange server. This seems to be a regression in Exchange Server 2016 of a bug that had already shown up in Exchange 2013.
The command in the Exchange Management Shell:
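
Roughly the following; the EnableGSSAPIAndNTLMAuth switch is the one other reports of the problem point to, and the IMAP services need a restart afterwards:

    # Turn off GSSAPI/NTLM authentication for IMAP on this server (assumed parameter)
    Set-ImapSettings -EnableGSSAPIAndNTLMAuth $false

    # Restart the frontend and backend IMAP services for the change to take effect
    Restart-Service MSExchangeIMAP4
    Restart-Service MSExchangeIMAP4BE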

After this, Outlook is still incapable of logging on using TLS over port 993, but at least it consistently manages to run STARTTLS over port 143, which is good enough for my use case.

All in all, the most complicated part here wasn’t making HAProxy do its magic, but getting Exchange and Outlook to do what they should.

The paravirtual SCSI controller and the blue screen of death

For driver reasons, the default disk controller in VMware guests is an emulated LSI card. However, once you install VMware Tools in Windows (and immediately after installing the OS in most modern Linux distributions), it’s possible to slightly lower the overhead for disk operations by switching to the paravirtual SCSI controller (“pvscsi”).

I’m all for lower overhead, so my server templates have already been converted to use the more efficient controller, but quite a lot of older Windows servers still run the LSI controller, so I’ve made it a habit to switch controllers when I have them down for manual maintenance. There is a perfectly good, well-documented way of switching Windows system drives to a pvscsi controller in VMware, so up until a couple of days ago I had never encountered any issues.

Continue reading “The paravirtual SCSI controller and the blue screen of death”