Restoring a really old domain controller from backups

I had an interesting experience this week, where I was faced with the need to restore an entire Active Directory environment from backups that were more than a year old.

The company whose servers I was restoring had been using an older version of Veeam Backup & Replication, which always simplifies matters a lot: the entire thing was delivered to me over sneakernet, on a 2.5″ USB drive containing several restore points for each machine.

The restore was uneventful, as expected, and most machines simply started up in their new home. Unfortunately, one of the Active Directory controllers would bluescreen on boot, with a C00002E2 error message.

After some reading up, I realized the machine had passed the Active Directory tombstone period: as I wrote, the backups were taken over a year ago. Since I had one good domain controller, I figured I would simply cheat with the local time on the failing DC. It would boot successfully into Directory Services Restore Mode, so I could set the local clock there, but anybody with a bit of experience with the VMware line of virtualization products knows that by default, VMware ESXi synchronizes the guest system clock with the host in a number of situations, among them on reboot.
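For reference, the tombstone lifetime itself is stored in the tombstoneLifetime attribute on the Directory Service object in the configuration partition, so it can be checked against the healthy DC. Below is a minimal sketch using the Python ldap3 library; the server name, credentials and domain components are placeholders.

# Sketch: read the forest's tombstone lifetime with ldap3 (pip install ldap3).
# The DC hostname, credentials and DC=... components are placeholders.
from ldap3 import Server, Connection, BASE, NTLM

conn = Connection(Server("dc01.example.local"),
                  user="EXAMPLE\\administrator", password="secret",
                  authentication=NTLM, auto_bind=True)

conn.search("CN=Directory Service,CN=Windows NT,CN=Services,"
            "CN=Configuration,DC=example,DC=local",
            "(objectClass=*)", search_scope=BASE,
            attributes=["tombstoneLifetime"])

# An empty attribute means the default applies: 60 or 180 days,
# depending on the Windows version the forest was originally created with.
print(conn.entries[0])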

Fortunately, VMware has a knowledge base article covering how to disable all synchronization of time between guest and host. A total of eight advanced settings must be set to FALSE, with the guest powered off:

tools.syncTime
time.synchronize.continue
time.synchronize.restore
time.synchronize.resume.disk
time.synchronize.shrink
time.synchronize.tools.startup
time.synchronize.tools.enable
time.synchronize.resume.host

The procedure is documented in KB1189.
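For anyone who prefers to script it, the same eight settings can be pushed onto the VM with pyVmomi instead of editing the .vmx file or the advanced settings by hand. A minimal sketch, assuming the VM is already powered off; the vCenter address, credentials and VM name are placeholders.

# Sketch: disable all host/guest time synchronization on a powered-off VM
# using pyVmomi (pip install pyvmomi). Connection details and the VM name
# are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

KEYS = [
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
]

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the VM by name and write all eight keys as extra config options.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "DC02")

spec = vim.vm.ConfigSpec(
    extraConfig=[vim.option.OptionValue(key=k, value="FALSE") for k in KEYS])
WaitForTask(vm.ReconfigVM_Task(spec=spec))
Disconnect(si)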

After setting these properties on the machine, I started it back up with the system time set well before the tombstone cutoff date, let it rest for a while so that all services could realize everything was alright, then set the time forward to the current date, waited a bit longer, and restarted the VM. After this, the system worked as intended.

Rescuing vVol-based virtual machines

Background

As mentioned in a previous post, I had a really bad experience with vVols presented from IBM storage. In any case, the machines had to be migrated to other storage, and after reading up on how vVols work, that was a scary prospect.

The good thing: Thanks to Veeam, I have excellent backups.

The bad thing: Since the backups depend on the system’s ability to take snapshots, I only have backups up until the point my vVols failed. Troubleshooting, identifying the underlying issue, having VMware look at the systems and point at IBM, and finally realizing that IBM won’t touch my issue unless I sign up for a year’s worth of software support agreements took several days, during which I had no new backups for the affected VMs.

Fortunately, most of the systems I had hosted on the failed storage volumes were either more or less static, or stored their data on machines that lived on regular LUNs or vSAN.

The three methods

Veeam restore

Templates and powered-off machines were marked as Inaccessible in the vCenter console. Since they had definitely seen no changes since the vVol storage broke down, I simply restored them to other datastores from the latest available backup.

VMware Converter

I attempted to use VMware vCenter Converter Standalone to migrate an Ubuntu VM, but for some reason the result kept kernel panicking at boot. I suspect it may have something to do with the fact that Converter demands that the paravirtual SCSI controller be replaced with the emulated LSI Logic one. I have yet to try with a Windows server, but my initial tests made me decide to use Converter only as an extra backup.

Cold migration

This is the method I was most surprised to see work, and it simplified things a lot. It turns out that – at least with the specific malfunction I experienced – powering off a VM that has been running doesn’t actually make it inaccessible to vCenter. And since a powered-off VM doesn’t require the creation of snapshots to allow migration, moving it to accessible storage was a breeze. This is what I ended up doing with most of the machines.
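A cold migration of a powered-off VM is just a storage relocate, so it can be scripted as well. A minimal pyVmomi sketch, with the vCenter address, credentials, VM name and target datastore name as placeholders:

# Sketch: cold-migrate (relocate) a powered-off VM to another datastore
# with pyVmomi. All names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

vm = find_by_name(vim.VirtualMachine, "bigserver01")
target = find_by_name(vim.Datastore, "fast-datastore-01")

# A powered-off VM needs no snapshot, so this is a plain relocate to new storage.
WaitForTask(vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=target)))
Disconnect(si)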

Summary

It turns out that, at least for my purposes, the vVol system decided to “fail safe”, relatively speaking, allowing for cold migration of all machines that had been running when the management layer failed. I had a bit of a scare when the cold migration of a huge server failed due to a corrupt snapshot, but a subsequent retry where I moved the machine to a faster datastore succeeded, meaning I did not have to worry about restoring data from other copies of the machine.