Restoring a really old domain controller from backups

I had an interesting experience this week, where I was faced with the need to restore an entire Active Directory environment from backups that were more than a year old.

The company whose servers I was restoring had been using an older version of Veeam Backup & Replication, which always simplifies matters a lot: The entire thing was delivered to me over sneakernet, on a 2.5″ USB drive containing several restore points for each machine.

The restore was uneventful, as expected, and most machines simply started up in their new home. Unfortunately, one of the Active Directory domain controllers would bluescreen on boot with a C00002E2 error.

After some reading up on things, I realized the machine had passed the Active Directory tombstone period: as I wrote, the backups were taken over a year ago. Since I had one good domain controller, I figured I would simply cheat with the local time on the failing DC. It would boot successfully into Directory Services Restore Mode, so I could set the local clock there. But anybody who has a bit of experience with the VMware line of virtualization products knows that by default, VMware ESXi synchronizes the guest system clock with the host in a few situations, amongst them on reboot.

Fortunately, VMware has a knowledge base article covering how to disable all synchronization of time between guest and host. A total of eight advanced settings must be set to FALSE, with the guest powered off:

tools.syncTime
time.synchronize.continue
time.synchronize.restore
time.synchronize.resume.disk
time.synchronize.shrink
time.synchronize.tools.startup
time.synchronize.tools.enable
time.synchronize.resume.host

The procedure is documented in KB1189.
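
For reference, these end up as plain key/value pairs in the VM’s configuration. A minimal sketch of what the relevant .vmx entries look like once everything is disabled (FALSE and 0 are both accepted values):

tools.syncTime = "FALSE"
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
time.synchronize.shrink = "FALSE"
time.synchronize.tools.startup = "FALSE"
time.synchronize.tools.enable = "FALSE"
time.synchronize.resume.host = "FALSE"

Whether you edit the .vmx directly or go through the advanced configuration parameters in the vSphere client doesn’t matter, as long as the guest stays powered off while you do it.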

After setting these properties on the machine, I started it back up with the system time set well before the tombstone cutoff date, and let it rest for a while so that all services could realize everything was alright. I then set the time forward to the current date, waited a bit longer, and restarted the VM. After this, the system worked as intended.

Rescuing vVol-based virtual machines

Background

As mentioned in a previous post, I had a really bad experience with vVols presented from IBM storage. In any case, the machines had to be migrated to other storage, and after reading up on how vVols work, that was a scary prospect.

The good thing: Thanks to Veeam, I have excellent backups.

The bad thing: Since the backups depend on the system’s ability to take snapshots, I only have backups up until the point where my vVols failed. Troubleshooting, identifying the underlying issue, having VMware look at the systems and point at IBM, and finally realizing IBM won’t touch my issue unless I sign a year’s worth of software support agreements took several days, during which I’ve had no new backups for the affected VMs.

Fortunately, most of the systems I had hosted on the failed storage volumes were either more or less static, or stored data on machines on regular LUNs or vSAN.

The three methods

Veeam restore

Templates and turned-off machines were marked as Inaccessible in the vCenter console. Since they had definitely seen no changes since the vVol storage broke down, I simply restored them to other datastores from the latest available backup.

VMware Converter

I attempted to use VMware vCenter Converter Standalone to migrate an Ubuntu VM, but for some reason it kept kernel panicking at boot. I suspect it may have something to do with the fact that Converter demands that the paravirtual SCSI controller be replaced with the emulated LSI one. I have yet to try with a Windows server, but my initial tests made me decide to only use Converter as a fallback option.

Cold migration

This is the method I was surprised to see work, and it simplified things a lot. It turns out that – at least with the specific malfunction I experienced – turning off a VM that has been running doesn’t actually make it inaccessible to vCenter. And since a turned-off VM doesn’t require the creation of snapshots to allow migration, moving it to accessible storage was a breeze. This is what I ended up doing with most of the machines.

Summary

It turns out that, at least for my purposes, the vVol system decided to “fail safe”, relatively speaking, allowing cold migration of all machines that had been running when the management layer failed. I had a bit of a scare when the cold migration of a huge server failed due to a corrupt snapshot, but a subsequent retry, where I moved the machine to a faster datastore, succeeded, meaning I did not have to worry about restoring data from other copies of the machine.

It’s so fluffy!

(Or: Backblaze B2 cloud backups from a Proxmox Virtual Environment)

Backups are one of those things that have a tendency to become unexpectedly expensive – at least through the eyes of a non-techie: Not only do you need enough space to store several generations of data, but you want at least twice that, since you want to protect your information not only from accidental deletion or corruption, but also from the kind of accidents that can render both the production data and the backup unreadable. Ultimately, you’ll also want to spend the resources to automate as much of the process as possible, because anything that requires manual work will be forgotten at some point, and by some perverse law of the Universe, that’s when it would have been needed.

In this post I’ll describe how I’ve solved it for full VM/container backups in my lab/home environment. It’s trivial to adapt the information from this post to regular file system backups. Since I’m using a cloud service to store my backups, I apply a zero trust policy to it: everything is encrypted locally before it leaves my server, at the cost of increased storage (and network) requirements. My primary dataset is small enough that this doesn’t really worry me.

Backblaze currently offers 10 GB of B2 object storage for free. This doesn’t sound like a lot today, but it will comfortably fit several compressed and encrypted copies of my reverse proxy, and my mail and web servers. That’s Linux containers for you.

First of all, we’ll need an account at Backblaze. Save your Master Application Key in your password manager! We’ll need it soon. Then we’ll want to create a Storage Bucket. In my case I gave it the wonderfully inventive name “pvebackup”.

Next, we’ll install a program called rclone on our Proxmox server. The version in the apt repository as I write this seems to have a bug vis-à-vis B2 that forces us to use the Master Application Key rather than a more limited Application Key created specifically for this bucket. Since we’re encrypting our cloud data anyway, I feel pretty OK with this compromise for home use.

EDIT 2018-10-30: Downloading the current deb package of rclone directly from the project site solves this bug. In other words, it’s possible and preferable to create a separate Application Key with access only to the backup bucket, at least if the B2 account will be used for other storage too.

# apt install rclone
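
If you go the route from the edit note above and grab the package straight from the project site instead, the install would look something like this – assuming the current amd64 .deb is still published under this name:

# wget https://downloads.rclone.org/rclone-current-linux-amd64.deb
# dpkg -i rclone-current-linux-amd64.deb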

Now we’ll configure the program:

# rclone config --config /etc/rclone.conf
Config file "/etc/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config

Type n to create a new remote configuration. Name it b2, and select the appropriate number for Backblaze B2 storage from the list: In my case it was number 3.

The Account ID can be viewed in the Backblaze portal, and the Application Key is the master key we saved in our password manager earlier. Leave the endpoint blank and save your settings. Then we’ll just secure the file:

# chown root. /etc/rclone.conf && chmod 600 /etc/rclone.conf
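
For reference, the resulting /etc/rclone.conf ends up looking roughly like this – the account and key values below are placeholders, of course:

[b2]
type = b2
account = 0123456789ab
key = 0123456789abcdef0123456789abcdef01234567
endpoint =

A quick way to verify that the remote works is to list the buckets on the account, which should include the pvebackup bucket we created earlier:

# rclone lsd --config /etc/rclone.conf b2: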

We’ll want to encrypt the backup files before sending them to an online location. For this we’ll use gpg, for which the default settings should be enough. The command to generate a key is gpg --gen-key, and I created a key in the name of “proxmox” with the mail address I’m using for notification mails from my PVE instance. Don’t forget to store the passphrase in your password manager, or your backups will be utterly worthless.
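
To sanity check the key – and as a preview of what a restore will look like later on – you can list it with gpg, and decrypting a backup is just a matter of running gpg -d and supplying the passphrase. The backup file name below is a made-up example:

# gpg --list-keys proxmox
# gpg -d vzdump-lxc-100-example.tar.gz.gpg > vzdump-lxc-100-example.tar.gz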

Next, we’ll shamelessly steal and modify a script to be used for hooking into the Proxmox VE backup process (I took it from this github repository and repurposed it for my needs).

Edit 2018-10-30: I added the --b2-hard-delete option to the job-end phase that deletes old backups, since the regular delete command just hides files in the B2 storage, adding to the cumulative storage used.

#!/usr/bin/perl -w
# VZdump hook script for offsite backups to Backblaze B2 storage
use strict;

print "HOOK: " . join (' ', @ARGV) . "\n";

my $phase = shift;

if ($phase eq 'job-start' ||
        $phase eq 'job-end'  ||
        $phase eq 'job-abort') {

        my $dumpdir = $ENV{DUMPDIR};

        my $storeid = $ENV{STOREID};

        print "HOOK-ENV: dumpdir=$dumpdir;storeid=$storeid\n";

        if ($phase eq 'job-end') {
                        # Delete backups older than 8 days
                        system ("/usr/bin/rclone delete -vv --b2-hard-delete --config /etc/rclone.conf --min-age 8d b2:pvebackup") == 0 ||
                                die "Deleting old backups failed";
        }
} elsif ($phase eq 'backup-start' ||
        $phase eq 'backup-end' ||
        $phase eq 'backup-abort' ||
        $phase eq 'log-end' ||
        $phase eq 'pre-stop' ||
        $phase eq 'pre-restart' ||
        $phase eq 'post-restart') {
        my $mode = shift; # stop/suspend/snapshot
        my $vmid = shift;
        my $vmtype = $ENV{VMTYPE}; # lxc/qemu
        my $dumpdir = $ENV{DUMPDIR};
        my $storeid = $ENV{STOREID};
        my $hostname = $ENV{HOSTNAME};
        # tarfile is only set by vzdump in phase 'backup-end'; default to an
        # empty string to avoid "uninitialized value" warnings in other phases
        my $tarfile = $ENV{TARFILE} // '';
        my $gpgfile = $tarfile . ".gpg";
        # logfile is only set in phase 'log-end'
        my $logfile = $ENV{LOGFILE} // '';
        print "HOOK-ENV: vmtype=$vmtype;dumpdir=$dumpdir;storeid=$storeid;hostname=$hostname;tarfile=$tarfile;logfile=$logfile\n";
        # Encrypt backup and send it to B2 storage
        if ($phase eq 'backup-end') {
                system ("/usr/bin/gpg -e -r proxmox $tarfile") == 0 ||
                        die "Encrypting tar file failed";
                system ("/usr/bin/rclone copy -v --config /etc/rclone.conf $gpgfile b2:pvebackup") == 0 ||
                        die "Copying encrypted file to B2 storage failed";
        }
        # Copy backup log to B2
        if ($phase eq 'log-end') {
                system ("/usr/bin/rclone copy -v --config /etc/rclone.conf $logfile b2:pvebackup") == 0 ||
                        die "Copying log file to B2 storage failed";
        }
} else {
      die "got unknown phase '$phase'";
}
exit (0);
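
If you’re curious about what the retention step in the job-end phase would actually remove, rclone’s --dry-run flag lets you preview the deletions without touching anything:

# rclone delete --dry-run --b2-hard-delete --config /etc/rclone.conf --min-age 8d b2:pvebackup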

Store this script in /usr/local/bin/vzclouddump.pl and make it executable:

# chown root. /usr/local/bin/vzclouddump.pl && chmod 755 /usr/local/bin/vzclouddump.pl
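
Before wiring it up, a quick syntax check doesn’t hurt; perl should report the file as OK:

# perl -c /usr/local/bin/vzclouddump.pl
/usr/local/bin/vzclouddump.pl syntax OK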

The last bit of CLI magic for today is to ensure that Proxmox VE actually makes use of our fancy script:

# echo "script: /usr/local/bin/vzclouddump.pl" >> /etc/vzdump.conf

To try it out, select a VM or container in the PVE web interface and choose Backup -> Backup now. I use Snapshot as my backup mode and GZIP as my compression method. Hopefully you’ll see no errors in the log, and the B2 console will display a new file with a name corresponding to the current timestamp and the machine ID.
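
The same test can be kicked off from the command line if you prefer; the guest ID and the storage name below are just examples from my environment:

# vzdump 100 --mode snapshot --compress gzip --storage local

Since the script line lives in /etc/vzdump.conf, it applies to scheduled backup jobs as well as manual runs like this one.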

Conclusion

The tradeoffs with this solution compared to, for example, an enterprise product from Veeam are obvious, but so is the difference in cost. For a small business or a home lab, this solution should cover the needs to keep the most important data recoverable even if something bad happens to the server location.