Simple DMARC report parsing and visualizing toolkit

Just a short post to recommend techsneeze’s tools for downloading, parsing, and displaying DMARC reports. I’m not exactly a Perl expert, so it took me a few minutes to install the necessary modules to get the scripts working, but after that I am a happy camper.

On that note: “I was today years old when I realized the usefulness of apt-file in Debian-based distros.”
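
For anyone else hunting down Perl module dependencies: apt-file searches package contents, so it can tell you which Debian package ships a given module file. A quick sketch – the module path is just an illustration:

sudo apt install apt-file
sudo apt-file update
apt-file search MIME/Parser.pm   # should point at the package shipping the module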

The web reporting tool should not be presented outside of a secured network, but at first glance it seems to do exactly what it sets out to do: visualize SPF and DKIM failures.

File system rights on mounted drives in Windows

As I repeatedly state, the same object-oriented design that makes PowerShell potentially powerful for complex tasks also forces ridiculous verbosity on our part to accomplish simple ones. Today’s post is a perfect example.

Consider a volume mounted to an NTFS mountpoint in a directory. Since this is an obvious afterthought in the file system design, setting access rights on the mountpoint directory won’t do you any good if you expect these rights to propagate down through the mounted file system. While the reason may be obvious once you think about the limitations in the design, it certainly breaks the principle of least astonishment. The correct way to set permissions on such a volume is to configure the proper ACL on the partition object itself.

In the legacy Computer Management MMC-based interface, this was simply a matter of right-clicking in the Disk Management module to change the drive properties, and then setting the correct values in the Security tab. In PowerShell, however, this isn’t a simple command, but a script with three main components:

  • Populate an ACL object with the partition object’s current security settings
  • Modify the properties of the ACL object
  • Commit the contents of the ACL object back into the partition object

Here’s how it’s done:

First we need to find the volume identifier. For this we can use Get-Partition | fl, optionally narrowed down with a where (or ?) query if we know additional details about the volume. What we’re looking for is something like the following example in the AccessPaths property:

\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}
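
If we know, for instance, the directory the volume is mounted under, a where query finds it directly. A small sketch, with a hypothetical mount path:

Get-Partition | ? { $_.AccessPaths -contains 'D:\Mounts\Data\' } | fl AccessPaths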

Armed with that we can now fill an object with the ACL for our volume:

$acl = [io.directory]::GetAccessControl("\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}\")

We then create a new access control entry (ACE):

$newace = New-Object -TypeName System.Security.AccessControl.FileSystemAccessRule -ArgumentList `
    "DOMAIN\testuser", "ReadAndExecute, Traverse", "ContainerInherit, ObjectInherit", "None", "Allow"

The reason we must enter data in this exact order is the definition of the constructor for the access control entry object. This isn’t something the interactive scripting environment explains for us; you have to have a bunch of patience and read dry documentation, or learn from code snippets found through searching the web.
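
One small aid does exist, though: in PowerShell 5 and later, referencing a type’s static ::new method without parentheses prints the available constructor overloads, including the five-argument one used above:

# Lists the constructor signatures (OverloadDefinitions) for the type
[System.Security.AccessControl.FileSystemAccessRule]::new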

The next step is to load our new ACE into the ACL object:

$acl.SetAccessRule($newace)

What if we want to remove rights – for example the Everyone entry that’s usually present? In that case we need to find every ACE in our ACL referencing that user or group, and remove it:

$acl.Access | ? { $_.IdentityReference.Value -eq "Everyone" } | ForEach-Object { $acl.RemoveAccessRule($_) }

If we’ve done this job interactively, we can take a final look at our ACL to confirm it still looks sane by running $acl | fl.

Finally we’ll commit the ACL into the file system again:

[io.directory]::SetAccessControl("\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}\",$acl)
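
For reference, here’s the whole procedure gathered into a single block, using the example volume GUID and account from above:

# The volume GUID path and the account are the example values from this post
$volume = '\\?\Volume{f0e7b028-8f53-42fa-952b-dc3e01c161d8}\'

# Populate an ACL object with the partition object’s current security settings
$acl = [io.directory]::GetAccessControl($volume)

# Create the new ACE and add it to the ACL object
$newace = New-Object -TypeName System.Security.AccessControl.FileSystemAccessRule -ArgumentList `
    "DOMAIN\testuser", "ReadAndExecute, Traverse", "ContainerInherit, ObjectInherit", "None", "Allow"
$acl.SetAccessRule($newace)

# Remove any explicit Everyone entries
$acl.Access | ? { $_.IdentityReference.Value -eq "Everyone" } | ForEach-Object { $acl.RemoveAccessRule($_) }

# Commit the contents of the ACL object back into the partition object
[io.directory]::SetAccessControl($volume, $acl)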

And there we go: we’ve basically had to write an entire little program to get there, and the poor inventors of the KISS principle and of the principle of least astonishment are slowly rotating like rotisserie chickens in their graves, but we’ve managed to set permissions on a mounted NTFS volume through PowerShell.

NTFS mount points via PowerShell

As I mentioned in an earlier post, it’s sometimes useful to mount an additional drive in a directory on an existing drive, Unix-style, rather than presenting it with its own traditional Windows-style drive letter.

Here’s how we do it in PowerShell:

If the volume is already mounted to a drive letter, we need to find the disk number and partition number corresponding to that letter:

Get-Partition | select DriveLetter, DiskNumber, PartitionNumber | ft

DriveLetter DiskNumber PartitionNumber
----------- ---------- ---------------
                     0               1
          C          0               2
                     1               1
          E          1               2
                     2               1
          F          2               2
                     3               1
          G          3               2

In this example, we see that volume G corresponds to DiskNumber 3, PartitionNumber 2.

Let’s say we want to mount that disk under E:\SharedFiles\Mountpoint. First we need to make sure the directory exists. Then we’ll run the following commands:

Add-PartitionAccessPath -DiskNumber 3 -PartitionNumber 2 -AccessPath 'E:\SharedFiles\Mountpoint\'
Remove-PartitionAccessPath -DiskNumber 3 -PartitionNumber 2 -AccessPath 'G:\'
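
Note that the mount point directory must exist before Add-PartitionAccessPath will accept it; if needed, create it first:

New-Item -ItemType Directory -Path 'E:\SharedFiles\Mountpoint'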

Summary

As usual, PowerShell is kind of “wordy”, but we do get things done.

Creating a working Ubuntu 18.04 VMware template

Long story short: I use VMware and I use Ubuntu. With Ubuntu 16.04 everything worked nicely out of the box. With Ubuntu 18.04 it doesn’t. I finally got tired of manually setting my hostname and network settings every time I need a new server, and decided to fix my template once and for all.

Networking

The first thing that doesn’t work – as mentioned in an earlier post – is deploy-time network configuration of machines cloned from vCenter templates.

For some weird reason, Ubuntu has chosen to entirely replace the old ifupdown system for configuring the network with a combination of Cloud-init and Netplan. If we choose to download the installation image with the traditional installer, at least we don’t get cloud-init, but Netplan remains.

False start

According to the Netplan FAQ, we can install Ubuntu Server without using Netplan by pressing F6 followed by ‘e’ in the installer boot menu, and adding netcfg/do_not_use_netplan=true to the preseed command line.

Unfortunately this leaves us with a disconnected machine after first boot: it turns out Ubuntu isn’t smart enough to actually install ifupdown when Netplan is deselected – at least not using the current installer, 18.04.1.

The working way

The solution to the problem above is still (in February 2019) to perform a clean install with Netplan, and then manually remove open-vm-tools and replace it with VMware’s official tools, since open-vm-tools still doesn’t support Ubuntu’s weirdness even 10 months after 18.04 was released.

…However…

The default DHCP behavior in Ubuntu 18.04 is nothing short of idiotic for use in VMware templates: despite newly deployed machines naturally getting new MAC addresses, they insist on asking to be handed the same IP address as their template. The apparent reason is that Netplan (via systemd-networkd) identifies itself to the DHCP server with a client ID derived from /etc/machine-id rather than from the MAC address – and cloned VMs inherit the template’s machine-id. Worse, the clones don’t care that the lease is already taken, but will keep stealing the IP address from each other.

Fortunately, according to this post over at superuser.com, there’s a way to fix this: edit /etc/netplan/01-netcfg.yaml and tell Netplan to use the MAC address as the DHCP identifier, like this:

      dhcp4: yes
      dhcp-identifier: mac
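
For context, a minimal complete /etc/netplan/01-netcfg.yaml might look like the following – the interface name ens192 is just an example from a typical VMware guest; substitute your own:

network:
  version: 2
  renderer: networkd
  ethernets:
    ens192:
      dhcp4: yes
      dhcp-identifier: mac

Then activate the change with sudo netplan apply.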

After this, new machines deployed from the template should behave slightly more sanely.

Painfully long Grub menu timeout

Grub’s boot menu has a default timeout of 30 seconds in Ubuntu 18.04. The relevant setting should be modifiable in /etc/default/grub. Only it isn’t: the default value for GRUB_TIMEOUT there is 2 seconds, which the system doesn’t adhere to at all, since Ubuntu’s Grub falls back to a separate “recordfail” timeout whenever it can’t confirm that the previous boot succeeded. Logically (no, not at all), the “fix” is to add the following line to /etc/default/grub:

GRUB_RECORDFAIL_TIMEOUT=2

Re-run update-grub with superuser rights, and reboot the computer to confirm it worked as intended.
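
In practice, that boils down to something like this:

echo 'GRUB_RECORDFAIL_TIMEOUT=2' | sudo tee -a /etc/default/grub
sudo update-grub
sudo reboot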

End result

With the changes detailed above, and after installing Python to allow Ansible to perform its magic on VMs deployed from this template, I have finally reached feature parity with my Ubuntu 16.04 template.
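
On that last point: 18.04 ships with Python 3 only, while Ansible at the time still defaulted to /usr/bin/python, so the template needs either Python 2 installed or the interpreter overridden. Two alternatives (the host name is hypothetical):

sudo apt install python-minimal

or, in the Ansible inventory:

myserver ansible_python_interpreter=/usr/bin/python3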

Rescuing vVol-based virtual machines

Background

As mentioned in a previous post, I had a really bad experience with vVols presented from IBM storage. Anyhow, the machines had to be migrated to other storage, and having read up on how vVols work, that was a scary prospect.

The good thing: Thanks to Veeam, I have excellent backups.

The bad thing: Since backups depend on the system’s ability to take snapshots, I only have backups from before my vVols failed. Troubleshooting, identifying the underlying issue, having VMware look at the systems and point at IBM, and finally realizing IBM wouldn’t touch my issue unless I signed a year’s worth of software support agreements took several days, during which I had no new backups of the affected VMs.

Fortunately, most of the systems I had hosted on the failed storage volumes were either more or less static, or stored their data on machines residing on regular LUNs or vSAN.

The three methods

Veeam restore

Templates and turned-off machines were marked as Inaccessible in the vCenter console. Since they had definitely seen no changes since the vVol storage broke down, I simply restored them to other datastores from the latest available backups.

VMware Converter

I attempted to use a standalone VMware Converter to migrate an Ubuntu VM, but for some reason the resulting machine kept kernel panicking at boot. I suspect it may have something to do with the fact that Converter demands that the paravirtual SCSI controller be replaced with the emulated LSI one. I have yet to try with a Windows server, but my initial tests made me decide to use Converter only as an extra backup.

Cold migration

This is one method I was surprised worked, and which simplified things a lot. It turns out that – at least with the specific malfunction I experienced – turning off a VM that has been running doesn’t actually make it inaccessible to vCenter. And since a turned off VM doesn’t require the creation of snapshots to allow migration, moving it to accessible storage was a breeze. This is what I ended up doing with most of the machines.

Summary

It turns out that, at least for my purposes, the vVols system decided to “fail safe”, relatively speaking, allowing cold migration of all machines that had been running when the management layer failed. I had a bit of a scare when the cold migration of a huge server failed due to a corrupt snapshot, but a subsequent retry, where I moved the machine to a faster datastore, succeeded, meaning I did not have to worry about restoring data from other copies of the machine.