Fixing “No DKIM keys saved for this domain” in EOP and Office365

Sometimes a newly added domain in Microsoft EOP will not let you enable DKIM from the web user interface. The only workaround I know of is to prepare the domain using PowerShell.

To connect a PS session to O365, I use the following script, ripped straight from Microsoft’s documentation:

$UserCredential = Get-Credential
$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid/ -Credential $UserCredential -Authentication Basic -AllowRedirection
Import-PSSession $Session -DisableNameChecking

After waiting for an eternity for the necessary stuff to load, run the following command – and wait another eternity for it to finish:

New-DkimSigningConfig -DomainName "mydomain.tld" -Enabled $true

Note: Unless you’ve already added the necessary _domainkey CNAME records to your DNS zone file, this command will succeed in generating the DKIM keys but will fail to enable DKIM signing for the domain. Without having looked into it further, I suspect that the Set-DkimSigningConfig cmdlet could then be used to enable signing once the records are in place.
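For reference, the two CNAME records follow the pattern below. This is only a sketch: mydomain.tld and the contoso tenant name are placeholders, and Get-DkimSigningConfig shows the exact targets for your domain:

selector1._domainkey.mydomain.tld.  IN  CNAME  selector1-mydomain-tld._domainkey.contoso.onmicrosoft.com.
selector2._domainkey.mydomain.tld.  IN  CNAME  selector2-mydomain-tld._domainkey.contoso.onmicrosoft.com.

Once the records resolve, signing can presumably be switched on after the fact with:

Set-DkimSigningConfig -Identity "mydomain.tld" -Enabled $true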

Finally disconnect from your O365 PS session:

Remove-PSSession $Session

Your domain now signs mail sent through O365 or via Exchange Online Protection.

Bonus knowledge: With a recent version of PowerShell Core installed, you can manage situations like this from a regular Mac or Linux box.

SIP telephony behind a pfSense firewall

Background:
When we got the fibre connection, I decided to use Bahnhof as our service provider. They enable a SIP phone connection at no extra cost, but they don’t support using third-party SIP boxes; you have to use their combined router/wifi AP/SIP converter (a box by Tilgin), which they manage for you.
Naturally, since I tinker a bit, having a third-party router I can’t manage sitting in front of my network would be unacceptable. The next best thing, then, is to put the Tilgin router behind the pfSense box and use it only for SIP.

Setup:
Bahnhof demands opening the following ports for SIP telephony to work:
69 UDP
5060 – 5080 TCP + UDP
9000 – 14000 UDP
50000 – 60000 UDP

I set up a DHCP reservation for the Tilgin box, gave it an alias in pfSense, and NATed the ports specified above to it.
Then I connected the WAN port of the Tilgin box to my network. It started up fine, and I could both call out and receive calls using a phone connected to the router. All fine, right?
Not quite. After a few hours, incoming calls stopped working. A couple of minutes with my search engine provided the following page: https://www.netgate.com/docs/pfsense/nat/configuring-nat-for-voip-phones.html.
The required fix was the first one suggested: enabling hybrid outbound NAT and static ports for UDP traffic from the Tilgin box.
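For reference, the manual rule I added under Firewall > NAT > Outbound (with the mode set to Hybrid Outbound NAT rule generation) looks roughly like this; the alias name is my own:

Interface: WAN
Protocol: UDP
Source: TilginRouter (the pfSense alias for the Tilgin box)
Translation address: Interface address
Static Port: enabled

The port forwards for the list above remain ordinary NAT rules pointing at the same alias.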

IPv6 guests in KVM

I’ve been experimenting with IPv6 at home, and spent some time trying to get it working in my virtual machines.

The first symptom I got was that VMs got a “Network unreachable” error when trying to ping6 anything but their own address. The cause was a complete brainfart on my side: the guest also needs a loopback definition for IPv6 in /etc/network/interfaces:

auto lo
iface lo inet loopback
iface lo inet6 loopback

The second problem took a bit more digging to understand: I would get an IPv6 address, and I could ping stuff both on my own network and on the Internet from the VM, but no other computers could reach the virtual machine over IPv6.

According to this discussion, QEMU/KVM has support for multicast (required for proper IPv6 functioning), but it’s turned off by default. Remedy this by running virsh edit [vm-name] and adding trustGuestRxFilters='yes' to the appropriate network interface definition. The MAC address and bridge name below are placeholders; keep whatever your VM already has:

<interface type='bridge' trustGuestRxFilters='yes'>
  <mac address='52:54:00:xx:yy:zz'/>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
As usual, when you understand the problem the solution is simple.
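For good measure, you can confirm that the attribute stuck after the edit (the VM name is just an example):

# virsh dumpxml my-vm | grep trustGuestRxFilters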

Frustrations in Ubuntu 18.04

My first frustration with Ubuntu 18.04 came yesterday. I created a template VM with my basic toolkit that any machine in my network should have. I then deployed the VM and asked vSphere to set the hostname to the value of the VM name. Strangely, this didn’t happen: The new machine booted up alright, but its name remained that of the template.

Remember the old way to manually change the name of a machine in Linux? It went something like this:

  1. Add the new hostname to your /etc/hosts so sudo doesn’t get confused.
  2. Replace the old hostname in /etc/hostname with the new one.
  3. Reboot the computer or restart all affected services.

The new way goes like this:

  1. Add the new hostname to your /etc/hosts so sudo doesn’t get confused.
  2. Replace the old hostname in /etc/hostname with the new one.
  3. Reboot the computer.
  4. Notice that the hostname is the same as it was before you attempted to change it.
  5. Web search “change hostname ubuntu 18.04”.
  6. Discover that there’s a new utility, hostnamectl, which has a command, set-hostname, that takes the new hostname as an argument.
  7. Run hostnamectl set-hostname [newname]
  8. Run hostnamectl without any arguments to confirm that “Static hostname” has the correct value.
  9. Log off and back on again and be happy that everything seems to be working.
  10. Reboot the computer after doing some changes.
  11. Notice that the hostname is back to what it was.
  12. Run hostnamectl set-hostname [newname] again, and check /etc/hostname just to see that it actually did change the file to contain the new hostname.
  13. Check in /etc/hosts and see that the new name appears there too.
  14. Scour the web some more for additional information.
  15. Find some mention of cloud-init.
  16. Read up on it and see the point of it – but also that it doesn’t apply to my current environment.
  17. Run sudo apt remove cloud-init
  18. Reboot the server and see that it works as expected again.
  19. (In the future: Learn more about cloud-init and re-evaluate whether it should be implemented in my environment as a complement to Ansible).
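Condensed, the commands that actually did the job (the hostname is an example):

$ sudo hostnamectl set-hostname web01
$ hostnamectl
   Static hostname: web01
   (...)

If you would rather keep cloud-init around, setting preserve_hostname: true in /etc/cloud/cloud.cfg reportedly achieves the same thing, but I haven’t tested that in this environment.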

DNS/DHCP issues in modern Windows versions

Static IP addresses are a solid way to configure machines if you have few enough of them to manage manually. But the more you want to be able to change things on the fly, the more limiting such a configuration scheme becomes.

Unfortunately I’ve had severe problems getting servers with DHCP leases (or even DHCP reservations) to keep their names registered in DNS over time. Suddenly, after a reboot, a machine would seemingly drop off the network even though it had the same IP address as before. Rebooting it again or manually re-registering its DNS record would bring it back, but that wasn’t an acceptable solution to the underlying issue.

I found a discussion that gave a few pointers on how to get these things working in Windows, and I’ve shamelessly ripped the relevant points to present them here:

Step one: Define a user in whose context the DHCP server will perform DNS updates

Simply add a domain user with no special rights, and give it a properly strong password. Then open the DHCP management console, right-click the protocol you want to change (IPv4 or IPv6), and select Properties and the Advanced tab. Click Credentials and enter the relevant information for the account.

Step two: Tell DHCP to always attempt to update DNS records

In the same properties window, select the DNS tab. Ensure the following choices are ticked:

Enable DNS Dynamic Updates(…) -> Always dynamically update DNS records
Dynamically update DNS records for DHCP clients that do not request updates

Step three: Ensure DHCP server AD group membership

The DHCP server(s) must be members of the DnsUpdateProxy group. No other user or computer accounts may exist in this group.
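If you prefer scripting it, the same three steps can be done from an elevated PowerShell prompt on the DHCP server. This is only a sketch; the account, server, and group member names are examples:

# Step one: run dynamic DNS updates in the context of the dedicated account
Set-DhcpServerDnsCredential -Credential (Get-Credential "MYDOMAIN\svc-dhcp-dns")

# Step two: always update DNS records, even for clients that don't ask for it
Set-DhcpServerv4DnsSetting -DynamicUpdates Always -UpdateDnsRRForOlderClients $true

# Step three: put the DHCP server's computer account in the DnsUpdateProxy group
Add-ADGroupMember -Identity "DnsUpdateProxy" -Members (Get-ADComputer "DHCP01")

# Restart the DHCP service so the credential change takes effect
Restart-Service DhcpServer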

Other tips

Make sure DHCP leases are longer than 24 hours, or bad things are likely to happen. A concrete example given is that Microsoft KMS servers have a 24-hour update cycle.

Transport security with Postfix

I had a “Face: Meet Palm” moment today, and as usual when that happens, I learned something new:

What happened was that I noticed that mail from a Postfix server I use for sending mail from a couple of domains was marked with the red “no encryption” label rather than the expected grey “standard encryption” icon when I looked at the message details in Gmail. I was sure that I had set the server to use what they call “opportunistic TLS”; that is: Attempt to use TLS but fall back to no encryption if that’s unavailable.

Reading the Postfix documentation, however, I saw the problem: there are two sets of TLS settings in the main.cf configuration file: those starting with "smtpd_", which deal with how the server responds to its clients, and those starting with "smtp_", which deal with how Postfix acts when working in client mode towards other servers.

So now I have the following two lines in my /etc/postfix/main.cf:

smtp_tls_security_level = may
smtpd_tls_security_level = may
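A quick way to confirm the active values and apply the change:

# postconf smtp_tls_security_level smtpd_tls_security_level
smtp_tls_security_level = may
smtpd_tls_security_level = may
# postfix reload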

Resizing the system volume on a Linux VM

Background

With LVM, the preferred way of adding storage space to a computer running a Linux-based operating system seems to be to add disks, judging by my search results. Naturally, this is a great way of minimizing disruption in a physical machine, but what if you’re running your machines virtually? Adding virtual disks tends to get messy after a while, and hypervisors allow you to simply grow the vdisk, so why not do that?

Problem is, the old way I used to do it (running partprobe after growing the partition) required a system reboot before the entire new space became visible when I tried it on the system volume. Documented below is a better way.

The process

Start by confirming the current disk size so we know our baseline.

# fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ba3e8

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    52428799    25164800   8e  Linux LVM

Disk /dev/mapper/ol-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

OK, so we have slightly less than 27 GB of disk space. Let’s grow the disk image in the hypervisor, and then re-scan the device.

# ls /sys/class/scsi_device/
1:0:0:0 2:0:0:0
# echo 1 > /sys/class/scsi_device/1\:0\:0\:0/device/rescan
# fdisk -l

Disk /dev/sda: 80.5 GB, 80530636800 bytes, 157286400 sectors
(...)

Now we have the disk space available, let’s perform the steps to grow our file system.

# fdisk /dev/sda


Command (m for help): p

Disk /dev/sda: 80.5 GB, 80530636800 bytes, 157286400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ba3e8

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 52428799 25164800 8e Linux LVM

Command (m for help): d
Partition number (1,2, default 2): 
Partition 2 is deleted

Command (m for help): n
Partition type:
 p primary (1 primary, 0 extended, 3 free)
 e extended
Select (default p): 
Using default response p
Partition number (2-4, default 2): 
First sector (2099200-157286399, default 2099200): 
Using default value 2099200
Last sector, +sectors or +size{K,M,G} (2099200-157286399, default 157286399): 
Using default value 157286399
Partition 2 of type Linux and of size 74 GiB is set

Command (m for help): t
Partition number (1,2, default 2): 
Hex code (type L to list all codes): 8e
Changed type of partition 'Linux' to 'Linux LVM'

Command (m for help): w
The partition table has been altered!

The above statement is followed by what used to be a problem:

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

Partprobe won’t help us here, and kpartx for some reason doesn’t consistently catch the entire new disk size. The correct way, then, is the following:

# partx -u /dev/sda2

The result?

# partx -s /dev/sda
NR START END SECTORS SIZE NAME UUID
 1 2048 2099199 2097152 1G 
 2 2099200 157286399 155187200 74G

Now let’s finish extending everything up to the actual file system:

# pvresize /dev/sda2
 Physical volume "/dev/sda2" changed
 1 physical volume(s) resized / 0 physical volume(s) not resized
# lvextend -l 100%VG /dev/mapper/ol-root
 Size of logical volume ol/root changed from <17.00 GiB (4351 extents) to <72.00 GiB (18431 extents).
 Logical volume ol/root successfully resized.
# xfs_growfs /dev/mapper/ol-root
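This machine runs XFS on the root LV. On an ext4 root the last step would instead be something along the lines of:

# resize2fs /dev/mapper/ol-root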

And finally let’s check that everything worked out as we expected:

# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/ol-root 72G 17G 56G 24% /
(...)

Conclusion

The Windows family of operating systems has had the ability to grow any volume on the fly since Server 2008. I couldn’t imagine that Linux would lack this ability, but I didn’t know how to do it the right way. Now I do.

When the French attack…

A consultant working with our Alcatel phone system encountered a weird issue that caused us some problems the other day. When attempting to install an Open Touch Media Server (used for receiving fax, for example), the entire vCenter client environment froze, and a reload of the page resulted in the following error message:

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000…] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

A lot of searching the web led me nowhere: there were a bunch of suggested solutions, but none whose symptoms matched what I was experiencing; I had not changed the IP address of the vCenter Appliance, nor had I changed its name, and I did not have an issue with logs reporting conflicting USB device instances.

What I did have, though, was a new OpenTouch server on one of my ESXi hosts which did not have a network assigned to its network interface, and this, apparently, is not a configuration that vCenter was written to take into consideration.

Logging on to the local web client on the specific ESXi host where the machine was running (after identifying that…), and selecting the machine in question, I got a warning message specifying the network problem, and a link to the Action menu. Simply selecting a valid network and saving the machine configuration was enough to allow me to ssh to the vCenter Appliance and start the vmware-vpxd service:

# service-control --start vmware-vpxd

We’ll just have to see how we proceed from here…

FTP server on IIS

I recently had cause to set up an FTP server for internal use on a Windows server, and I bumped into an issue that took me a little while to figure out, since I had never done it before:
I wanted to give a domain user account an isolated directory, and started out by creating the user directory inside the inetpub\ftproot folder. However, when I tried to log on with a client, it would give me an error message: “530 User cannot log in, home directory inaccessible”.

Turns out the answer was simply to create a subdirectory to the ftproot with the name of the domain, and then move the user directory inside that one.
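In other words, with a domain account the physical layout ends up looking something like this (domain and user names are examples):

C:\inetpub\ftproot\
    MYDOMAIN\
        jane.doe\

The account MYDOMAIN\jane.doe then lands in her own isolated home directory when logging on.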

Environment variables for services in a systemd-based system

My current config deployment automation project has required me to set up a dev/staging environment for my load balancers, since I don’t want to break stuff by deploying untested configurations.
This environment is functionally identical to a single load balancer and can be used along with a hosts file on a client to not only develop configurations and make sure they are syntactically correct, but also to fully test the functionality of the load balancer rules.

As part of this, I naturally need to change the listener addresses in the dev/staging HAProxy environment compared to the production environment that I keep version controlled in my git repository.
My first instinct was to use a script to pull the latest versions, modify the necessary lines using sed, and copy the config to the correct location. I didn’t really like this concept, since it would by definition mean that the configs weren’t fully identical between the production environment (and the git repo) and the dev/staging environment.
If I used environment variables, the version controlled configuration could be kept fully identical across all instances.

The first mistake I made took me a while to grasp: HAProxy happily parsed a configuration that looked perfectly valid, but it intermittently presented a certificate I didn’t expect, and web services intermittently failed to reply to requests.
It turns out Linux daemons don’t inherit even system-wide environment variables.
So how do we check what environment variables a service does see?
First get the PID(s) of the service:

$ pgrep haproxy
1517
1521
1523
$ 

In the case of my HAProxy installation, I got a list of three processes, so I chose the last one and checked out its environment:

# cat /proc/1523/environ

This results in a list of its current environment variables, and naturally the ones I thought I’d added were nowhere to be seen.
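The entries in /proc/<pid>/environ are NUL-separated, by the way, so the output is easier to read if you pipe it through tr:

# tr '\0' '\n' < /proc/1523/environ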

So why did HAProxy start without complaining? Since the environment variables weren’t defined in this context, they simply expanded to empty strings, and so HAProxy figured I wanted to listen on the assigned ports on all interfaces.

How do we assign environment variables to a service in a modern, systemd-based Linux, then?
At a command prompt, run systemctl edit servicename. This starts your default editor. A valid config looks like this:

[Service]
Environment=ENVVAR1=value
Environment=ENVVAR2=value

On Ubuntu, this file is stored in /etc/systemd/system/servicename.service.d/override.conf, but naturally this file can be renamed to something more descriptive. The systemctl edit command doesn’t do anything magical, it’s just a shortcut.
After the file is in place, run systemctl daemon-reload to make the configuration active, and then the HAProxy service needs to be restarted, not only reloaded, for the changes to apply.
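To make the point of the exercise concrete: with a variable defined in the override file, the version-controlled haproxy.cfg can reference it instead of a hard-coded address, since HAProxy expands ${VAR} references inside double-quoted strings in its configuration. A sketch with a made-up variable name and address:

# /etc/systemd/system/haproxy.service.d/10-listeners.conf
[Service]
Environment=WEB_LISTENER=192.0.2.10

# haproxy.cfg, identical in production and dev/staging
frontend web
    bind "${WEB_LISTENER}:443" ssl crt /etc/haproxy/certs/example.pem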

Of course, I want this config too to be deployable through Ansible.
The relevant lines from my playbook:

---
    -   name: Update environment variables for HAProxy service
        copy:
            src: "{{ config_root }}/etc/systemd/system/haproxy.service.d/10-listeners.conf"
            dest: "/etc/systemd/system/haproxy.service.d/"
        register: ha_envvar_status
    
    -   name: Reload systemd service configuration
        systemd:
            daemon_reload: yes
        when: ha_envvar_status|changed
...
    -   name: Reload HAProxy configuration
        service:
            name: haproxy
            state: reloaded
        when: ha_envvar_status|skipped and haproxy_cfg_status|changed
        register: reloaded_haproxy

    -   name: Restart HAProxy daemon
        service:
            name: haproxy
            state: restarted
        when: ha_envvar_status|changed or (haproxy_cfg_status|changed and not reloaded_haproxy|skipped)

Key lines:
The register line stores the result of a task in a variable. Thanks to that, I can use the when keyword to only reload or restart services when something has actually changed.

Summary

Linux daemons don’t automatically inherit environment variables.
In systemd-based distros (which today means pretty much anyone with corporate backing), environment variables can be added using the Environment keyword in the Service section of a file in /etc/systemd/system/servicename.service.d/.