Enabling the booking of Teams meetings in Outlook on Mac

This issue had me scratching my head for a while: With the latest version of Microsoft Office and Microsoft Teams installed on my Mac running Catalina, I couldn’t enable the booking of Teams meetings from Outlook.

The solution turned out to be to remove the regular Office programs and replace them with Office 365. The official instructions for how to do that said to log on to https://www.office.com or to https://aka.ms/office-install. Well, tough luck: There was no way to find a download link there.

Instead, the correct way seems to be to download Microsoft 365 from the App Store. There was no obvious way to connect the Office suite to my work account, so I started Outlook and tried adding an account. This triggered a dialog offering to activate a trial or connect to an existing subscription, with the perhaps ill-chosen options Activate and Cancel. It turns out that if you press Activate, you get to choose whether to activate the trial or to activate Microsoft 365 with an existing account.

While the gods of good UX and the Law of Least Astonishment cry alone in a cave, I now do have a button to schedule a Teams meeting in Outlook. If only I could get the Calendar and Datadog apps installed in Teams, my life would be complete…

Oh, and speaking of great user experience: Incoming calls in Teams on the Mac do not quite steal focus – thanks for that, at least – but they hog Cmd+Shift+D, so attempting to send a mail from Mail.app declines the incoming call instead. That’s not a great design choice, Microsoft. Now why would anybody want to use Mail.app instead of Outlook? Simple: Snappiness and good search. I can accept jumping through some hoops for things I rarely do, as long as my day-to-day tasks aren’t nerfed by software that feels slow and bloated.

Trusting Palo Alto GlobalProtect to use a macOS machine certificate

On a managed Mac with a machine certificate, when the certificate is renewed, Palo Alto GlobalProtect will prompt for administrative credentials before connecting. This is because the executable isn’t allowed to directly read from the System keychain.

There’s a nice explanation and fix described on Palo Alto’s site, but in case that one goes missing, here’s the workaround:


  1. Open the Keychain Access application and locate the machine certificate issued to Mac OS X Client in the System keychain.
  2. Right-click the private key associated with the certificate and click Get Info, then go to the Access Control tab.
  3. Click ‘+’ to select an application to allow.
  4. Press Cmd+Shift+G to open Go to Folder.
  5. Enter ‘/Applications/GlobalProtect.app/Contents/Resources’ and click Go.
  6. Find PanGPS, click it, and then press Add.
  7. Save the changes to the private key.
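
If you prefer to sanity-check things from a terminal, here is a minimal sketch (assuming the default GlobalProtect installation path) that confirms the PanGPS binary is where the file picker expects it and inspects the access control lists in the System keychain:

ls -l /Applications/GlobalProtect.app/Contents/Resources/PanGPS
# Dump the System keychain including access control lists (-a) to see
# which applications are trusted to use the machine certificate's private key
sudo security dump-keychain -a /Library/Keychains/System.keychain | less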


Creating a working Ubuntu 20.04 VMware Image

A while back I was a bit frustrated at Ubuntu for their defaulting to Cloud-Init in the server edition of Ubuntu 18.04. Well I’m right there again, but now with Ubuntu 20.04.

First of all, Cloud-Init is back, and it’s no more useful to me now than it was last time. My process is based on the tips in VMware’s KB54986:

sudo apt purge cloud-init && sudo apt autoremove
sudo rm -rf /etc/cloud
sudo sed -i -e 's&D /tmp&#D /tmp&g' /usr/lib/tmpfiles.d/tmp.conf

For VMware’s guest customization to work reliably, the VMware tools service also needs to be ordered after D-Bus; add the following line to the [Unit] section of its systemd unit file:

[Unit]
(...)
After=dbus.service
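
If you’d rather not edit the packaged unit file directly (a package upgrade can overwrite it), here is a sketch of the same change as a systemd drop-in, assuming the unit in question is open-vm-tools.service:

sudo mkdir -p /etc/systemd/system/open-vm-tools.service.d
# Order the service after D-Bus without touching the packaged unit file
cat <<'EOF' | sudo tee /etc/systemd/system/open-vm-tools.service.d/override.conf
[Unit]
After=dbus.service
EOF
sudo systemctl daemon-reload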

Ubuntu 20.04 also retains the idiotic habit of not presenting the computer’s MAC address as the identifier for DHCP requests, which necessitates a change to /etc/netplan/00-installer-config.yaml:

network:
  ethernets:
    ens192:
      dhcp4: true
      dhcp-identifier: mac
  version: 2
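
After saving the file, the change can be validated and applied without a reboot; a quick sketch:

# 'netplan try' rolls back automatically unless the change is confirmed
sudo netplan try
sudo netplan apply
# Verify that the interface picked up a lease as expected
ip addr show ens192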

The final piece of the puzzle is to upgrade vCenter to at least version 6.7U3g, since that’s the first one that supports guest customization of Ubuntu 20.04 machines.

Another case of “Who watches the watchers”

This text was updated to reflect the current status of the story on 2020-05-27.

Not a good look for Trend Micro: Security researcher Bill Demirkapi took apart their Rootkit Buster software and described his findings in a long, technical article.

The main findings:

  1. The program installs a driver which is designed to subvert Microsoft’s quality control process.
  2. It contains security holes which malware could potentially piggyback on to establish control of a victim’s computer.
  3. Parts of the software are unnecessarily bloated, needlessly adding to the performance overhead many people associate with anti-malware software.

Point 3 may look trivial, but power users already complain about the performance impact of anti-malware suites in day-to-day use. Confirmation that parts of these programs aren’t built to keep that impact as small as possible won’t increase goodwill among people who want their computers to perform optimally and still have the added security that third-party anti-malware suites promise.

But the most damning points are, of course, the first and second ones. It’s not acceptable for a security suite to contain insecure code. It’s outright disrespectful to everyone involved – Trend’s customers and Microsoft as authors of the operating system on which their product runs – to have a badly written and/or misbehaving piece of software actively try to behave better when it’s aware of being scrutinized. This is exactly the kind of behavior from which I’d expect Rootkit Buster to protect its users.

I hope we hear more about how this turns out in the future: Trend Micro has an opportunity to make something good out of this but their initial hurried reaction could have been better.

Update: Microsoft has effectively killed off the driver in question. Trend Micro still claims that they weren’t trying to circumvent Microsoft’s QA process, which resurfaces the question of how they could accidentally write code that actively checks whether it is being tested and misbehaves only if it isn’t.

Deploying VMware virtual machines using Ansible

I’ve been experimenting with deploying entire environments using Ansible. As usual I had to pass a couple of small thresholds and stumble into a couple of pitfalls before I was comfortable with the solution, so I’m documenting the process here.

I’m thinking of creating a separate post describing my general Ansible workflow in more detail for anybody who wants to know, but this post will cover how I’ve set up management of my vSphere environment from Ansible.

Boring prerequisites

First of all, we should set up a user with the necessary rights in the vCenter. The Ansible crew has a good list of the requirements, reiterated here:

Datastore.AllocateSpace on the destination datastore or datastore folder

Network.Assign on the network to which the virtual machine will be assigned

Resource.AssignVMToPool on the destination host, cluster, or resource pool

VirtualMachine.Config.AddNewDisk on the datacenter or virtual machine folder

VirtualMachine.Config.AddRemoveDevice on the datacenter or virtual machine folder

VirtualMachine.Interact.PowerOn on the datacenter or virtual machine folder

VirtualMachine.Inventory.CreateFromExisting on the datacenter or virtual machine folder

VirtualMachine.Provisioning.Clone on the virtual machine you are cloning

VirtualMachine.Provisioning.Customize on the virtual machine or virtual machine folder if you are customizing the guest operating system

VirtualMachine.Provisioning.DeployTemplate on the template you are using

VirtualMachine.Provisioning.ReadCustSpecs on the root vCenter Server if you are customizing the guest operating system

I also added the VirtualMachine.Config.CPUCount, VirtualMachine.Config.Memory, VirtualMachine.Config.EditDevice, and VirtualMachine.Interact.DeviceConnection rights while I was at it.

These rights were added to a VMware Role. I then assigned this role to my domain user MYDOMAIN\ansible for the entire vCenter server with children.

Unfortunately this wasn’t enough to actually deploy VMs from templates: The ansible user needs to be allowed to write to VM folders or Ansible will barf with a permission-related error message. I solved this by creating the VM folder MyProject/WebServers and giving the MYDOMAIN\ansible user Administrator rights in this specific folder.

For Ansible – or rather Python – to communicate with my vCenter server, I had to ensure the necessary modules were installed. I already use pip to keep my Ansible tooling current, so I installed them the same way:

pip3 install requests PyVmomi
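
A quick sanity check that the modules are importable from the Python interpreter Ansible will be using (a minimal sketch):

python3 -c 'import requests, pyVmomi; print("vSphere modules OK")'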

Setting up the Ansible environment

The following two lines set up the skeleton directory structure I like to use:

mkdir -p myproject/{roles,inventories/test/{group_vars,host_vars/localhost}} && cd myproject
ansible-galaxy init roles/vm-deployment --offline

To clarify: The test subdirectory name has to do with the environment’s purpose, as in Dev, Test, Staging, Prod, rather than this being an experimental environment.

Inventories

A basic inventory file for Ansible may look like this:

---
all:
  children:
    webservers:
      hosts:
        websrvtest1:
        websrvtest2:
        websrvtestn:

The all group may contain an arbitrary number of hosts and child groups, which in turn may contain an arbitrary number of their own hosts or children. It’s also possible to put group and host variables straight into the inventory file, but I prefer to keep them separated. Note how every line ends with a colon (:): that’s on purpose, and things break if it’s missing.
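
To confirm that Ansible parses the inventory the way you intended, the ansible-inventory command can render it as a tree (adjust the path to your own layout):

ansible-inventory -i inventories/test --graph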

Variables

Variables are key to reusable playbooks. Let’s set some up for this task:

vcenter: "vcenter.mydomain.tld"
vc_user: ansible
vc_pass: "{{ vault_vc_pass }}"
vc_datacenter: MyDatacenter
vc_cluster: VSANclstr
vm_template: w2019coretmpl
vm_folder: /MyProject/Test/WebServers
vm_network: vxw-dvs-161618-virtualwire-14-sid-5013-MyProject-Test
vm_datastore: vsanDatastore
vm_customization_spec: Win_Domain_member_DHCP
deploylist:
- cpmwebsrvtest1
- cpmwebsrvtest2
- cpmwebsrvtestn

Vaults

Note the "{{ vault_vc_pass }}" variable: I’m telling Ansible to look up the variable contents from some other variable. In this case it’s a hint to me that the contents are encrypted in an ansible vault. This way I don’t have to worry a lot that someone would get a hold of my private git repo: If they do I figure I have some time to change my secrets. I’m storing the vault in the same directories where I store my variable files, and a vault is intiated like this:

ansible-vault create inventories/test/host_vars/localhost/vault

I generate and store the vault passphrases in a password manager to simplify collaboration with my teams.

The vault file follows the same form as the vars one, but is encrypted on disk:

vault_vc_pass: password
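
Day-to-day handling of the vault uses the matching ansible-vault subcommands; a sketch with the same path as above:

# View the decrypted contents without editing
ansible-vault view inventories/test/host_vars/localhost/vault
# Edit in place; the file is re-encrypted on save
ansible-vault edit inventories/test/host_vars/localhost/vault
# Change the vault passphrase
ansible-vault rekey inventories/test/host_vars/localhost/vault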

Ansible tasks

The next step is to create the task file that actually performs the magic. In this case there’s a single task, looped over however many machines (item) are in my deploylist. There’s a lot more that can be customized with the vmware_guest Ansible module, but in this case my needs are simple: My vCenter customization specification does most of the job.

One thing to look for is the wait_for_customization parameter. This makes sure that Ansible doesn’t proceed to the next task until VMware has finished customizing the VM – in my case renaming the computer and joining it to a domain.

---
- name: Clone template
  vmware_guest:
    validate_certs: False
    hostname: "{{ vcenter }}"
    username: "{{ vc_user }}"
    password: "{{ vc_pass }}"
    datacenter: "{{ vc_datacenter }}"
    cluster: "{{ vc_cluster }}"
    folder: "{{ vm_folder }}"
    template: "{{ vm_template }}" 
    name: "{{ item }}"
    hardware:
      memory_mb: 6144
      num_cpus: 2
      num_cpu_cores_per_socket: 2
    networks:
    - name: "{{ vm_network }}"
    customization_spec: "{{ vm_customization_spec }}"
    wait_for_customization: yes
  with_items: "{{ deploylist }}"

Next we tell the role’s main task list to include our task file. This is slightly overkill for a role with just one actual task, but it’s nice to build a habit of keeping things tidy.

---
- include: deploy-vm.yml

Getting it all to run

Finally it’s time to create a master playbook to trigger the role (and potentially others):

---
- hosts: localhost 
  any_errors_fatal: true

  roles:
  - vm-deployment

To execute it all, we’ll use the ansible-playbook command:

ansible-playbook deploy-webserver.yml -i inventories/test --ask-vault-pass

After responding with the appropriate vault passphrase, Ansible goes to work, and in a couple of minutes a brand new virtual machine is ready to take on new roles.
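
If typing the passphrase gets tedious – say, when running from a scheduler – the same run can read it from a file instead; a sketch, assuming the passphrase is stored in ~/.vault_pass outside the repository:

ansible-playbook deploy-webserver.yml -i inventories/test --vault-password-file ~/.vault_pass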

ZFS backups in Proxmox

I’ve been experimenting with using ZFS snapshots for on- and off-site backups of my Proxmox virtualization environment. For now I’m leaning towards using pve-zsync for backing up my bigger but non-critical machines, and then using syncoid to achieve incremental pull backups off-site. After the initial seed – which I perform over a LAN link – only block-level changes need to be transferred, which a regular home connection with symmetric 100 Mbps should be more than capable of handling.
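
As a sketch of what such an off-site pull looks like (hypothetical host and dataset names – the real ones depend on your pools and on what pve-zsync created):

# Run on the off-site machine: pull incremental snapshots from the hypervisor
syncoid --no-sync-snap root@pve.mydomain.tld:backuppool/zsync/vm-105-disk-0 \
    localpool/offsite/vm-105-disk-0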

One limitation in pve-zsync I stumbled upon is that it will trip itself up if a VM has multiple disks stored on different ZFS pools. One of my machines was configured to have its EFI volume and root filesystem on SSD storage, while the bulk data drive was stored on a mechanical disk. This didn’t work at all, with an error message that wasn’t exactly crystal clear:

# pve-zsync create -source 105 -dest backuppool/zsync -name timemachinedailysync -maxsnap 14
Job --source 105 --name timemachinedailysync got an ERROR!!!
ERROR Message:
COMMAND:
	zfs send -- datapool/vm-105-disk-0@rep_timemachinedailysync_2020-04-05_11:32:01 | zfs recv -F -- backuppool/zsync/vm-105-disk-0
GET ERROR:
	cannot receive new filesystem stream: destination has snapshots (eg. backuppool/zsync/vm-105-disk-0@rep_timemachinedailysync_2020-04-05_11:32:01)
must destroy them to overwrite it

Of course removing the snapshots in question didn’t help at all – but moving all disk images belonging to the machine to a single ZFS pool solved the issue immediately.

The other problem is that while this program is VM-aware when backing up, it only performs ZFS snapshots on the actual dataset(s) backing the drive(s) of a VM or container – it doesn’t by itself back up the machine configuration. This means a potentially excellent recovery point objective (RPO), but the recovery time objective (RTO) will suffer as a result: A critical service won’t get back online until someone creates an appropriate machine and connects the backed-up drives.
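
One way to paper over that gap is to sync the Proxmox guest configuration files along with the datasets; a minimal sketch using the standard Proxmox config locations and a hypothetical destination:

# QEMU VM and LXC container definitions live under /etc/pve
rsync -a /etc/pve/qemu-server/ root@backup.mydomain.tld:/backup/pve-config/qemu-server/
rsync -a /etc/pve/lxc/ root@backup.mydomain.tld:/backup/pve-config/lxc/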

I will be experimenting with variations of the tools available to me, to see if I can simplify the restore process somewhat.

Moving Proxmox /boot to USB stick

Some short notes I made along the way to benefit the future me.

Background

On my new server, Proxmox was unable to boot directly to a ZFS file system on a drive connected via the HBA controller. UPDATE (2020-01-27): The SuperMicro X10SRH-CLN4F motherboard boots just fine from a root-on-ZFS disk in UEFI mode from the built-in SAS HBA. The only required change is the last step in the description below: adding a delay before attempting to mount ZFS volumes at boot time.

There is a potential drawback to installing Proxmox in root-on-ZFS mode on a UEFI system: The drive gets partitioned, so ZFS doesn’t get uninhibited access to the entire block device. This may or may not make a difference for performance, but on an SSD I haven’t seen any cause for concern for my real-world use case. An alternative would be to install the underlying operating system to a separate physical drive.

Also note that the workaround below puts /boot on a single vFAT volume. Since FAT doesn’t support symlinks, kernel and initramfs updates in Proxmox/Debian will require some manual work, which most sane people would likely want to avoid.

I’m leaving the rest of my article intact for posterity:


My workaround was to place /boot – not the system – on a USB stick connected directly to the motherboard.

Process

After installation, reboot with the Proxmox installation medium, but select Install Proxmox VE (Debug mode).

When the first shell appears, press Ctrl+D to have the system load the necessary drivers.

Check the name of the USB drive.

lsblk

Partition it.

cfdisk /dev/sdb

Clear the disk, create an EFI System partition, and write the changes. Then create a FAT file system on the new partition:

mkfs.vfat /dev/sdb1

Prepare to chroot into the installed Proxmox instance

mkdir /media/rescue
zpool import -fR /media/rescue rpool
mount -o bind /dev /media/rescue/dev
mount -o bind /sys /media/rescue/sys
mount -o bind /proc /media/rescue/proc
chroot /media/rescue

Make room for the new /boot

mv /boot /boot.bak

Edit /etc/fstab and add the following:

/dev/sdb1 /boot vfat defaults 0 0
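
Device names like /dev/sdb1 can shift between boots; a more robust variant references the file system by UUID instead (a sketch – substitute the UUID that blkid prints for your partition):

# Look up the UUID of the new FAT partition
blkid /dev/sdb1
# Then use it in /etc/fstab instead of the device node, e.g.:
# UUID=XXXX-XXXX /boot vfat defaults 0 0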

Make the stick bootable

mkdir /boot              # recreate the mount point (the old /boot was renamed above)
mount -a
cp -r /boot.bak/* /boot/ # carry the existing kernels and initramfs images over to the new volume
mkdir -p /boot/efi
grub-install --efi-directory=/boot/efi /dev/sdb
update-grub
grub-mkconfig -o /boot/grub/grub.cfg

Exit the chroot, export the ZFS pool (zpool export rpool), and reboot.

In my specific case I had a problem where I got stuck in a rescue shell with the ZFS pool not imported. Importing it manually got the boot going:

/sbin/zpool import -Nf rpool

Exit to continue the boot process. Then edit /etc/default/zfs and set a delay before the initramfs attempts to mount the root file system:

ZFS_INITRD_PRE_MOUNTROOT_SLEEP=15

Then apply the new configuration:

update-initramfs -u

Head: Meet Wall.

I spent way more time than I’m comfortable disclosing on troubleshooting an issue with an AD-attached Oracle Linux server that wouldn’t accept ssh logons by domain users.

We use the recommended sssd and realmd to ensure AD membership. Everything looked good, and I could log on using an account that’s a member of the Domain Admins group, and so I released the machine to our developers for further work.

Only they couldn’t log on.

After spending most of the morning looking through my logs and config files, and detaching and re-attaching the server to the domain after tweaking various settings, I suddenly saw the light.

Note to my future self:

Windows still runs NetBIOS under the hood! NetBIOS computer names are limited to 15 characters, so a longer hostname on a domain-joined machine will cause trouble!

Naturally, after setting a more Windows-like hostname and re-joining the domain, everything worked as I expected.
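
A quick sketch for the future me, to catch this before joining a machine to the domain (the hostname in the comment is hypothetical):

# NetBIOS computer names are capped at 15 characters
short_name=$(hostname -s)
if [ ${#short_name} -gt 15 ]; then
    echo "WARNING: '$short_name' is ${#short_name} characters; NetBIOS allows at most 15" >&2
fi
# Pick a shorter name before running 'realm join', e.g.:
# hostnamectl set-hostname websrv01.mydomain.tld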

Serving static web pages from HAProxy

I recently had to prove ownership of a web domain at work. The suggested process was easy enough: Present a web page with your company name, and a way to send a mail to an address on the given domain. Now we do have a few web services running, but I didn’t want to mess with those. However, most things we present to the internet exist behind a HAProxy pair. That’s kinda-sorta a web server, isn’t it? Could we use its standard behavior to present a web page? Sure we can!

HAProxy has a feature for presenting custom error messages: an errorfile is simply a hard-coded HTTP response, so it’s lightning fast to serve, and any browser will happily render it as a web page. Let’s build one just for kicks:

HTTP/1.0 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html>
    <head><!--Just a test.--></head>
    <body>
        <h1>A fancy-schmancy header</h1>
        <p>Hello world!
    </body>
</html>

So how do we present this page? Elementary: We cause an error. Not finding a backend server should trigger a 503, for example, so let’s go with that:

(...)
frontend defaultlistener
(...)
    use_backend bk_mystaticpage if { hdr(Host) -i hostname.mydomain.com }

backend bk_mystaticpage
    mode http
    errorfile 503 /etc/haproxy/errors/testpage.http

See how the backend definition doesn’t point at any servers? Instant 503. Et voilà: Our load balancer is now a rudimentary web server.
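
Before reloading in production it’s worth validating the configuration and then exercising the new “page”; a sketch, assuming the standard config path and the hostname used above:

# Validate the HAProxy configuration, then reload the service
sudo haproxy -c -f /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy
# Request the static page through the frontend; -i shows the status line we hard-coded
curl -i -H 'Host: hostname.mydomain.com' http://127.0.0.1/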

Fix for slow TFTP transfers from tftpd-hpa

After setting up a TFTP server based on tftpd-hpa, I was disappointed in the transfer speeds I was seeing. A 15 MB file was enough to make the request time out before the transfer completed.

The recommendation I found was to increase the maximum block size in the server configuration (/etc/default/tftpd-hpa on Debian-based systems). However, I also found a warning that some network equipment is unable to deal with fragmented packets when loading files over TFTP. The compromise I chose was a maximum block size that keeps each packet below my network MTU: 1468 bytes of payload plus the IP, UDP and TFTP headers fits within a 1500-byte Ethernet frame.

...
TFTP_OPTIONS="--secure -B 1468"
...
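
After changing the options, the daemon needs a restart, and curl’s TFTP support makes for a quick before-and-after benchmark; a sketch with hypothetical server and file names:

sudo systemctl restart tftpd-hpa
# Time a download over TFTP, asking for the same 1468-byte block size
time curl --tftp-blksize 1468 -o /dev/null tftp://tftp.mydomain.tld/somefile.bin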

Et voilà: TFTP with good enough performance for regular use.