Creating a working Ubuntu 18.04 VMware template

Long story short: I use VMware and I use Ubuntu. With Ubuntu 16.04 everything worked nicely out of the box. With Ubuntu 18.04 it doesn’t. I finally got tired of manually setting my hostname and network settings every time I need a new server, and decided to fix my template once and for all.

Networking

The first thing that doesn’t work – as mentioned in an earlier post – is deploy-time configuration of the network based on vCenter templates.

For some weird reason, Ubuntu has chosen to entirely replace the old ifupdown system for configuring the network with a combination of cloud-init and Netplan. If we choose to download the installation image with the traditional installer, at least we don’t get cloud-init, but Netplan remains.
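
For comparison, a DHCP-configured interface under the old ifupdown scheme was just a couple of lines in /etc/network/interfaces (the interface name here is only an example):

auto ens160
iface ens160 inet dhcp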

False start

According to the Netplan FAQ, we can install Ubuntu Server without using Netplan by pressing F6 followed by ‘e’ in the installer boot menu, and adding netcfg/do_not_use_netplan=true to the preseed command line.

Unfortunately this leaves us with a disconnected machine after the first boot: it turns out Ubuntu isn’t smart enough to actually install ifupdown when Netplan is deselected – at least not with the current installer, 18.04.1.

The working way

The solution to the problem above is still (in February 2019) to perform a clean install with Netplan, and then manually remove open-vm-tools and replace it with VMware’s official tools, since open-vm-tools still doesn’t support Ubuntu’s weirdness even ten months after 18.04 was released.
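
The swap itself is nothing fancy. Roughly the following, assuming the VMware Tools ISO has been attached through vCenter’s “Install VMware Tools” action – the exact tarball name depends on the Tools version:

sudo apt-get purge open-vm-tools
sudo mkdir -p /mnt/cdrom
sudo mount /dev/cdrom /mnt/cdrom
tar -xzf /mnt/cdrom/VMwareTools-*.tar.gz -C /tmp
sudo /tmp/vmware-tools-distrib/vmware-install.pl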

…However…

The default DHCP behavior in Ubuntu 18.04 is nothing short of idiotic for use in VMware templates: despite newly deployed machines naturally getting new MAC addresses, they insist on asking to be handed the same IP address as their template, and they don’t care that the lease is already taken but will happily keep stealing the address from each other. The underlying reason is that systemd-networkd identifies itself to the DHCP server with a DUID derived from /etc/machine-id rather than with the MAC address – and clones of the same template all share that machine ID.

Fortunately, according to this post over at superuser.com, there’s a way to fix this. Edit /etc/netplan/01-netcfg.yaml, and tell Netplan to use the MAC address as the DHCP identifier, like this:

      dhcp4: yes
      dhcp-identifier: mac

After this, new machines deployed from the template should behave slightly more sanely.
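
For context, the whole file on my template ends up looking roughly like this – the interface name ens160 is just what my virtual hardware happens to be called, so adjust to taste, and apply the change with sudo netplan apply:

network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: yes
      dhcp-identifier: mac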

Painfully long Grub menu timeout

Grub’s boot menu has a default timeout of 30 seconds in Ubuntu 18.04. The relevant setting should be modifiable in /etc/default/grub. Only it isn’t: the default value for GRUB_TIMEOUT is 2 seconds, which Grub doesn’t adhere to at all. The culprit is Ubuntu’s recordfail logic – if the previous boot wasn’t recorded as successful, which is more or less guaranteed on a freshly cloned or hard-reset VM, Grub ignores GRUB_TIMEOUT and falls back to a 30-second wait. Logically (no, not at all), the “fix” is to add the following line to /etc/default/grub:

GRUB_RECORDFAIL_TIMEOUT=2

Re-run update-grub with superuser rights, and reboot the computer to confirm it worked as intended.
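
In other words, nothing more than:

sudo update-grub
sudo reboot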

End result

With the changes detailed above, and after installing Python so that Ansible can perform its magic on VMs deployed from this template, I have finally reached feature parity with my Ubuntu 16.04 template.
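
The Python part is a single package install. A minimal Python 2 is enough, since Ansible by default looks for /usr/bin/python on the target – installing python3 and pointing ansible_python_interpreter at it works just as well:

sudo apt-get install -y python-minimal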

Fixing lack of console video in Proxmox on HP MicroServer Gen7

After my latest experiment, the lack of console video output from Proxmox on my N54L-based HP MicroServer Gen7 became a serious problem: I would see the Grub menu, then the screen would go blank and enter power-save mode, the disk activity light would blink a few times, and the system wouldn’t start up.

Naturally, without seeing the error message I couldn’t do anything about the problem, but I had seen a similar symptom before, namely when installing Proxmox for the first time: the installer USB image behaves the same way on this computer, and the workaround there is simply to press Enter to launch the graphical install environment, after which the display works again.

This time I re-created a Proxmox 5.2 USB stick, booted the server from it, and correctly assumed that arrow down followed by Enter would likely get me into some sort of rescue environment. Sure enough I was soon greeted by a root prompt. At this stage, the ZFS modules weren’t loaded, so again I guessed that pressing Ctrl+D to exit the rescue environment would start the install environment where I know ZFS is available, and pressing the Abort button there luckily got me back to a shell.

From here I mounted my ZFS environment and chrooted into it:

# zpool import -f -a -N -R /mnt    # force-import all pools under /mnt without mounting anything yet
# zfs mount rpool/ROOT/pve-1       # mount the root dataset first...
# zfs mount -a                     # ...then the rest
# mount --rbind /dev /mnt/dev      # bind the virtual filesystems the chroot will need
# mount --rbind /proc /mnt/proc
# mount --rbind /sys /mnt/sys
# chroot /mnt /bin/bash --login

I could now confirm that I was within my regular Proxmox file system, and so I got to work on the Grub configuration:

In /etc/default/grub, I found the commented-out line #GRUB_GFXMODE=640x480 and changed that portion of the file so it now looks like this:

GRUB_GFXPAYLOAD_LINUX="keep"
GRUB_GFXMODE=1024x768
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset"

The idea is that nomodeset stops the kernel from switching to a mode-setting graphics driver that evidently doesn’t work on this machine, while GRUB_GFXPAYLOAD_LINUX="keep" tells it to stay in the video mode Grub has already set up. I then ran update-grub and rebooted the server, after which I could finally see the boot process, including the issue that had prevented the system from booting fully: my ZFS pool didn’t want to mount after replacing the drive. I quickly ran zpool import -f, exited the shell, and the system booted the rest of the way. An additional reboot confirmed that the system was functional.

Summary

Troubleshooting gets a lot harder when you’re blind. The solution is to attempt to become less blind.