Upgrades, people, upgrades

Upon upgrading the RAM and populating an additional NVMe slot in all of my Proxmox nodes, which are a mix of HP Elitedesk and Prodesk SFF machines, I had a bit of a snafu. One of the nodes had a dead CMOS battery and was quite unhappy with losing power. While replacing the battery and resetting the clock was something I expected, I hit another error: Failed secure boot verification. What in the world? Was the new NVMe drive higher in priority and failing boot because it was empty? Some quick mucking around in the UEFI settings determined that was not the case – we were failing to secure boot on the original boot drive.

Hetero(geneity) strikes again

Unfortunately, this node was configured to use proxmox-boot-tool, which I discovered while trying to rescue Grub. That made sense, it booted into a different environment than the familiar blue grub of Proxmox, but I didn’t think much of that as usually these nodes are totally headless and I never see the bootloader.

grub install is disabled because this system is booted via proxmox-boot-tool

Okay.. Let’s fix this. Maybe we should verify the EFI signature along the way? Let’s see where exactly it is in the EFI partition and then verify that it’s signed by our friends in Redmond.

root@tribune3:~# ls /boot/efi -R 
/boot/efi:
EFI loader 

/boot/efi/EFI:
BOOT Linux proxmox systemd

/boot/efi/EFI/BOOT:
BOOTX64.EFI

/boot/efi/EFI/Linux:

/boot/efi/EFI/proxmox:
6.8.12-4-pve

/boot/efi/EFI/proxmox/6.8.12-4-pve:
initrd.img-6.8.12-4-pve vmlinuz-6.8.12-4-pve 

/boot/efi/EFI/systemd:
systemd-bootx64.efi

/boot/efi/loader:
entries entries.srel loader.conf random-seed

/boot/efi/loader/entries:
proxmox-6.8.12-4-pve.conf

Ugh. Where are the shim files? I’m looking for something like shimx64.efi These files are definitely present on my other nodes. What is going on?

(Hint: it was proxmox-boot-tool)

proxmox-boot-tool, grub edition

We want a proper grub install here instead of using proxmox-boot-tool. The key step here is the grub part of the init command. I don’t think formatting is specifically necessary, but I was fine wiping this EFI partition, and al the data I care about is replicated to the other nodes in the cluster.

proxmox-boot-tool format /dev/nvme1n1p2
proxmox-boot-tool init /dev/nvme1n1p2 grub
proxmox-boot-tool refresh

Oh look, a lovely EFI directory full of grubby goodness. Let’s verify the shim now.

root@tribune3:~# sbverify --list /boot/efi/EFI/proxmox/shimx64.efi | grep "Microsoft Corporation UEFI CA 2011"
 - /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
   issuer:  /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
 - subject: /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011

Okay, nice. If anything nefarious was going on, MSFT is now in on it too. I can now re-enable secure boot and move on with my cluster upgrade.

Why did Secure Boot work before?

I can only imagine, that in my infinite wisdom and penchance for getting things working, that I enrolled user keys instead of the MSFT UEFI shim. We can see in the first ls of the EFI partition that we’re using systemd-boot

/boot/efi/EFI/systemd/systemd-bootx64.efi

It’s impossible to know at this point who this image was signed by, but I likely had enrolled a Machine Owner Key (MOK). When the system lost power and experienced a CMOS reset, goodbye key.

Bonus content

Proxmox HA will migrate guests back to their preferred node based on HA groups. So imagine my surprise when I see this error:

TASK ERROR: Installed QEMU version '9.0.2' is too old to run machine type 'pc-i440fx-9.2+pve0', please upgrade node 'tribune3'

What in the hell. Why? These guests were literally just running on this machine before the hardware upgrade.

Lucky for me, I don’t care.

apt update
apt full-upgrade

Before this command even finished, QEMU was happy and the guest had migrated to it’s preferred home on tribune3.