Support UKI #2753
How would we know from the initramfs that a rollback was performed? I'm thinking of the scenario in which the bootloader decides to roll back because the new deployment/update is not good enough (wasn't confirmed). Currently this can be done quite easily by booting the previous deployment (previous initramfs), which also has the previous ostree argument in place.
What's wrong with using boot loader entries? Wouldn't we expect the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the Boot Loader Spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me. I'm pretty sure sd-boot deals with it fine, as we've been using it with a UKI on a product for a few years. The only changes we have to make are horrifying ones to deal with the lack of symlinks on VFAT (#1719). I guess using UEFI boot variables would sidestep that issue, though.
Also, if you're building a UKI, the initramfs is part of it and there's no need for ostree to find it. Are you suggesting that ostree take the separate kernel and initramfs and generate a UKI?
Is the "how to find the rootfs" problem a chicken-and-egg problem, in that you need to populate the "ostree=" karg in the UKI, but you only know what that karg should be after you commit? So that you can only boot the n-1 commit in that case? I had thoughts on this: you could write an extra value client side, maybe as a new "ostree" entry in the BLS file. So you could do:
But I'm happy with whatever works :) We're encountering the same issue in the UKI-like aboot bootloader.
I think the main reason to embed the rootfs in the kernel cmdline is basically integration with bootloader menus, e.g. to be able to choose the previous deployment in the GRUB GUI. However, this is not a requirement. We could instead read a value from the target root; one could imagine something as simple as a symlink … Perhaps a strawman here is that specifying a bare … And then to boot the previous deployment, we support an …
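To make that strawman a bit more concrete, here is a rough sketch. It assumes a bare ostree karg means "follow a symlink kept in the target root", while ostree=<path> keeps today's behavior. The symlink name ostree/default and both function names are hypothetical, not anything ostree-prepare-root does today.

```sh
#!/bin/sh
# Hypothetical sketch: a bare "ostree" karg means "follow a symlink in the
# target root", while "ostree=<path>" keeps today's behavior.
# The symlink name ostree/default is made up for illustration.
set -eu
set -f  # don't glob-expand kernel command line words

# $1: kernel command line. Prints "" for a bare "ostree", the value for
# "ostree=<value>", and returns 1 if no ostree karg is present.
ostree_karg() {
    for arg in $1; do
        case "$arg" in
            ostree) printf '\n'; return 0 ;;
            ostree=*) printf '%s\n' "${arg#ostree=}"; return 0 ;;
        esac
    done
    return 1
}

# $1: sysroot, $2: kernel command line. Prints the deployment to boot.
resolve_deployment() {
    value=$(ostree_karg "$2") || return 1
    if [ -z "$value" ]; then
        # Bare karg: the choice lives in the rootfs, not in the sealed cmdline.
        value=$(readlink "$1/ostree/default")
    fi
    printf '%s\n' "$value"
}
```

With this split, the signed UKI cmdline never has to change, and picking a different deployment becomes a matter of retargeting the symlink.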
By default, we don't do client-side commits. Hence, the digest is actually fully predictable and known in advance on the build server. But there is certainly a circular dependency here for any systems doing fully sealed kernel command lines: we'd need to generate the rootfs and compute its digest, then patch the kernel binary (which would in theory invalidate that digest, but OTOH nothing actually reads the kernel from the rootfs; the fallout would just be things like …). Arguably, perhaps we should have better support on the client for something like "ghosting" the kernel/initramfs from …
I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.
That does make sense. But what's actually unpacking the UKI in that case? Some other UEFI program? I don't believe the Linux kernel itself supports booting directly from a combined kernel+initramfs PE program. Ah, sd-stub, I missed that. I guess if you're all in on UEFI and want a minimal boot environment, then even sd-boot is superfluous.
It mostly depends on the use case and the hardware. Adding new boot entries to the EFI menu works only so-so on some hardware/firmware, and switching between different entries at boot is often cumbersome (not to mention vendor-specific). Thus, having different boot environments with different kernels/deployments is much easier with sd-boot than with "native" UEFI boot loading. This is why I use sd-boot on all my systems in combination with UKIs.
One thing that's not clear to me is how we deliver a UKI (is it its own RPM?), because it would be built on the osbuild side rather than on the end device...
Sure, the same way you build the kernel and initramfs on an ostree system; they're just bundled together for a UKI. I think the only thing in there that doesn't fit that model is the kernel command line, since ostree currently allows you to manage that locally and it often contains …
I guess the way we do it right now at Endless is that the initramfs is generated in our ostree builder. For our systems that use a unified kernel with sd-boot, that's also generated in our ostree builder. There's no reason they couldn't be packaged, except that generating the initramfs requires installing all the dracut modules that you want in there. We decided that wasn't worth the effort, and it was easier to do that in the ostree builder since it would by definition have all the modules installed.
Just to be explicit, I didn't propose just shipping the commit in the detached metadata, as that is not trusted. What I proposed is to ship a public key in the initrd, sign the commit ID with the private part, store the signature in the detached metadata, and then throw away the private key. Then the initrd can validate the commit it reads from somewhere.
Posting here the result of several discussions that we've had recently: the major change is the need to move the ostree deployment hash out of the kernel command line, as the kernel command line won't be modifiable in the UKI case. The suggested design is that ostree would take the UKI from the ostree commit, move it to the EFI partition and rename it with the following convention:
For example: We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.
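As a sketch of the initramfs side, assume a hypothetical naming scheme of ostree-<sha256>.efi (the real convention is the one referred to above) and that the booted image path has already been obtained from an EFI variable or the TPM event log. Parsing could then look like this; the function name is made up.

```sh
#!/bin/sh
# Hypothetical sketch: recover an ostree deployment checksum from the booted
# UKI's file name, assuming a naming scheme of "ostree-<sha256>.efi".
# How the initramfs learns that name (EFI variable, TPM event log) is out of scope.
set -eu

bs='\'  # EFI device paths use backslashes as separators

# $1: path of the booted UKI. Prints the checksum, or returns 1.
uki_name_to_checksum() {
    name=$1
    name=${name##*"$bs"}   # strip an EFI-style directory prefix (\EFI\Linux\)
    name=${name##*/}       # ...or a POSIX-style one
    case "$name" in
        ostree-*.efi) ;;
        *) return 1 ;;
    esac
    csum=${name#ostree-}
    csum=${csum%.efi}
    # A SHA-256 checksum is exactly 64 lowercase hex digits.
    case "$csum" in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#csum}" -eq 64 ] || return 1
    printf '%s\n' "$csum"
}
```

The checksum recovered this way is untrusted input: the caller would still need to check that a matching deployment exists under /ostree/deploy/ in an integrity-protected rootfs.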
While this would be nice, I don't think it's strictly needed if we still have a bootloader (systemd-boot preferably) that is capable of booting BLS config entries.
The design above could be combined with the suggestion from #2753 (comment) and the use of composefs to verify the content of the deployment.
Sorry, I deleted that as I wanted to rewrite a portion; it belongs before @alexlarsson's comment. So some automotive folks were discussing Android boot images, which are similar to UKIs in that they are a "kernel, initrd, cmdline and signature" bundle that gets generated server-side and delivered to the client via ostree. This leaves the issue of how you deliver and boot the ostree SHA. It is difficult to boot via karg because of the recursion problem: how do you deliver that SHA without altering the SHA? @alexlarsson suggested ostree detached metadata; that way you can deliver the SHA without altering it, so we think this should solve the problem. But this requires a way of booting other than via the karg, and I will explore the symlink techniques @cgwalters suggested above. What do you all think of this proposal? A similar technique could be used for UKIs and Android Boot Images.
@travier thanks for sharing the outcome of the discussions here. In the case where you don't have an EFI partition available, which is the case for Android Boot Images, do you think it's reasonable to move forward with @cgwalters' symlink approach suggested above?
The UEFI variable … Reading this thread, I can't help but feel like this is getting over-engineered (or rather "complex")... I'm personally not interested in building the UKI on the server and losing the ability to specify command line arguments; however, I think that's a requirement if you want the UKI to be signed, e.g. by the distribution itself? Since that isn't my goal, I'm currently building the UKI on the host, supporting kernel arguments. If I need the image on the build server, e.g. for signing or attestation, I can simply take the kernel arguments and build the (fully reproducible) UKI.
The problem with this is that it moves the identifier for the rootfs from a trusted location (the signed UKI) to a completely untrusted location (the filename). Anyone can just rename the FAT file and make it boot some other rootfs. This is fine if you don't care about validating, but it is nowhere near enough for a Secure Boot trusted boot into the rootfs.
Not any other rootfs: you'd include the key used to sign the composefs in the initramfs, and validate it from there. So the problem then turns to rollback protection, and that's a nuanced topic because it's absolutely valid to want to roll back sometimes.
I liked the symlink approach over the EFI partition one. The problem with using EFI features is that you start to depend on a fully implemented UEFI, which would be nice, but that's not always the case, especially on non-x86 systems. It would be better to keep the solution as self-contained as possible in the main rootfs partition (rather than using EFI partitions).
To be clear, "symlink approach" = #2753 (comment)? I edited that earlier comment to elaborate a bit on how rollbacks would work; so the previous bootloader entries would gain …
Yes, it removes the hard dependency on EFI.
We need a way to choose which deployment to boot, as we need to support rollbacks (rollback protection is another topic that we are not covering here and would be implemented separately). As we cannot change the command line, we need a way to pass that information to the initramfs. Using the filename of the UKI is one way of doing that. Note that this deployment hash isn't particularly trusted data: it only makes sense if the deployment exists in the rootfs. Whether or not it's a valid deployment is thus a question of whether or not we have integrity for the rootfs, and that's a composefs / LUKS discussion. You cannot use it to boot an arbitrary deployment that is not already in the rootfs.
@travier what problems do you see with #2753 (comment)?
As far as I understand, this involves modifying the kernel command line, which is not compatible with UKIs.
If we do a mapping
Not sure how robust this would be in case of power failures, as we would need to update two places at the same time every time we do a new deployment: the UKI file name and the deployment hash symlink.
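Each individual pointer can be updated crash-safely with the usual write-then-rename trick. Here is a minimal sketch; mv -T is a GNU coreutils option and is assumed to be available, and the function name is made up.

```sh
#!/bin/sh
# Sketch: make a single pointer update crash-safe by creating the new symlink
# under a temporary name and renaming it over the old one; rename(2) replaces
# the destination atomically on POSIX filesystems (VFAT has no symlinks anyway).
set -eu

# $1: link path, $2: new target
swap_symlink() {
    tmp="$1.tmp.$$"
    rm -f "$tmp"
    ln -s "$2" "$tmp"
    # -T (GNU): treat $1 as the destination itself, never as a directory to move into.
    mv -T "$tmp" "$1"
}
```

This only makes each update atomic on its own. It does not make the UKI name and the symlink change together, so the two steps still have to be ordered so that the intermediate state remains bootable.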
But we don't change the UKI for every deployment. We don't want to have to touch the kernel config when only userspace changes in general, right?
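Here is a sketch of how a userspace-only update could leave the ESP alone. It assumes each deployment ships its UKI under usr/lib/modules/<kver>/ (an assumption for illustration) and that the first UKI found is the one that matters. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: after pulling a new deployment, only touch the ESP if
# the UKI actually changed. The UKI location inside the deployment
# (usr/lib/modules/<kver>/*.efi) is an assumption.
set -eu

# $1: old deployment root, $2: new deployment root
# Prints "unchanged" or "changed".
uki_status() {
    old=$(find "$1/usr/lib/modules" -name '*.efi' -type f 2>/dev/null | sort | head -n 1)
    new=$(find "$2/usr/lib/modules" -name '*.efi' -type f 2>/dev/null | sort | head -n 1)
    # Byte-for-byte comparison: same UKI means no new ESP entry is needed.
    if [ -n "$old" ] && [ -n "$new" ] && cmp -s "$old" "$new"; then
        echo unchanged
    else
        echo changed
    fi
}
```

In the "unchanged" case, only the rootfs-side pointer would need to move, which keeps the number of places updated per deployment down to one.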
For systemd/systemd#24539 to work, we need BLS Type 1 entries (config files) and bootloader support to extend the UKI kernel command line with the options passed in that config file.
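For reference, here is how a Type 1 entry pointing at a UKI could be generated. The efi and options keys come from the Boot Loader Specification. The title, paths, karg value and function name are placeholders. Whether options actually extends the UKI command line depends on the bootloader support discussed in systemd/systemd#24539.

```sh
#!/bin/sh
# Sketch: emit a BLS Type 1 entry that points at a UKI through the "efi" key
# and carries the ostree karg in "options". All values are placeholders.
set -eu

# $1: title, $2: UKI path on the ESP, $3: extra options
bls_entry_for_uki() {
    printf 'title %s\n' "$1"
    printf 'efi %s\n' "$2"
    printf 'options %s\n' "$3"
}
```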
How do you set that in the kernel command line, and how do you update it when you change the order of deployments?
We could generate a random hash, include it in the UKI kernel command line and also set up the symlinks in the rootfs, but that would be another indirection, as I mentioned in #2753 (comment).
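Here is a minimal sketch of that extra indirection. Both the karg name ostree.uki-id and the ostree/uki/<id> symlink directory are made up for illustration, as are the function names.

```sh
#!/bin/sh
# Hypothetical sketch: a random ID is baked into the UKI command line
# (as the made-up karg "ostree.uki-id=<id>") and mapped to a deployment by a
# symlink in the rootfs (the made-up directory ostree/uki/).
set -eu
set -f  # don't glob-expand kernel command line words

new_uki_id() {
    # 16 random bytes as 32 hex characters.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# $1: sysroot, $2: id, $3: deployment path relative to $1/ostree
map_uki_id() {
    mkdir -p "$1/ostree/uki"
    ln -sfn "../$3" "$1/ostree/uki/$2"
}

# $1: sysroot, $2: kernel command line. Prints the symlink target, or returns 1.
lookup_uki_id() {
    for arg in $2; do
        case "$arg" in
            ostree.uki-id=*)
                id=${arg#ostree.uki-id=}
                readlink "$1/ostree/uki/$id"
                return ;;
        esac
    done
    return 1
}
```

The ID stays fixed for a given UKI, so a userspace-only update would just retarget the symlink. That symlink is the extra piece of state that has to be kept in sync.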
You're right, I wasn't covering a detail here. At this point, though, the thread is unwieldy, so I've amended the initial comment here. I think systemd-stub credentials are already a way to pass this data, and that's what they're designed for. That said, I also think we can't design solely for systemd-stub. A very interesting case that's entwined with all of this is whether systems using ostree want to explicitly support locally initiated rollback. If you don't (and I think that's valid!), then there's no need for a "fallback" UKI that would appear as a separate bootable entry at all. Instead, it'd be up to userspace (whether initramfs or real root) to verify health and locally initiate a change in the default UKI/rootfs pair.
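Here is a rough sketch of the consuming side if credentials were used. It assumes that a unit in the initrd loads a credential named ostree.deployment (a made-up name), so that systemd exposes it as a file under $CREDENTIALS_DIRECTORY. It also assumes the value is plain text. Otherwise the code falls back to whatever the caller passes in, e.g. today's karg value.

```sh
#!/bin/sh
# Hypothetical sketch: prefer a credential over the kernel command line for
# choosing the deployment. The credential name "ostree.deployment" is made up.
set -eu

# $1: fallback value if no credential is present
deployment_from_credential() {
    f="${CREDENTIALS_DIRECTORY:-}/ostree.deployment"
    if [ -n "${CREDENTIALS_DIRECTORY:-}" ] && [ -r "$f" ]; then
        # Take only the first line of the credential.
        head -n 1 "$f"
    else
        printf '%s\n' "$1"
    fi
}
```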
Using credentials is indeed also an option. Note that this requires …
Looks like this won't let us share a UKI, if I understand correctly.
I'll post my current test setup here simply because it might be useful to someone; obviously it won't be usable for the use case discussed here (firmware Secure Boot, i.e. with Microsoft's keys).
Instead of doing the Boot-entry dance using … Here is the …

```sh
#!/bin/sh
set -eu
# Use ${1:-}/${2:-}: under `set -u`, a bare $1/$2 aborts with an
# "unbound variable" error before the usage message can be printed.
if [ "${1:-}" != "-o" ]; then
    echo "Usage: $0 -o <cfg>"
    exit 1
fi
if [ -z "${2:-}" ]; then
    echo "Usage: $0 -o <cfg>"
    exit 1
fi
# FIXME: assert that _OSTREE_GRUB2_IS_EFI is not set; if it has been set, then
# ostree will use different logic, which is probably incompatible.
# FIXME: replace by using _OSTREE_GRUB2_BOOTVERSION, which also checks that we have been called by ostree
# We get called like `grub-mkconfig -o /boot/loader.0/grub.cfg`; use $2 to obtain the /boot/loader.$bootnum directory
if [ "$2" = "/boot/loader.0/grub.cfg" ]; then
    OLD_BOOTNUM="1"
    NEW_BOOTNUM="0"
elif [ "$2" = "/boot/loader.1/grub.cfg" ]; then
    OLD_BOOTNUM="0"
    NEW_BOOTNUM="1"
else
    echo "Usage: $0 -o /boot/loader.[01]/grub.cfg"
    exit 3
fi
LOADER_DIR="$(dirname "$2")"
if [ -d "$LOADER_DIR/uki" ]; then
    # Might be a leftover from e.g. a failed previous run.
    echo "Removing (old) $LOADER_DIR/uki"
    rm -r "$LOADER_DIR/uki"
fi
mkdir "$LOADER_DIR/uki"
for entry_file in "$LOADER_DIR"/entries/*.conf; do
    echo "Parsing BLS entry file '$entry_file':"
    # 1. Parse the BLS config file:
    ENTRY_TITLE="$(grep "^title " "$entry_file" | sed 's/^title //')"
    ENTRY_VERSION="$(grep "^version " "$entry_file" | sed 's/^version //')"
    ENTRY_OPTIONS="$(grep "^options " "$entry_file" | sed 's/^options //')"
    ENTRY_LINUX="$(grep "^linux " "$entry_file" | sed 's/^linux //')"
    ENTRY_INITRD="$(grep "^initrd " "$entry_file" | sed 's/^initrd //')"
    # Technically the 'version' is supposed to be sorted using Debian version sort style, but we assume
    # that the file names generated by ostree are enough for ordering, which will probably break once you have 9+ deployments
    ENTRY_FILENAME="${entry_file##*/}"
    UKI_PATH="$LOADER_DIR/uki/${ENTRY_FILENAME%.conf}.efi"
    echo "Resulting UKI will be stored in '$UKI_PATH'"
    echo "$ENTRY_OPTIONS" > "$UKI_PATH.cmdline"
    # Build the actual UKI; note that it is always rebuilt (it shouldn't exist yet)
    # --preserve-dates: for a reproducible timestamp in the PEI header
    objcopy \
        --preserve-dates \
        --add-section .cmdline="$UKI_PATH.cmdline" --change-section-vma .cmdline=0x30000 \
        --add-section .linux="/boot/$ENTRY_LINUX" --change-section-vma .linux=0x2000000 \
        --add-section .initrd="/boot/$ENTRY_INITRD" --change-section-vma .initrd=0x3000000 \
        /usr/lib/systemd/boot/efi/linuxx64.efi.stub \
        "$UKI_PATH"
done
# Sync built images to /boot/efi
# See also <https://bugzilla.gnome.org/show_bug.cgi?id=724246>
ESP_DIR="/boot/efi/EFI/bauen1-uki"
mkdir -p "$ESP_DIR.0" "$ESP_DIR.1"
sync --file-system "/boot/efi/EFI"
echo "OLD_BOOTNUM: $OLD_BOOTNUM"
echo "NEW_BOOTNUM: $NEW_BOOTNUM"
# We assume that the currently used Boot variables point to "$ESP_DIR.$OLD_BOOTNUM", so we can safely
# remove "$ESP_DIR.$NEW_BOOTNUM"
# Figure out some values for modifying UEFI Boot variables:
ESP_DEVICE="$(df /boot/efi | tail -1 | awk '{ print $1 }')"
ESP_PARTNUM="$(cat /sys/class/block/"$(basename "$ESP_DEVICE")"/partition)"
ESP_PARTUUID="$(blkid "$ESP_DEVICE" -o export | awk -F'=' '/PARTUUID=/ { print $2 }' )"
echo "device=$ESP_DEVICE partnum=$ESP_PARTNUM partuuid=$ESP_PARTUUID"
cleanup_bootvars() {
    # Removes any boot variables referencing a certain $ESP_DIR.$BOOTNUM
    # $1: bootnum
    # Now we know that we are looking for something similar to:
    # HD($ESP_PARTNUM,GPT,$ESP_PARTUUID,somehex,somehex)/File(\EFI\bauen1-uki.$BOOTNUM\.*)
    # efibootmgr output looks like:
    # BootXXXX* title with possible spaces\tActualEntry
    ENTRIES="$(efibootmgr -v | grep -E '^Boot[[:xdigit:]]{4}' | awk -F'\t' '/^[^\t]+\tHD\('"$ESP_PARTNUM,GPT,$ESP_PARTUUID"',.*\)\/File\(\\EFI\\bauen1-uki.'"$1"'\\.*\)$/ { print $0 }')"
    printf "Boot entries that will be removed:\n%s\n" "$ENTRIES"
    for entry in $(echo "$ENTRIES" | grep -E '^Boot[[:xdigit:]]{4}' --only-matching | sed 's/^Boot//'); do
        echo "Removing $entry"
        efibootmgr --delete-bootnum --bootnum "$entry"
    done
}
# 1. Clean up any leftover Boot variables still pointing to $ESP_DIR.$NEW_BOOTNUM
cleanup_bootvars "$NEW_BOOTNUM"
# 2. Clean up $ESP_DIR.$NEW_BOOTNUM
if [ -e "$ESP_DIR.$NEW_BOOTNUM" ]; then
    echo "Removing $ESP_DIR.$NEW_BOOTNUM"
    rm -r "$ESP_DIR.$NEW_BOOTNUM"
    sync --file-system "/boot/efi/EFI"
else
    echo "Skipping removal of $ESP_DIR.$NEW_BOOTNUM, does not exist"
fi
# 3. Create new $ESP_DIR.$NEW_BOOTNUM
echo "Creating $ESP_DIR.$NEW_BOOTNUM"
mkdir "$ESP_DIR.$NEW_BOOTNUM"
cp -v "$LOADER_DIR/uki"/*.efi "$ESP_DIR.$NEW_BOOTNUM"/
sync --file-system "/boot/efi/EFI"
# 4. Create new Boot variables
for f in "$ESP_DIR.$NEW_BOOTNUM"/*; do
    echo "Creating Boot entry for file '$f':"
    efibootmgr \
        --create \
        --disk="$ESP_DEVICE" \
        --part="$ESP_PARTNUM" \
        --label="${f##*/}" \
        --loader="${f##/boot/efi}"
done
# 5. Set BootOrder (and maybe BootNext?)
# FIXME: efibootmgr --create adds the entries to the currently defined BootOrder; however, I need to verify
# what order is used, and whether that is already what is necessary.
# It appears to already do everything correctly.
# 6. Remove now unused old Boot variables
cleanup_bootvars "$OLD_BOOTNUM"
# Finally, actually touch the output file to make ostree happy
echo "Touching empty (fake) output file '$2'"
touch "$2"
```
UKIs are not supported on rpm-ostree-based Fedora variants, so let's use a Recommends dependency on binutils for now, so that those variants do not include the package until it is needed. See: coreos/fedora-coreos-tracker#1496 See: ostreedev/ostree#2753 See: https://src.fedoraproject.org/rpms/kexec-tools/c/ea7be0608ed719cc1cb134ecf6ef51a4b7e9f104?branch=rawhide
Btw, for the Android Boot Image implementation this is what we did (its high-level design is very similar to UKIs). UKIs aren't designed to have as malleable a cmdline as a local client-side BLS file, so we set the ostree karg to simply: ostree=true Then we created symlinks like /ostree/root.a, which pointed to two different sysroots (the ostree systemd generator also parsed the osname/stateroot from this symlink).
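Here is a sketch of resolving that kind of symlink in the initramfs. The karg stays a constant ostree=true, and a slot letter selects /ostree/root.<slot>. The stateroot is taken from the standard deploy/<stateroot>/deploy/<checksum>.<serial> layout. How the slot letter is chosen (e.g. by the A/B bootloader) is assumed, and the function name is made up.

```sh
#!/bin/sh
# Sketch of the root.a/root.b scheme: the slot letter selects
# /ostree/root.<slot>, and the symlink target also encodes the stateroot.
set -eu

# $1: sysroot, $2: slot letter. Prints "<stateroot> <deployment dir>".
resolve_slot() {
    target=$(readlink "$1/ostree/root.$2")
    # Expected shape: [.../]deploy/<stateroot>/deploy/<checksum>.<serial>
    case "$target" in
        */deploy/*/deploy/*|deploy/*/deploy/*) ;;
        *) return 1 ;;
    esac
    stateroot=${target#*deploy/}
    stateroot=${stateroot%%/*}
    printf '%s %s\n' "$stateroot" "$target"
}
```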
So it looks like we have 3 options:
So it looks like the only option in the end is to write the UKI in …
@travier I may have missed something in the discussions, but why was the easiest option, "manage UEFI entries directly and avoid the need for a bootloader at all", abandoned? This is pretty much what we do for the Fedora UKI image (see https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2). The automation is done via the kernel-bootcfg tool (part of the 'virt-firmware' package), and we do A/B booting upon UKI upgrade. If the new UKI boots, it becomes the default.
|
I've had these discussions with systemd people before. In an ideal world everybody would be UEFI; sadly, we don't live in an ideal world. UEFI on ARM is uncommon, and even some x86 platforms still don't use UEFI; there was even a new x86 device being hacked on recently that ran non-UEFI slimboot. Chainloading can be an option sometimes, but not always. Let's say you are in automotive and need to hit a strict 2-second boot KPI: you simply cannot chainload in this case because of the boot time KPI. Maybe chainloading is the solution...
|
It's definitely an option. The biggest source of complexity is how it interacts with the rootfs, which was touched on above. For ostree, we do A/B-style updates of the (kernel, rootfs) pair, not just of the kernel/initramfs.
For RHEL/Fedora, the distro-shipped UKI now comes in the 'kernel-uki-virt' RPM, which places the UKI in /lib/modules//vmlinuz-virt.efi. This is part of the rootfs, but it can't be booted from there directly. kernel-install scripts have to copy it to the ESP (in theory anywhere, but BLS suggests /EFI/Linux/). So basically, to switch from A to B for the kernel+rootfs pair, you will need to:
Looking at the alternatives to 'direct UKI boot without a bootloader', I don't think there's going to be a big difference. E.g. if GRUB can chainload it from /boot, someone still needs to place it there; sd-boot can only load binaries from the ESP AFAIU, and so on. So one way or another, we still need to do some extra actions when switching from 'rootfs+UKI A' to 'rootfs+UKI B'. (The ESP can probably be treated as an implementation detail, same as UEFI variables in NVRAM.)
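Here is a sketch of that "someone still needs to place it there" step: copying a UKI that ships inside the rootfs into /EFI/Linux/ on the ESP, as BLS suggests. The source path, target file name and function name are assumptions for illustration.

```sh
#!/bin/sh
# Sketch: copy a UKI shipped inside the rootfs to the ESP's /EFI/Linux/.
# Source path and target file name are placeholders.
set -eu

# $1: UKI inside the rootfs, $2: ESP mount point, $3: target file name
install_uki() {
    dest_dir="$2/EFI/Linux"
    mkdir -p "$dest_dir"
    # Copy under a temporary name first so an interrupted copy does not leave
    # a truncated file under the final name. FAT has no journal, so this
    # narrows the failure window rather than closing it.
    cp "$1" "$dest_dir/.$3.tmp"
    sync
    mv -f "$dest_dir/.$3.tmp" "$dest_dir/$3"
    sync
}
```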
UKIs are PE (Windows Portable Executable) files and require either UEFI or a bootloader capable of chainloading EFI (PE) binaries. Android Boot Image is a completely different format and I don't think we should conflate support for it into this issue. It should be its own issue as it's likely going to require a different implementation.
This is indeed an option. However, it requires a lot more code changes to ostree, as it would mean integrating EFI boot entry management logic.
The downside of going full firmware is that you no longer get an easily accessible option to roll back at boot. You have to enter the firmware interface and find the previous boot entry.
@travier Yes, and not every firmware will give you the menu. So if switching from A to B manually at boot time is a must, then we will need to inject something like sd-boot into the chain. For RHEL/Fedora CVMs we decided that it's not, and 'kernel-bootcfg' does an automatic A/B switch: the newly installed UKI (which would be UKI+rootfs in your case) is set as BootNext, and if it boots successfully, then BootOrder is changed. This covers the most important reason someone would want a boot menu: the newly installed UKI does not boot. There are some corner cases, of course, e.g. the newly installed UKI boots and pretends to work, but networking is broken.
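Here is the BootNext-then-BootOrder flow described above, sketched as a pure function that computes the new BootOrder once the trial entry has booted successfully. On a real system, the switch would be done with efibootmgr --bootnext XXXX before the reboot and efibootmgr --bootorder <list> afterwards. Those calls are left out so the logic runs anywhere, and the function name is made up.

```sh
#!/bin/sh
# Sketch: once the entry set as BootNext has booted successfully, move it to
# the front of BootOrder, dropping any existing occurrence of it.
set -eu

# $1: current BootOrder (e.g. "0001,0002"), $2: newly confirmed entry
promote_boot_entry() {
    result=$2
    old_ifs=$IFS
    IFS=,
    for n in $1; do
        [ "$n" = "$2" ] || result="$result,$n"
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

For example, promote_boot_entry 0001,0002 0003 prints 0003,0001,0002, which could then be handed to efibootmgr --bootorder.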
See https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md and https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1
There are two major points here:
UEFI only
We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.
Kernel cmdline ➡️ rootfs
One goal of the UKI work is to have generic Linux distributions sign both the kernel and initramfs and stock kernel cmdline. However, ostree today embeds the target rootfs in the kernel cmdline - this creates a recursion issue.
Option: ostree=N and symlinks and using systemd-stub credentials
We can change ostree-prepare-root in the initramfs to automatically find the latest symlink in /sysroot/ostree; we effectively do almost this with /ostree/boot.[01] today. (Something to debate here is whether we require an ostree= karg at all; our initramfs code is conservative today in making ostree opt-in, but for people who are requiring it, we could also just add a flag to default it to on, finding the latest deployment.)
The interesting thing here is what it looks like to fetch a userspace-only update.
That flow would look like this:
ostree admin upgrade or bootc update or whatever: fetch the new rootfs but not a new kernel UKI
Option: Parsing the UKI filename
See #2753 (comment)