ostree rollbacks with alternate systemd boot target (with read-only composefs /etc). How do we detect rollbacks reliably? #3115
Comments
So the systemd side of this is not so tricky:
I tested this and it seems to work fine. We just need a reliable detector for whether we have rolled back (or not), in order to create a conditional. |
My initial thoughts are to parse:
with either a Python JSON parser or jq, and if the booted deployment is not the head version, assume it's a rollback. |
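A minimal sketch of that check, for illustration only. It assumes a status document shaped like the one `rpm-ostree status --json` emits, i.e. a top-level `deployments` array ordered head-first whose entries carry a `booted` boolean; the function names are hypothetical, and nothing here addresses the speed concern raised later in the thread.

```python
import json


def booted_deployment_index(status_json: str):
    """Return the index of the deployment marked as booted, or None."""
    status = json.loads(status_json)
    # Assumption: "deployments" is ordered with the head (newest) first.
    for index, deployment in enumerate(status.get("deployments", [])):
        if deployment.get("booted"):
            return index
    return None


def looks_like_rollback(status_json: str) -> bool:
    """True when the booted deployment exists but is not the head one."""
    index = booted_deployment_index(status_json)
    return index is not None and index != 0
```

If no deployment is marked as booted, the sketch reports "not a rollback" rather than guessing.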
So what I am thinking of doing:
/ostree/root.expected is a new symlink I will add. If root.expected doesn't match the androidboot.slot_suffix, we can tell we rolled back. I'd like to use something more generic baked into rpm-ostree/libostree, but the CLIs etc. are too slow for this check; it really has to take only a handful of milliseconds. This will only work on this platform, but at present it's just one Automotive company requesting this. |
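The comparison described above could be sketched roughly as follows. Everything about the symlink's format is an assumption made for illustration: the sketch supposes `/ostree/root.expected` points at something whose basename ends in `.a` or `.b`, and that `androidboot.slot_suffix` carries `_a` or `_b`. The helper names are hypothetical, and a real early-boot check would more likely live in a compiled binary than in Python.

```python
import os


def cmdline_value(cmdline: str, key: str):
    """Return the value of the last `key=value` token, or None if absent."""
    value = None
    # Naive whitespace split; quoted arguments containing spaces are not handled.
    for token in cmdline.split():
        name, sep, rest = token.partition("=")
        if sep and name == key:
            value = rest
    return value


def rolled_back(expected_target: str, cmdline: str) -> bool:
    """Compare the expected slot (from the symlink target) with the booted slot."""
    booted = cmdline_value(cmdline, "androidboot.slot_suffix")
    if booted is None:
        return False  # not an androidboot system, nothing to compare
    # Assumption: target basename looks like "root.a" and the suffix like "_a".
    expected_slot = os.path.basename(expected_target).rpartition(".")[2]
    return expected_slot != booted.lstrip("_")


def check(symlink="/ostree/root.expected", cmdline_path="/proc/cmdline") -> bool:
    """Read the real symlink and kernel command line and compare them."""
    with open(cmdline_path) as f:
        return rolled_back(os.readlink(symlink), f.read())
```

Only a `readlink` and one small file read are involved, which is the property the proposal is after.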
It might be worth clarifying what rollback means in this context. Also, definitely worth looking at https://github.com/fedora-iot/greenboot/ if you haven't already. |
(I think it's better instead to change the default target from a generator...i.e. the rollback detection happens in a generator. It's a bit cleaner.) |
This topic heavily relates to #3032 - if we start logging metadata as to whether a deployment was ever booted, we can key off that more reliably. Even without metadata today we could add
Otherwise:
(Of course all of this deeply ties into having a JSON or other API for ostree, like both bootc and rpm-ostree have) |
I am not sure how an ostree CLI invocation would be a slow point of this flow; I can't imagine |
@jlebon a rollback in this context means we updated, but we are not booting into that version because it's unhealthy for some reason. We are using greenboot to at least mark a slot as healthy. This user wants to roll back, but with a caveat: in this case they want to boot into an alternate systemd boot target. |
I'd really like to not add things that are specific just to androidboot or other bootloaders; I think a key goal of ostree here is to abstract over these things. We definitely want functionality similar to this across many OS variants and footprints.
|
@cgwalters we can do that in a generator (it will be my first generator I have written 😄 ), but is the /etc writable at that point with an overlay (this thing will have read-only /etc)? Note the |
The CLIs are not slow for normal usage, but when you want to use them in the boot sequence and the aim is a 2-second boot, some of the options are not fast enough. The JSON output, for example, is parsable, but too slow.
That's not to say we couldn't make How about this for an idea, instead of my previous proposal, we do a:
symlink on each deployment, that way we have a generic fast thing to check? |
One of the things I like about not using a generator is that I can keep this stuff out of the initrd/initoverlayfs (although initoverlayfs doesn't really have a boot cost). I like to do stuff in the rootfs if at all possible. |
Generators don't write to |
Taking a quick read around, the precedence order among /etc/systemd/system/, /run/systemd/system/ and /usr/lib/systemd/system/ is: the /etc one takes precedence over /run, and /run takes precedence over /usr. Given the symlinks are set up like this by default:
even if we generated the symlink in /run, wouldn't the /etc one take precedence anyway, making the /run symlink redundant? Of course we could just remove the /etc symlinks altogether to resolve this. |
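To make the precedence question concrete, here is a small sketch that reports which `default.target` would win under the simplified three-directory order discussed above. The real systemd unit search path has more entries (generator directories among them), so this is an approximation, and the function name is hypothetical.

```python
import os

# Simplified search order, highest precedence first (an approximation of the
# full systemd unit load path).
UNIT_DIRS = ("etc/systemd/system", "run/systemd/system", "usr/lib/systemd/system")


def effective_default_target(root="/"):
    """Return (unit_dir, target) for the winning default.target, or None."""
    for unit_dir in UNIT_DIRS:
        link = os.path.join(root, unit_dir, "default.target")
        if os.path.lexists(link):
            # A symlink is reported by its target; a regular file by its path.
            target = os.readlink(link) if os.path.islink(link) else link
            return unit_dir, target
    return None
```

With a symlink present in both /etc and /run, the /etc entry is returned, which is exactly the redundancy concern raised above.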
But to take this in a step by step process, I'm gonna look into creating a:
type of symlink on all deploys as a quick check to see if we have rolled back or not in future boots, if this sounds OK. After this, I guess I'll look into:
after as step 2, that leaves us open to doing in the initrd, initoverlayfs or rootfs, plus Then step 3 is to write the generator (or equivalent solution). step 4 is make it configurable I guess? Or maybe even just have the generator/systemd service file as a separate rpm to solve that problem. |
I don't understand what value |
Yeah it's just "deployment 0". I could parse BLS files, if there's another fast way of detecting the last deployed version of the operating system, very open to ideas. |
So in this case:
BLS parsing functionality must be added to this binary, in order to detect in early boot whether we are booting deployment 0.
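As a rough illustration of what such BLS parsing involves (in Python for readability, not as a proposal for the binary itself): the sketch below parses entry text, picks the entry with the highest `version`, and compares its `ostree=` karg with the one on the running kernel command line. It assumes that ostree writes an integer `version` key and that deployment 0 carries the highest value; it also assumes the `ostree=` karg is present at all, which may not hold for Android Boot Image or UKI setups. All function names are hypothetical.

```python
def parse_bls(text: str) -> dict:
    """Parse one Boot Loader Specification entry into a key/value dict."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            entry[parts[0]] = parts[1].strip()
    return entry


def karg(options: str, key: str):
    """Return the value of the first `key=value` token, or None."""
    for token in options.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            return value
    return None


def booted_deployment_zero(entries, cmdline: str) -> bool:
    """True when the running ostree= karg matches the highest-version entry."""
    parsed = [parse_bls(text) for text in entries]
    if not parsed:
        raise ValueError("no BLS entries given")
    # Assumption: the highest integer "version" marks deployment 0.
    newest = max(parsed, key=lambda e: int(e.get("version", "0")))
    booted = karg(cmdline, "ostree")
    return booted is not None and booted == karg(newest.get("options", ""), "ostree")
```

The entry texts would come from reading the files under the bootloader entries directory; that I/O is left out so the comparison logic stays visible.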
Oh sorry, you want to parse this stuff from the initramfs. OK yes, all that code today is in libostree, which we aren't linking to from the initramfs code, because among other things it would drag in HTTP libraries. Also to do this, we'd have to mount the My main concern here is just to keep the BLS entries as source of truth for system state, and a symlink is extra persistent state.
Right, in an image-based world it is definitely better to have the default target be in That said, it is possible for generators to override; from
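For readers who, like some participants here, have not written a generator: a hedged sketch of the override idea follows. systemd invokes generators with three output directories (normal, early, late), and units placed in the early directory take precedence over /etc. The `rolled_back` callable and the choice of `rescue.target` are placeholders for illustration; whether a generator is viable for this use case is debated further down the thread.

```python
import os


def write_default_target(early_dir: str, target: str) -> str:
    """Point default.target in the generator's early dir at `target`."""
    link = os.path.join(early_dir, "default.target")
    if os.path.lexists(link):
        os.unlink(link)
    os.symlink(os.path.join("/usr/lib/systemd/system", target), link)
    return link


def main(argv, rolled_back) -> int:
    """Generator entry point; `rolled_back` is a hypothetical detector callable."""
    if len(argv) != 4:
        return 1  # systemd passes exactly three directories
    _normal_dir, early_dir, _late_dir = argv[1:4]
    if rolled_back():
        # Placeholder target; a product could substitute its own.
        write_default_target(early_dir, "rescue.target")
    return 0
```

An installed generator would call `main(sys.argv, detector)` and exit with its return value, where `detector` is whichever fast rollback check is eventually chosen.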
|
But going back to the generator point...I think given that we can change the default target via a generator in the real root, there's no need to do this in |
I am open to doing this from the rootfs (I would have preferred this because rootfs allows you to use more generic ostree stuff). But if you do it in a generator, you are forced to do this in initrd, because the generators run so early?
I don't fully understand this approach, only the client knows what the latest deployed version is. If you are using Android Boot Image or UKI the client side cannot manipulate any karg, because it breaks the signature. I may be misinterpreting this approach though. An initrd can't be altered client-side either, for the same reason.
I've never written a generator before so again, apologies if I'm misinterpreting the use for them. They kinda seem inflexible because they run so early. And I don't fully understand why we need the generator dynamic functionality, because you can do the same thing in a simple systemd service file but you have the choice of putting it anywhere in the boot sequence.
Yeah, this makes sense to me; the /etc symlinks should probably go away by default in an image-based world. I think this is just something inherited by the ostree family of distros, because these default.target symlinks are in /etc in non-ostree distros.
|
I think that's mainly because Anaconda always writes it, which it should stop doing. |
Yes, I believe your unit will generally work fine. The main downside is that some units may still be launched from the default target that wouldn't be if we'd explicitly started the desired target from very early on. Anyways so I think we're iterating towards just having a raw ostree API like Actually, to simplify this even more, we can change |
This means Anyway I'll start trying to take a stab at |
If it's as slow as this command, it's not gonna fly in our boot stack, but maybe it's the parts we don't need in this command that are slow:
|
That's very expensive because it involves gpg verification, the bulk cost of which is parsing all the keys in We definitely don't need to do it for this case. I also discovered a really longstanding bug when digging into this #3131 And I just pushed a commit there; try out |
Hmm in this I keep having to refresh my knowledge on androidboot...can you write up something in I find myself a bit confused now as to what actually writes the |
It's https://gitlab.com/CentOS/automotive/rpms/aboot-deploy, on the client side, that does this. https://gitlab.com/CentOS/automotive/rpms/aboot-update creates the Android Boot Image on the repo/server side. I will start a hackmd on this. |
As requested @cgwalters: https://hackmd.io/@7wAdmxHWRI6dhoDAwOtrpw/ByAeouBua we could go into more detail if required. The intent is to put this under the |
We can't do this in the generator as |
Can you DM me a writable link to that hackmd? |
So I created this as a reference rpm that can be included in an image build to turn on this feature: https://github.com/ericcurtin/ostree-rollback-to-rescue. Maybe it's best to leave this as a reference rpm for people to base their own packages on. Or we could integrate it into the Fedora or CentOS Automotive repos directly somehow. In Automotive, for example, it's possible that someone creates their own target and UI for rescue mode. We could regard this issue as closed at this point. |
On completion of this PR, we can close this issue: #3171 |
We have a feature request: when a rollback is triggered, instead of just rolling back, we want to roll back and boot with an alternate systemd target. This may not have to be directly integrated into the ostree project.
There are typically two ways of doing this: setting a
systemd.unit=
karg, or setting up a default.target symlink in one of the
/usr/lib/systemd/system/
/run/systemd/system/
/etc/systemd/system/
directories.
By setting up this default.target symlink in early boot, we think we can achieve this.
But in order to do this in early boot, we need a reliable (and ideally bootloader-agnostic, since we are not using grub) way of detecting the rollback programmatically. So what we are unsure of is: how do we detect this reliably in early boot?
Tagging @alexlarsson for visibility