Handle re-invocations of coreos-installer on a new disk #976
Comments
I'm not sure this can effectively co-exist with the multipath logic, and it also splits the boot handling logic in "coreos-installer provisioned nodes" vs everything else. Additionally, we do have other initrd units depending on the "boot" label. Do we have an alternative way of pinning the boot partition on first boot, but still retaining auto-discovery capabilities? Alternatively, should we instead fail the boot early on if we detect ambiguities in partitions/labels?
I think also it doesn't really help the case where the stale boot device still has precedence in the boot order. Ideally, we would want to catch this, which would involve dynamic code.
That was going to be my suggestion as well. Our stance so far has been that we own Realistically this will mostly be a bare metal thing, so it could make sense to also have coreos-installer look for this case and print out a clear warning to not waste users' time in debugging this (we could e.g. suggest the |
There's another possibility here, which is that we change the default (i.e. cosa-generated) disk images to use a unique-per-build (could be random to start, but maybe hashed from the build version number or so) UUID in both the grub config and the kernel cmdline. If we did that then we'd need to change our firstboot uuid logic to either stop doing This would at least mitigate things for the case of doing an install at version X, then an install of version Y != X on a separate disk.
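For illustration, here is a minimal sketch of the "hashed from the build version number" variant, assuming a name-based (version 5) UUID is acceptable. The namespace value and the function name are hypothetical and not anything cosa actually uses.

```python
import uuid

# Hypothetical namespace, for illustration only; not a value used by cosa.
BUILD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def boot_uuid_for_build(build_version: str) -> uuid.UUID:
    """Derive a stable, per-build UUID from a build version string.

    The same version always maps to the same UUID, and different
    versions map to different UUIDs (barring SHA-1 collisions).
    """
    return uuid.uuid5(BUILD_NAMESPACE, build_version)
```

Note that this only helps when the two installs come from different builds; installing the same version on two disks would still produce identical UUIDs.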
Yeah, that would also help a lot I think.
Though of course this whole |
Inspired by: coreos/fedora-coreos-tracker#976 Signed-off-by: Nikita Dubrovskii <[email protected]>
One more data point from one of the related RHCOS cases, (it's still unclear how but) the two disks ended up in this situation:
One easy way this can happen is if it's actually multipath, but multipathd isn't configured. Or was multipath already ruled out in that case?
We discussed this in the community meeting today. We did not reach any clear decision, but there was lots of great technical discussion (see log). Towards the end we were oscillating around the proposal of:
Will continue the discussion in this ticket and also in coming meetings if necessary.
…artition Inspired by: coreos/fedora-coreos-tracker#976 Signed-off-by: Nikita Dubrovskii <[email protected]>
Was chatting about this with Luca in IRC. One idea here is to have If there's only one boot partition, then "claim" it by adding the rootfs UUID in e.g. But I think we also need to be careful about the real root mounting boot. In the initramfs, we don't touch boot at all after the first boot, and it's always possible that a new device is plugged in after first boot. One thing we can easily do there is move the |
This makes the mount more resistant to other filesystems labeled `boot` that may get plugged in at any point. Part of: coreos/fedora-coreos-tracker#976
coreos/fedora-coreos-config#1256 handles the mount by UUID bit in the real root. I think this makes sense to do regardless of whether we do the stamp file bit proposed above.
This is analogous to `systemd-fstab-generator` parsing `root=` kargs. There is a precedent for this in the FIPS module. In fact, we lift code from there to make sure we're API compatible. The goal of supporting a `boot` karg is to eliminate possible races for the `by-label/boot` symlink in the real root if multiple exist. Part of: coreos/fedora-coreos-tracker#976
This ensures that the rootfs will always mount the correct boot filesystem in the future (see previous patch). Part of: coreos/fedora-coreos-tracker#976
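To make the `boot` karg idea concrete, here is a rough sketch of the parsing step. It is not the code from the patches above (which live in the initramfs); the function name, the set of supported prefixes, and the "last occurrence wins" rule are assumptions for illustration.

```python
from typing import Optional

# Tag prefixes mapped to their udev symlink directories.
_PREFIXES = {
    "UUID=": "/dev/disk/by-uuid/",
    "LABEL=": "/dev/disk/by-label/",
    "PARTUUID=": "/dev/disk/by-partuuid/",
    "PARTLABEL=": "/dev/disk/by-partlabel/",
}


def boot_device_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the device path named by the `boot=` karg, or None.

    Assumption: if `boot=` appears more than once, the last one wins.
    """
    value = None
    for arg in cmdline.split():
        if arg.startswith("boot="):
            value = arg[len("boot="):]
    if not value:
        return None
    for prefix, directory in _PREFIXES.items():
        if value.startswith(prefix):
            return directory + value[len(prefix):]
    # Anything else is treated as a literal device path.
    return value
```

With `boot=UUID=...` on the kernel cmdline, the mount no longer depends on which filesystem currently owns the `by-label/boot` symlink.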
coreos/fedora-coreos-config#1256 changes the strategy now to add a With the stamp file idea above, it makes the relationship 1:1 by preventing any other rootfs from binding to the same bootfs.
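A rough sketch of how the stamp file "claim" could work, to show the 1:1 binding. The stamp file name and the function are hypothetical; the thread does not settle on a path here, and the real logic would run in the initramfs rather than in Python.

```python
from pathlib import Path

# Hypothetical stamp file name; the thread does not settle on a path.
STAMP_NAME = ".root_uuid"


def claim_bootfs(boot_mount: Path, rootfs_uuid: str) -> None:
    """Bind a bootfs to one rootfs by recording the rootfs UUID on it.

    The first caller claims the bootfs. Later calls succeed only if
    they come from the same rootfs; any other rootfs is refused.
    """
    stamp = boot_mount / STAMP_NAME
    if not stamp.exists():
        stamp.write_text(rootfs_uuid + "\n")
        return
    owner = stamp.read_text().strip()
    if owner != rootfs_uuid:
        raise RuntimeError(
            f"bootfs at {boot_mount} is already claimed by rootfs {owner}"
        )
```

A second rootfs that finds a foreign stamp fails loudly instead of silently sharing the bootfs.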
This binds the bootloader to the bootfs and the bootfs to the rootfs. Part of coreos/fedora-coreos-tracker#976.
Yup, sorry I should've used
Ahh OK, so we're talking about something that hasn't landed yet in GRUB.
Right yeah. It mostly matches, but strictly following the BLS means it should be FAT IIUC, which it isn't. Probably worth discussing this part in #1038 instead.
OK, checklist for rolling this out:
Then I think we can close this.
I'm not sure if it's been asked for by anyone until now?
Aborts firstboot when system has several filesystems labeled `boot` Fix for coreos/fedora-coreos-tracker#976 Signed-off-by: Nikita Dubrovskii <[email protected]>
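As an illustration of the check described in this commit message, here is a sketch that looks for duplicate `boot` labels in `lsblk` JSON output. It is not the actual patch; the function names are made up, and the input is assumed to come from something like `lsblk --json --output PATH,LABEL`.

```python
import json
from typing import Iterator, List


def devices_labeled(lsblk_json: str, label: str = "boot") -> List[str]:
    """Return the paths of all block devices carrying the given label."""

    def walk(nodes) -> Iterator[str]:
        for node in nodes:
            if node.get("label") == label:
                yield node["path"]
            # Partitions are nested under "children" in tree output.
            yield from walk(node.get("children") or [])

    return sorted(walk(json.loads(lsblk_json)["blockdevices"]))


def check_single_boot(lsblk_json: str) -> None:
    """Raise if more than one filesystem is labeled `boot`."""
    found = devices_labeled(lsblk_json)
    if len(found) > 1:
        raise RuntimeError(
            "multiple filesystems labeled 'boot': " + ", ".join(found)
        )
```

This sketch makes no attempt to handle multipath, where the same filesystem legitimately shows up on several paths.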
This is done now!
What's the first release of FCOS we can try this in? Can we add some docs?
The one with coreos/fedora-coreos-config#1269, which is part of this week's
The fix for this went into |
I think this one went into stable |
The fix for this went into |
There's an old discussion on this I'm not finding but:

We've seen issues in the past with both FCOS (I believe) and RHCOS (for sure) where people had done an install on one disk (including a `/boot` partition) and then re-installed on a separate disk.

Because grub picks `/boot` by label and the OS picks `/boot`, we can end up racing/flapping between picking a `/boot` partition on startup.

We generate a new UUID for `/boot` on firstboot, but I think we should instead:

- Generate the UUID at `coreos-installer` time
- Update `/boot/grub.cfg` to use it
- Inject `boot=UUID=` into the kernel cmdline

This is basically what Anaconda does.