Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: block incompatible kernel from being installed #16139

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

tleydxdy
Copy link
Contributor

Motivation and Context

The current "Requires" lines only ensure the old kernel is available on the system but it does not prevent fedora from updating to an incompatible and breaking user's system.

Description

Set Conflicts to block incompatible kernels from being installed.

How Has This Been Tested?

Waiting for build test

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

@rincebrain
Copy link
Contributor

From my recollection of the last time this was discussed, since the kernel packages are considered special/required, this just results in dnf resolving the conflict by uninstalling the ZFS packages.

@tleydxdy
Copy link
Contributor Author

Interesting, I must have missed that discussion. I tried to search and read-up on this issue :(
I remember there was a guide or a issue post that mention user shoud pin zfs by
echo 'zfs' > /etc/dnf/protected.d/zfs.conf

@tleydxdy
Copy link
Contributor Author

Is #15188 what you are referring to?

I think it is useful to spell-out what is the expected way to consume zfs-dkms on ""rolling"" distros where kernel upgrades to incompatible versions on a routine bases. Below is what I gathered reading various issues.

The current situation:

  • User with auto-upgrade: system breaks every couple month
    • Workaround: manually pin kernel to the current version, check github regularly and re-pin once zfs updates support
    • This means part of the benefit of auto-upgrade is negated, and if user didn't act quick enough fedora might have upgraded to yet another incompatible kernel and user will be stuck with kernel that is 2+ cycles older
  • User with manual upgrade: every upgrade requires manual intervention (at minimum check that kernel is not in the upgrade list)
    • Workaround: same as auto-upgrade
    • Also same as auto-upgrade, if upgrade interval was unlucky user could be stuck with a really old kernel

The ideal in my opinion:

  • User with auto-upgrade: system keeps working
    • This means kernel is kept back only as old as needed, and once zfs is updated kernel gets updated as well
  • User with manual upgrade: user can upgrade without fear
    • If user upgrades rarely then it could be stuck with an kernel older than necessary, but this seems unsolvable without using an archive of old kernel packages

My understanding is with conflict and setting zfs as protected, it achieves this goal with the caveat that dnf will print a warning when upgrading. is that correct?

@rincebrain
Copy link
Contributor

rincebrain commented Apr 28, 2024

My (personal, not any kind of larger group or org) general advice is not to use it on a rolling distro that bumps kernel revs, really, precisely because it's going to break like this.

I don't actually know if that was the discussion, it's come up a few times, but my understanding is that there's not really a good way to express what's wanted in dnf settings.

I guess you might be able to do something weird like a postscript or conf file that adds a hold or analogue on the kernel range that's acceptable, maybe?

@tleydxdy
Copy link
Contributor Author

I'm fairly sure protected is the way to handle it, here's from dnf docs:

protected_packages
List of packages that DNF should never completely remove. They are protected via Obsoletes as well as user/plugin removals.

The default is: dnf, glob:/etc/yum/protected.d/.conf and glob:/etc/dnf/protected.d/.conf. So any packages which should be protected can do so by including a file in /etc/dnf/protected.d with their package name in it.

DNF will protect also the package corresponding to the running version of the kernel. See also protect_running_kernel option.

I do agree with the points made in the other issue thread though. I don't think setting zfs protected should be done automatically by the package install script if only for the reason that it blocks user uninstall also. So I think a sentence should be added to the fedora install guide to tell user why they need to set this.

@rincebrain
Copy link
Contributor

I don't really agree with setting it either, though, is the thing, I think.

That's just going to provoke lots of angry messages about how it broke their upgrade process entirely, and half of them followed by they undid it and it broke, IMO. That's not really a better experience, because it's more frustration, and still broken at the end with what they're likely to do without looking up what they should do.

Much as I like to encourage people to read documentation, the reality is that people aren't going to go check it if something breaks, they're going to do what seems to make sense to them to get past this complaint, and that's going to break them the same way as before, but with even higher frustration b/c they've spent more time on it.

@tleydxdy
Copy link
Contributor Author

I'm not sure that is the case? Currently the situation is broken for both people that follow instructions and people that do not, and with this change it makes it so people that follow the instructions can have a better experience and people that don't is not worse off anyway. right?

@rincebrain
Copy link
Contributor

My argument is that people who follow instructions, it will still be broken the same way, because the ideal outcome is that it upgrades everything else possible but stops the kernel from upgrading past the allowed version, right?

If you mark the ZFS packages as protected, if I understand the behavior right, it'll refuse to upgrade if there would be a conflict that would require removing one of (kernel|zfs), and if they unprotect it, it'll break just like if they had never done that dance, yes? (To say nothing of if you unprotect the kernel packages, which I'm not sure it'll even allow you to do...)

And if you don't, it'll break the same way it currently does.

So my claim is based on the idea that "break all upgrades" is not a net gain, because people are just going to end up at the same end state, plus some frustration to undo the thing that stopped them doing the upgrade in the first place, and there's not really any outcome they can have from that state other than "wait for a newer version to make that not a problem" that's better.

If I'm misunderstanding something here, or you disagree with my conclusions, I'd love to hear how I'm wrong or not viewing this the same way - I don't have a particularly strong opinion on this, just that if I understand the situation right, this would just break the "rolling" in "rolling upgrades" entirely if the situation comes up, which I claim is just going to result in more frustration and people still breaking their systems, absent a better solution.

@tleydxdy
Copy link
Contributor Author

Do you know how I can download the build artifact from buildbot? It might be valuable to have some practical examples.

@behlendorf behlendorf added Component: Packaging custom packages Status: Code Review Needed Ready for review and testing labels Sep 23, 2024
The current "Requires" lines only ensure the old kernel is available on the system but it does not prevent fedora from updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Signed-off-by: tleydxdy <[email protected]>
@tleydxdy
Copy link
Contributor Author

tleydxdy commented Nov 12, 2024

rebasing this since my system is broken again by autoupdate, I really don't see the argument against setting the "conflict" field.

If the user does not protect zfs:

  • with conflict setting:
    • upgrade will remove zfs, next boot into a system without zfs mounts
  • without conflict setting:
    • upgrade into broken zfs, best case scenario zfs does not build so it's the same as above, worst case scenario zpool gets corrupted. after a few kernel upgrades zfs gets removed anyway because the version in required is rotated out

If the user does set zfs to protected:

  • with conflict setting
    • auto update works without any trouble, old kernel is held as needed and once zfs package updates kernel gets updated automatically
  • without conflict setting
    • upgrade into broken zfs, best case scenario zfs does not build so it's the same as above, worst case scenario zpool gets corrupted.

In every combination, the conflict setting results in a safer and more up-to-date result.

Also the openzfs docs is quite misleading by suggesting that protect zfs will block kernel upgrade, this is not true, it will only block the old kernel to get rotated out
openzfs/openzfs-docs@4e955b7

@behlendorf
Copy link
Contributor

I tend to agree, adding "conflicts" seems to me like an improvement across the board for all the scenarios. In the unprotected w/conflicts case having the zfs packages be removed may be confusing, but it does have the advantage that dnf should report the zfs packages will be removed as part of the upgrade. Making this more prominent during seems preferable to me compared to having a broken package installed.

I'm good with making this change. We'll just want to keep an eye out for any new corner cases or unexpected side effects.

@jkool702
Copy link

I use zfs on root on Fedora.

Ive found the best way to deal with this situation is to just continue to boot the last zfs-supported kernel until the next zfs version is released. This gives the situation where zfs keeps working bug
free and dnf update keeps working without needing to explicitly exclude kernel updates.

Note that dnf will never, under any circumstance, attempt to uninstall the currently running kernel. So, for example, right now I am booting a 6.10.x kernel, but have 2 6.11.x kernels installed. When dnf installs a kernel update it will uninstall the older of the two 6.11.x. It wont try to uninstall the 6.10.x kernel since it is running, and it wont try to uninstall zfs since there is still a compatible (6.10.x) kernel installed.

I feel like this is, by far, the best way to deal with the situation - nothing breaks, nothing uninstalled, and you can continue to use dnf update without adding -x kernel* every time. That said, I realize that this hinges on users choosing to boot the older zfs-supported kernel.

Perhaps an alternate solution would be to add a check in the makefile for the kernel module so that dkms builds fail on unsupported kernels and give a obvious "impossible to miss" error message that instructs them to boot ______ kernel to be able to use zfs? This prevents accidental unknowing corruption from using an unsupported kernel (zfs is useless without its kmod), gives them a good path to "wait it out" on an older kernel, and if they choose to not take that path (and not have zfs at all) for 3 kernel updates in a row then having dnf auto-uninstall zfs seems reasonable.

@tonyhutter
Copy link
Contributor

There's a third "gray zone" case that actually comes up fairly often on Fedora. It happens when we do a release supporting the X kernel, but it just so happens that we've pulled in enough patches into the release to also build under the X+1 kernel as well. For example, zfs-2.2.6 supports the 6.10 kernel, but I believe it will also build on 6.11. It's just that we didn't explicitly test it on the 6.11 kernel at the time of release. I think this is actually the status quo with zfs-2.2.6 Fedora 40 right now. It buys us wiggle room when the OS is aggressively updating it's kernel and we don't want ZFS to artifically short-circuit the build, only because the kernel is untested with ZFS. There has only been one time that I can think of where ZFS built correctly for a newer "unsupported" kernel, but didn't work correctly with it.

We would lose this "grey zone" behavior if we pulled in this PR. We would spend more time updating the META file and putting out new releases, even if the code itself hasn't changed.

So it's a tradeoff - you either keep the "gray zone" behavior and have longer-lived releases on Fedora (which will make users less annoyed), or do the correct thing and pull in this PR to uninstall ZFS on newer, "unsupported" kernels which ZFS would most likely work perfectly fine on.

@jkool702
Copy link

I can confirm that on Fedora 40 the zfs kmod for 6.11 builds wiithout error and seemingly works properly (though I havent tested it beyond being able to boot into my system, which uses zfs on root).

Although things seem to work ok on kernel 6.11 with zfs 2.6, I noticed that during boot I get a few warnings + call trace printed to screen that are related to zfs. I almost always get these whenever Im using a kernel that is 1 step higher than the max zfs supports (i.e., this isnt unique to zfs 2.6 + kernel 6.11). I'll add the warnings / call traces I see with 6.11 + zfs 2.6 in a file attached to this comment (to avoid needlessly adding 240 lines to this conversation).

Im not sure if this indicates an actual problem (perhaps a very minor one), or if this is just a superfluous warning that can be ignored. It makes me nervous enough though to boot into the older still-supported 6.10 kernel while I wait for an update to bring offical support.

@tleydxdy
Copy link
Contributor Author

tleydxdy commented Nov 15, 2024

I can see the argument of not having the conflict field for development package/version, since otherwise it would be pretty annoying to test new kernels. but I don't think having the released version support untested kernels is a good idea.

@tonyhutter
Copy link
Contributor

tonyhutter commented Nov 15, 2024

I can see the argument of not having the conflict field for development package/version, since otherwise it would be pretty annoying to test new kernels. but I don't think having the released version support untested kernels is a good idea.

Just to put it in context, Fedora 40 alone has run the 6.8, 6.9, 6.10, and 6.11 kernel, and it's not EOL yet:

$ yum --showduplicates list kernel
Last metadata expiration check: 0:00:32 ago on Fri 15 Nov 2024 10:17:27 AM PST.
Installed Packages
kernel.x86_64                    6.9.9-200.fc40                     @updates
kernel.x86_64                    6.10.3-200.fc40                    @updates
kernel.x86_64                    6.11.4-201.fc40                    @updates
Available Packages
kernel.x86_64                    6.8.5-301.fc40                     fedora 

Now imagine dnf uninstalling ZFS after each of those upgrades... not a great user experience. So do we do what's academically the right thing do to, or do we do the thing that's going to annoy users the least? I lean towards the latter, especially considering Fedora users are going to more risk tolerant than, say, RHEL users.

@behlendorf
Copy link
Contributor

behlendorf commented Nov 15, 2024

For example, zfs-2.2.6 supports the 6.10 kernel, but I believe it will also build on 6.11. It's just that we didn't explicitly test it on the 6.11.... We would lose this "grey zone" behavior if we pulled in this PR. We would spend more time updating the META file and putting out new releases, even if the code itself hasn't changed.

As things currently stand we're already going to lose this behavior in the 2.3.0 releases. Commit 410287f, which is not in 2.2.6, updates the build system to enforce the maximum kernel version. To retain the existing behavior we would need to add --enable-linux-experimental to the configure options in the dkms build. That's not something we're going to want to set in the production packages, but do want to be able to opt in (probably with the dkms.conf file) to enable testing of new kernels. Adding the "Conflicts" to the production packages would at least ensure a user never attempts to build something which is guaranteed to fail.

@tleydxdy
Copy link
Contributor Author

I can see the argument of not having the conflict field for development package/version, since otherwise it would be pretty annoying to test new kernels. but I don't think having the released version support untested kernels is a good idea.

Just to put it in context, Fedora 40 alone has run the 6.8, 6.9, 6.10, and 6.11 kernel, and it's not EOL yet:

$ yum --showduplicates list kernel
Last metadata expiration check: 0:00:32 ago on Fri 15 Nov 2024 10:17:27 AM PST.
Installed Packages
kernel.x86_64                    6.9.9-200.fc40                     @updates
kernel.x86_64                    6.10.3-200.fc40                    @updates
kernel.x86_64                    6.11.4-201.fc40                    @updates
Available Packages
kernel.x86_64                    6.8.5-301.fc40                     fedora 

Now imagine dnf uninstalling ZFS after each of those upgrades... not a great user experience. So do we do what's academically the right thing do to, or do we do the thing that's going to annoy users the least? I lean towards the latter, especially considering Fedora users are going to more risk tolerant than, say, RHEL users.

maybe you missed the part with protect zfs? the best case is when there's the conflict field and user choose to set zfs as protected

@tonyhutter
Copy link
Contributor

As things currently stand we're already going to lose this behavior in the 2.3.0 releases.

Ah that's true, I was only thinking of the 2.2.x branch. Looks like with 2.3.x we'll have to be more aggressive with updating META and putting out Fedora releases.

I'll approve it - but can you please update your commit message first to make checkstyle happy:

error: commit message body contains line over 72 characters

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 20, 2024
@behlendorf behlendorf merged commit d02257c into openzfs:master Nov 20, 2024
17 of 18 checks passed
@behlendorf
Copy link
Contributor

behlendorf commented Nov 20, 2024

Commit comment fixed when merging.

@tleydxdy
Copy link
Contributor Author

oh, whoops, I got carried away with work and didn't check github. Sorry about that

@tleydxdy tleydxdy deleted the patch-1 branch November 21, 2024 04:23
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: tleydxdy <[email protected]>
Closes openzfs#16139
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: tleydxdy <[email protected]>
Closes openzfs#16139
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Packaging custom packages Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants