KubeVirt on OKD makes SELinux throw "Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped)" #1285
Context: I could upgrade an all-VM test cluster without problems, but on a production setup all my bare-metal workers have this problem after upgrading to the latest 4.10 (end of June 2022). The VM masters of the same cluster are not affected; they boot fine. |
Do you have any additional packages installed or repos enabled? It looks like some Fedora update broke the SELinux rules. |
No, I did not manually install anything extra, although the cluster is running virtualization and nmstate; I don't know if those might affect the Fedora RPMs. On the first host I fiddled a bit with rpm-ostree (kernels, pivoting, etc.), but the other 2 hosts were 'cleanly' upgraded from 2 versions older to the newest 4.10 at the end of June. |
This was the output of the FCOS upgrade, as logged by the MCD:
|
I'm also experiencing this after the upgrade on bare-metal nodes; I believe the original install was the first 4.10 release. dmesg output:
No custom packages here, but I have played with a custom SELinux policy for logging. |
It seems containers/container-selinux#178 is related here, as we did
during upgrade. @rhatdan, any suggestions on how to get more info about "SELinux: Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped)."? @msteenhu, could you upload a must-gather somewhere (GDrive or similar) so that we get more info about the rpm-ostree status? |
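As a rough way to gather more info on the "left unmapped" message, a short script can pull every SELinux context the kernel reports as invalid out of saved dmesg or journalctl -k output. This is a hypothetical helper (find_unmapped_contexts and type_of are illustrative names, not part of any tool), and it assumes the message keeps the exact format quoted above:

```python
import re
from collections import Counter

# Matches kernel messages in the format quoted in this issue, e.g.
#   SELinux: Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped).
UNMAPPED_RE = re.compile(
    r"SELinux:\s+Context\s+(?P<context>\S+)\s+is not valid \(left unmapped\)"
)


def find_unmapped_contexts(log_text: str) -> Counter:
    """Count each SELinux context reported as 'left unmapped' in log_text.

    Hypothetical helper: feed it saved output of dmesg or journalctl -k.
    """
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = UNMAPPED_RE.search(line)
        if match:
            counts[match.group("context")] += 1
    return counts


def type_of(context: str) -> str:
    """Return the type field (third colon-separated field) of a context."""
    return context.split(":")[2]


if __name__ == "__main__":
    sample = (
        "[    5.1] SELinux: Context system_u:object_r:kubelet_exec_t:s0 "
        "is not valid (left unmapped).\n"
        "[    5.2] systemd[1]: Started something.\n"
    )
    for ctx, n in find_unmapped_contexts(sample).items():
        print(type_of(ctx), n)  # prints "kubelet_exec_t 1"
```

If types such as kubelet_exec_t show up, that suggests the policy the node actually loaded does not define them, which fits the stale-policy explanation discussed in this thread.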
https://maarten.gent/must-gather.tar.xz ClusterID: a6f553f9-b175-4f3e-9dee-c8cf33b57704 |
Assuming my hosts are still suffering from the SELinux problem (I have not tried a reboot yet), is running 'restorecon' the fix? On kubelet, or on hyperkube? I guess the latter. I'll see if I can experiment a bit later today; I can always reinstall a host from scratch if my experiment breaks it even further. Got this from the mentioned container-selinux issue: |
Another fix is disabling SELinux enforcement (permissive mode) via a kernel argument: https://docs.okd.io/4.10/nodes/nodes/nodes-nodes-working.html#nodes-nodes-kernel-arguments_nodes-nodes-working It is not really my preference, but it is better than running hosts that do not survive a reboot in the long term. |
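If you go the permissive route, a small check can confirm which mode a node actually booted into after the kernel argument is applied. This sketch assumes selinuxfs is mounted at /sys/fs/selinux, where the enforce file reads 1 for enforcing and 0 for permissive; treating a missing file as "disabled" is a simplification, and selinux_mode is a hypothetical name:

```python
from pathlib import Path

# selinuxfs exposes the current mode: "1" = enforcing, "0" = permissive.
DEFAULT_ENFORCE_PATH = Path("/sys/fs/selinux/enforce")


def selinux_mode(enforce_path: Path = DEFAULT_ENFORCE_PATH) -> str:
    """Return 'enforcing', 'permissive' or 'disabled' for this node.

    Simplification: if the enforce file is missing (selinuxfs not mounted),
    SELinux is treated as disabled.
    """
    try:
        value = enforce_path.read_text().strip()
    except FileNotFoundError:
        return "disabled"
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```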
So the virtualization operator is to blame, I guess? At least in my case; I certainly did not fiddle with SELinux myself. |
Yes, that looks very familiar. I heard a new CoreOS version will address this, but I'm not sure when it will come to OKD. Either way, the original problem, non-mergeable policy updates, remains for now, which is a bummer. |
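To illustrate the "non-mergeable policy updates" point, here is a toy model of how rpm-ostree/OSTree carries /etc across an upgrade: a file the node never touched follows the new release, while a locally modified file is kept. This is a simplified sketch, not OSTree's code; merge_etc, the string "file contents" and the policy path are assumptions for illustration. Since a runtime module install (KubeVirt's, in this thread) rebuilds the compiled policy under /etc/selinux, the model keeps the old policy, which would explain why newer types such as kubelet_exec_t end up unmapped:

```python
from typing import Dict, Optional

# Toy model: a "tree" maps a path under /etc to its file contents.
Tree = Dict[str, str]


def merge_etc(old_default: Tree, new_default: Tree, local: Tree) -> Tree:
    """Simplified three-way merge of /etc across an upgrade.

    Assumed rule: a path whose local content still equals the old default
    follows the new default; any locally modified, added or deleted path
    keeps its local state. Real OSTree handles more cases than this.
    """
    merged: Tree = {}
    for path in set(old_default) | set(new_default) | set(local):
        old: Optional[str] = old_default.get(path)
        mine: Optional[str] = local.get(path)
        if mine == old:
            # Untouched locally: take whatever the new release ships.
            chosen = new_default.get(path)
        else:
            # Modified, added or deleted locally: keep the local state.
            chosen = mine
        if chosen is not None:
            merged[path] = chosen
    return merged


if __name__ == "__main__":
    # Hypothetical path; the version suffix differs per release.
    policy = "/etc/selinux/targeted/policy/policy.33"
    old = {policy: "base-v1"}
    new = {policy: "base-v2 (defines kubelet_exec_t)"}
    # A runtime module install rebuilt the policy, so it differs from old.
    node = {policy: "base-v1 + custom module"}
    print(merge_etc(old, new, node)[policy])  # prints "base-v1 + custom module"
```

In this model the only way to pick up the new policy is to make the local copy match the shipped default again, which is what restoring the default policy, as discussed in this thread, aims for.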
This seems to be the same problem! Could we fix it with what is specified here? Steps:
Is this the right way to restore the default policy? |
Yes, it is. As mentioned, this is supposed to be improved by upstream FCOS; I don't know whether that has happened yet or when that version will come to OKD. It seems kind of urgent, though, because this is the second time within weeks that this has caused downtime for users. |
The problem is that I did not make any SELinux changes myself; I believe the KubeVirt operator made them in my case. I have very little experience with SELinux, so I do not yet understand how to fix it. If I read the suggestions correctly, I should restore SELinux, then somehow finish the OSTree upgrade, and then reinstall the virtualization operator? |
just follow the rsync step above so that |
@sandrobonazzola, could you have a look at this? It seems KubeVirt-related. |
@vrutkovs I'll loop the KubeVirt people in. |
Looping in @xpivarc |
KubeVirt does indeed install a custom policy at runtime. For reference: https://github.com/kubevirt/kubevirt/blob/main/cmd/virt-handler/virt_launcher.cil However, it's not clear to me how this is related to the kubelet_exec_t error being observed. Can you please clarify how they're connected? |
Simple:
My understanding is that this is supposed to be fixed, or already was fixed, by CoreOS, but it is not clear to me when the fix will actually arrive. CoreOS must provide a clean way to allow custom policies without completely deactivating SELinux or going through cumbersome rpm-ostree packaging and install procedures. Does that make sense? EDIT: To explain my last point more clearly, what CoreOS roughly needs to do when an update happens is:
|
Thank you for the clear and colorful explanation. So KubeVirt serves as a source of entropy for the global policy file. @msteenhu, KubeVirt will install the SELinux module it requires on each worker node as part of its startup procedure. Thus the workaround that you and @markusdd spoke of should indeed work. |
The caveat is that semodule -B is not exactly a quick process. Also, what I wrote above only re-applies custom modules; if any custom SELinux booleans were set, they would be gone. So just overriding the policy file and then recompiling is not something that should be done automatically. It always boils down to the same issue: we need that upstream fix. KubeVirt could maybe work around the problem 'correctly' by providing an rpm-ostree package for your module, which would be detected properly. |
semodule -B should not be needed unless you have previously run semodule -DB. Installing a module needs to be done only once and does the equivalent of semodule -B.
Yeah, but as we just explained, it does not happen only once if we need to work around the problem. (Replying to Daniel J Walsh's comment of 21.07.2022 11:08.)
|
Sorry for the delay, but I can confirm that the workaround is as simple as executing these 2 commands:
Does anybody know if the newest versions still suffer from this SELinux/FHCOS bug? |
See coreos/fedora-coreos-tracker#701 for FCOS tracking The RHCOS/OCP tracker is: https://bugzilla.redhat.com/show_bug.cgi?id=2057497 |
I also see this when using OpenShift Virtualization.
@vrutkovs, this is all the journal logging around the first (failing) start of the kubelet after the upgrade. Should I supply a zip with all the journal logs, or something else, to help? I can even help by doing an upgrade again (after reinstalling a host with an older 4.10 version or something like that).
Originally posted by @msteenhu in #1270 (comment)