ostree-finalize-staged.service times out on slow hardware #1824
Comments
Do you have substantial data in
Not really?
Most of that appears to come from
Wow, yeah. Any chance you have more of the logs from one of those failed runs? I'd like to see log messages with timestamps so we can tell how far it got before the timeout.
Sure:
It takes nearly 2 minutes to "refresh SELinux policy", and then I don't know what's happening for another 2 minutes after "bootfs is sufficient".
Some of the things we've discussed which I think would help a lot here are:
That said, 1 is relevant only when you have a custom SELinux policy. Is that the case here? You can check e.g.
Yes, I have a custom module I have to load to get collectd to work.
Is there a reason that couldn't be done when the update is staged (pre-shutdown) versus next boot?
Hmm, would this have implications for the opportunistic nature of ostreedev/ostree#2847?
The policy is defined in
The link is that we would only be able to do this pre-copy on systems with enough space in their bootfs because we don't want to pre-emptively delete a deployment at staging time.
Ahhh, gotcha. So yeah, this part is contributing to making finalization more expensive. Anyway, I think your current workaround is the best option for now, but we should definitely try to improve this.
Describe the bug
On certain slow systems, such as Raspberry Pis, the ostree-finalize-staged.service unit often fails because its work takes longer than the unit's timeout allows. This frequently results in reboot loops: Zincati repeatedly tries to apply the update, finalization fails, and the machine reboots back into the old version.
Reproduction steps
This is consistently reproducible for me on my "regular" Raspberry Pi 4b devices, with "generic" class 10 SD cards. I do not notice it on my CM4 devices that use eMMC.
Expected behavior
I would expect updates to succeed, regardless of how long they take.
Actual behavior
Especially when an upgrade requires "pruning" a previous version of FCOS, updates fail to apply and machines get stuck in a reboot loop. Manually running
rpm-ostree cleanup -r
usually resolves it. Increasing the timeout with a unit drop-in configuration file also resolves the issue, e.g.
System details
metal: Raspberry Pi 4b
Butane or Ignition config
No response
Additional information
The ostree-finalize-staged.service unit file has this snip:
I guess 5 minutes is probably not "quite long" enough?
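For reference, a drop-in along the following lines is one way to apply the timeout workaround described above. This is a sketch, not the exact file used here: it assumes the 5-minute limit is the unit's TimeoutStopSec (finalization runs at shutdown, when the unit is stopped, so a stop timeout is the likely setting; check the shipped unit with systemctl cat ostree-finalize-staged.service to confirm). The file name timeout.conf and the 15-minute value are arbitrary, hypothetical choices.

```ini
# /etc/systemd/system/ostree-finalize-staged.service.d/timeout.conf
# Hypothetical drop-in: the file name and the 15m value are illustrative only.
# Assumes the 5-minute limit is TimeoutStopSec, because finalization runs
# when this unit is stopped during shutdown.
[Service]
TimeoutStopSec=15m
```

After adding the file, run systemctl daemon-reload so systemd picks up the override before the next staged deployment is finalized.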