snapshot_create / snapshot_revert default boot failure if newest installed kernel not in use #84
Comments
Not best practice, indeed! I'm leaning to just fail if not booted from the default kernel. We already have a check like this done for
- name: Validate default kernel is booted
  ansible.builtin.include_role:
    name: initramfs
    tasks_from: preflight
  when: snapshot_create_boot_backup
This will fail the play with the warning message "Current kernel version ... is not the default version ...". Thoughts?
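For illustration, a standalone check along these lines might look like the sketch below. It is not the actual content of the initramfs preflight tasks. It assumes facts have been gathered and that grubby is available on the host; the registered variable name default_kernel is hypothetical.

```yaml
# Sketch only: not the actual initramfs preflight tasks.
- name: Get the default kernel path from the bootloader
  ansible.builtin.command: grubby --default-kernel
  register: default_kernel  # hypothetical variable name
  changed_when: false

- name: Fail if the running kernel is not the default kernel
  ansible.builtin.assert:
    that:
      # grubby prints a path such as /boot/vmlinuz-<version>
      - (default_kernel.stdout | basename) == ('vmlinuz-' ~ ansible_facts['kernel'])
    fail_msg: >-
      Running kernel {{ ansible_facts['kernel'] }} is not the default
      kernel {{ default_kernel.stdout }}. Reboot before taking a snapshot.
```

Comparing only the file name, rather than the full path, keeps the check independent of where the boot partition is mounted.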
Thanks for the help @swapdisk - that's a great suggestion. I've made the change in the attached pull request. Sample invocation:
Summary
Although not best practice, it's possible to install a new kernel package without rebooting into it, for example when a maintenance window does not allow downtime.
When a new kernel is installed but not booted into, the snapshot_create role backs up the /boot files for the currently running kernel version. It also backs up grubenv, which typically references the newer kernel. This mismatch leaves the default entry inconsistent after snapshot_revert is run: the restored default grub entry fails to boot with the error "file /vmlinuz-<old kernel> not found".
Additional Details / Steps to Reproduce
A RHEL 7.9 install with the default kernel:
Update the kernel but do not reboot the server:
Prior to performing an IPU or any change to the system, we run the snapshot_create role as follows:
The following backup files are created:
Note the following about the backup contents:
- It includes the /boot files for the current running kernel (derived from ansible_kernel).
- It does not include the /boot files for the newer kernel.
- It includes the grubenv file, which references the newer kernel.
We reboot the server to bring in the new kernel and upgrade it to RHEL 8 via leapp using the infra.leapp collection. (Or, to reproduce this issue, you could likely remove the RHEL 7 kernel files in /boot.)
Post-upgrade, /boot no longer contains any RHEL 7 kernels.
We roll back the server using the snapshot_revert role.
This uses the files from boot-backup-ripu.tgz to populate /boot. As the grubenv file contains the newer kernel entry, the server tries to boot from it. However, the newer kernel files were not backed up, and the boot fails with a message such as:
file /vmlinuz-3.10.0-1160.119.1.el7.x86_64 not found
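One way to catch this before relying on the backup is to check that the archive contains the kernel image the bootloader will boot by default. The following is a minimal sketch under stated assumptions: boot_backup_archive is a hypothetical variable holding the path to boot-backup-ripu.tgz, grubby is available on the host, and the archive stores kernel images under their usual vmlinuz-<version> file names.

```yaml
# Sketch only: boot_backup_archive and the registered names are hypothetical.
- name: Get the default kernel path from the bootloader
  ansible.builtin.command: grubby --default-kernel
  register: default_kernel
  changed_when: false

- name: List the files stored in the boot backup archive
  ansible.builtin.command:
    argv:
      - tar
      - -tzf
      - "{{ boot_backup_archive }}"
  register: boot_backup_listing
  changed_when: false

- name: Fail if the default kernel image is missing from the archive
  ansible.builtin.assert:
    that:
      - (default_kernel.stdout | basename) in boot_backup_listing.stdout
    fail_msg: >-
      {{ default_kernel.stdout | basename }} is not in the boot backup, so
      a revert would restore a grubenv that points at a missing kernel.
```

This would need to run while the newer RHEL 7 kernel is still the bootloader default, that is, right after the backup is created.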
Possible enhancements to improve role behaviour
I'm not fully sure what the expected behaviour should be for this use case. Should the create role stop when this scenario is detected? Or, if it is allowed to proceed, what should happen on revert: default back to the older kernel or the newer one?
Perhaps one of these:
- When appropriate, warn the user that "Newest installed kernel not in use" and that they should reboot the server before running the snapshot_create role.
- When snapshot_create is run, ensure that the kernel described in grubenv is also included in the backup.
- When snapshot_revert is run, modify the grubenv file so that the older kernel is selected as the default.