test: apply mitigations in Linux 6.1 to make boot times fast
Linux 6.1 brought a guest boot time performance regression.

* Cause

There are two factors that cause this issue:

1. In the implementation of the mitigation for the iTLB multihit
   vulnerability, KVM creates a worker thread called
   kvm-nx-lpage-recovery. This thread is responsible for recovering
   huge pages that were split when the mitigation kicked in. While
   creating this thread, KVM calls `cgroup_attach_task_all()` to move
   it to the same cgroup used by the hypervisor thread.

2. In kernel v4.4, upstream converted a per-process cgroup read-write
   semaphore into a per-cpu read-write semaphore so that operations
   could be performed across multiple processes (commit
   1ed1328792ff46e4bb86a3d7f7be2971f4549f6c). This conversion was found
   to introduce high latency on write paths, which mainly consist of
   moving tasks between cgroups. This was fixed in kernel v4.9 by
   commit 3942a9bd7b5842a924e99ee6ec1350b8006c94ec, which chose to
   favor writers over readers, since moving tasks between cgroups is a
   common operation on Android. However, in kernel 6.0, upstream
   reverted that choice and favored readers over writers again,
   re-introducing the original behavior of the rw semaphore (commit
   6a010a49b63ac8465851a79185d8deff966f8e1a). The same commit added an
   option called favordynmods to favor writers over readers.

Since the kvm-nx-lpage-recovery thread is created, and its cgroup
changed, inside the KVM_CREATE_VM call, the high latency we observe in
6.1 is due to the upstream decision to favor readers over writers for
this per-cpu rw semaphore. The 4.14 and 5.10 kernels, in contrast,
favor writers over readers.

* Solution

There are two solutions for this issue:

1. If the CPU is not vulnerable to iTLB multihit, the best solution is
   to disable the mitigation with the newly added KVM option
   `nx_huge_pages=never`. This avoids the situation entirely and may
   also gain some additional nanoseconds in `KVM_CREATE_VM`, since no
   threads will be created.

   Note that this is also the solution recommended by KVM upstream
   ([here](https://lore.kernel.org/kvm/[email protected]/)).

2. If the CPU is vulnerable to iTLB multihit, the mitigation can't be
   disabled. In this case, we have to use the `favordynmods` option.
   There are two cases:

   - AL2023 (cgroup v2): Just remount the cgroup mount point with:
     `sudo mount -oremount,favordynmods /sys/fs/cgroup`

     - **IMPORTANT**: The 6.1 kernel has an issue where `favordynmods`
       won't work when the cpuset cgroup is enabled. This is the case
       in our CI because we install docker (which enables the cpuset
       cgroup by default). This is now fixed in 6.1.50.

   - AL2 (cgroup v1): cgroup v1 doesn't support changing mount flags
     during remount. Use a new option to enable favordynmods during
     boot (which works for cgroup v1 and v2).

Signed-off-by: Pablo Barbáchano <[email protected]>
Co-authored-by: Luiz Capitulino <[email protected]>
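
The choice between the two solutions above could be sketched roughly as follows. This is an illustration only, not the code this commit adds: `choose_workaround` is a hypothetical helper name, and the sketch assumes the host exposes the iTLB multihit status in `/sys/devices/system/cpu/vulnerabilities/itlb_multihit`, with the string `Not affected` meaning the CPU is not vulnerable. It only prints which option applies and changes nothing on the host.

```shell
#!/bin/sh
# Hypothetical helper: map the iTLB multihit status string reported by the
# kernel to the workaround described in the commit message.
choose_workaround() {
    case "$1" in
        "Not affected")
            # CPU is not vulnerable: the KVM mitigation can be disabled.
            echo "nx_huge_pages=never"
            ;;
        *)
            # Vulnerable or unknown status: keep the mitigation and favor
            # cgroup writers instead.
            echo "favordynmods"
            ;;
    esac
}

# Assumed sysfs location of the vulnerability status on the host.
status_file=/sys/devices/system/cpu/vulnerabilities/itlb_multihit
if [ -r "$status_file" ]; then
    choose_workaround "$(cat "$status_file")"
fi
```

Treating every status other than `Not affected` as vulnerable is a deliberately conservative choice, so that the mitigation is never suggested for removal on a host whose status is unknown.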