-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CA-380580: cross-pool migration - no CPU checks for halted VMs, move CPU check to the target host #6175
CA-380580: cross-pool migration - no CPU checks for halted VMs, move CPU check to the target host #6175
Conversation
This includes the current_domain_type field, which is important for live imports, including those during a cross-pool live migration. Signed-off-by: Rob Hoes <[email protected]>
Signed-off-by: Rob Hoes <[email protected]>
The target host of a live migration is running by definition the same or a newer version of the software compared to the source host. As CPU checks are often extended or even changed in software updates, it is best to perform such checks on the target host. However, currently it is the source host that does these checks. This patch moves the check to the target host. A cross-pool live migration always begins with a dry-run VM-metadata import on the target host. This is where checks for free memory and GPU capacity are already carried out. The metadata import handler is extended to accept a `check_cpu` query parameter to signal that a CPU check is needed. This is included in the import call done in `assert_can_migrate` on the source host, and the old CPU check in there is dropped. Source hosts without this patch will still perform the CPU checks themselves, so we do not compromise safety. NOTE: This is a rebase of the initial work from 07a2a71 that had to be reverted with Ming's suggestion to skip CPUID check on 'unspecified' snapshots implemented. Signed-off-by: Rob Hoes <[email protected]>
CPU checks are needed only for running VMs that are being migrated, to check for compatibility with the remote-host's CPUs. NOTE: This is the rebase of the initial work from 3d039f3 that had to be reverted with the fix from df7cbfd incorporated. Signed-off-by: Rob Hoes <[email protected]>
What has changed since then that makes the previous revert no longer needed? |
That would be avoiding running the check on snapshots:
Do current cross-pool tests contemplate the case of migrating from yangtze to XS8? |
I don't see anything of the sort, but will ask the storage team |
This passed the test migrating from yangtze (4178244), and some other SXM tests (4178243, 4178245) UPD: an SXM stress test (4178242) also completed successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find anything obvious, and the testing doesn't find any problems, either
Rebased the work from 2023 merged in #5111 and #5132, that caused issues and was partially fixed in #5148, but was completely reverted in #5147. I've integrated the fix from #5148 and additionally the fix suggested by @minglumlu in CA-380715 that was not merged at the time due to time constraints.
This series passed the tests that were originally failing: sxm-unres (Job ID 4177739), vGPUSXMM60CrossPool (4177750), and also passed the Ring3 BST+BVT (209341). I can run more migration tests if needed - I've heard @Vincent-lau has requested for these to be separated into its own suite instead of being only in Core and Distribution regression tests.