Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-380580: cross-pool migration - no CPU checks for halted VMs, move CPU check to the target host #6175

Conversation

last-genius
Copy link
Contributor

Rebased the work from 2023 merged in #5111 and #5132, that caused issues and was partially fixed in #5148, but was completely reverted in #5147. I've integrated the fix from #5148 and additionally the fix suggested by @minglumlu in CA-380715 that was not merged at the time due to time constraints.

This series passed the tests that were originally failing: sxm-unres (Job ID 4177739), vGPUSXMM60CrossPool (4177750), and also passed the Ring3 BST+BVT (209341). I can run more migration tests if needed - I've heard @Vincent-lau has requested for these to be separated into its own suite instead of being only in Core and Distribution regression tests.

This includes the current_domain_type field, which is important for live
imports, including those during a cross-pool live migration.

Signed-off-by: Rob Hoes <[email protected]>
The target host of a live migration is running by definition the same or
a newer version of the software compared to the source host. As CPU
checks are often extended or even changed in software updates, it is
best to perform such checks on the target host. However, currently it is
the source host that does these checks. This patch moves the check to
the target host.

A cross-pool live migration always begins with a dry-run VM-metadata
import on the target host. This is where checks for free memory and GPU
capacity are already carried out. The metadata import handler is
extended to accept a `check_cpu` query parameter to signal that a CPU
check is needed. This is included in the import call done in
`assert_can_migrate` on the source host, and the old CPU check in there
is dropped.

Source hosts without this patch will still perform the CPU checks
themselves, so we do not compromise safety.

NOTE: This is a rebase of the initial work from 07a2a71
that had to be reverted with Ming's suggestion to skip CPUID check
on 'unspecified' snapshots implemented.

Signed-off-by: Rob Hoes <[email protected]>
CPU checks are needed only for running VMs that are being migrated, to
check for compatibility with the remote-host's CPUs.

NOTE: This is the rebase of the initial work from 3d039f3
that had to be reverted with the fix from df7cbfd incorporated.

Signed-off-by: Rob Hoes <[email protected]>
@Vincent-lau
Copy link
Contributor

What has changed since then that makes the previous revert no longer needed?

@psafont
Copy link
Member

psafont commented Dec 12, 2024

That would be avoiding running the check on snapshots:

and additionally the fix suggested by @minglumlu in CA-380715 that was not merged at the time due to time constraints.

Do current cross-pool tests contemplate the case of migrating from yangtze to XS8?

@last-genius
Copy link
Contributor Author

Do current cross-pool tests contemplate the case of migrating from yangtze to XS8?

I don't see anything of the sort, but will ask the storage team

@last-genius
Copy link
Contributor Author

last-genius commented Dec 12, 2024

This passed the test migrating from yangtze (4178244), and some other SXM tests (4178243, 4178245)

UPD: an SXM stress test (4178242) also completed successfully.

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find anything obvious, and the testing doesn't find any problems, either

@last-genius last-genius added this pull request to the merge queue Dec 13, 2024
Merged via the queue into xapi-project:master with commit 88534a0 Dec 13, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants