
CA-402901: Update leaked dp to Sr #6169

Merged

Conversation

@changlei-li (Contributor) commented Dec 11, 2024

When adding a leaked datapath:

  1. add the leaked datapath to Sr.vdis
  2. write it to the db file
  3. improve logging

If a storage exception is raised while destroying a datapath,
the procedure fails and the state of the VDI becomes incorrect,
which leads to various abnormal results in subsequent operations.
To handle this, the leaked datapath is designed to be destroyed
again, refreshing the state before the next storage operation, via
the function remove_datapaths_andthen_nolock. But this mechanism
doesn't take effect in the current code.
This commit fixes the bug: the leaked datapath must be added to
Sr.vdis for the mechanism to actually work, and written to the db
file so that it is not lost if xapi restarts.
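Step 2 matters because the wrapper's dp bookkeeping lives in memory and would otherwise vanish across a restart. A minimal, self-contained sketch of the idea follows; this is toy code only, not xapi's real persistence layer (the real wrapper serialises its whole host state, and none of these names come from the codebase):

```ocaml
(* Toy persistence of a leaked-dp table so it survives a restart.
   Format: one line per VDI, "<vdi> <dp1,dp2,...>" (assumes ids
   contain no spaces). Illustrative only. *)

let save path (tbl : (string, string list) Hashtbl.t) =
  let oc = open_out path in
  Hashtbl.iter
    (fun vdi dps ->
      Printf.fprintf oc "%s %s\n" vdi (String.concat "," dps))
    tbl ;
  close_out oc

let load path =
  let tbl = Hashtbl.create 16 in
  let ic = open_in path in
  (try
     while true do
       match String.split_on_char ' ' (input_line ic) with
       | [ vdi; dps ] ->
           (* drop the empty string produced by splitting "" *)
           Hashtbl.replace tbl vdi
             (List.filter (fun s -> s <> "") (String.split_on_char ',' dps))
       | _ -> ()
     done
   with End_of_file -> ()) ;
  close_in ic ;
  tbl
```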

@changlei-li (Contributor, Author) commented:

Ring3: BST+BVT passed

@changlei-li force-pushed the private/changleli/CA-402901 branch from 61aa9bb to 588448a on December 11, 2024 03:19
@changlei-li force-pushed the private/changleli/CA-402901 branch from 588448a to 17e66ff on December 11, 2024 08:58
ocaml/xapi/storage_smapiv1_wrapper.ml (review thread on an outdated revision, resolved)
@@ -529,7 +541,8 @@ functor
           )
         with e ->
           if not allow_leak then (
-            ignore (Vdi.add_leaked dp vdi_t) ;
+            Sr.add_or_replace vdi (Vdi.add_leaked dp vdi_t) sr_t ;
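The replaced line is the crux of the fix: Vdi.add_leaked evidently returns an updated record rather than mutating vdi_t in place, so discarding its result with ignore lost the leak marker. A self-contained toy model of the difference, using illustrative stand-in modules rather than the real xapi ones:

```ocaml
(* Toy model of why `ignore (Vdi.add_leaked dp vdi_t)` was a no-op:
   add_leaked builds a *new* immutable record, so the result must be
   written back into the SR table to have any effect. *)
module Vdi = struct
  type t = { leaked : string list }

  let empty = { leaked = [] }

  let add_leaked dp t = { leaked = dp :: t.leaked }
end

module Sr = struct
  let vdis : (string, Vdi.t) Hashtbl.t = Hashtbl.create 16

  let add_or_replace vdi vdi_t = Hashtbl.replace vdis vdi vdi_t

  let find vdi = Hashtbl.find_opt vdis vdi
end

let () =
  Sr.add_or_replace "vdi1" Vdi.empty ;
  let vdi_t = Option.get (Sr.find "vdi1") in
  (* buggy version: the updated record is computed and thrown away *)
  ignore (Vdi.add_leaked "dp1" vdi_t) ;
  assert ((Option.get (Sr.find "vdi1")).Vdi.leaked = []) ;
  (* fixed version: the updated record is stored back into the table *)
  Sr.add_or_replace "vdi1" (Vdi.add_leaked "dp1" vdi_t) ;
  assert ((Option.get (Sr.find "vdi1")).Vdi.leaked = [ "dp1" ])
```

Both asserts pass: the buggy path leaves the table unchanged, while the fixed path records the leak.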
@minglumlu (Member) commented Dec 11, 2024:
Just to help myself understand...
This is only for SMAPIv1.
There are two cases in which destroy_datapath_nolock will be called:

  1. DP.destroy_sr: this happens on SR.detach or VBD.unplug.
  2. remove_datapaths_andthen_nolock: this happens before all VDI-related operations.

In case 1, the datapath is removed deliberately. Now, with the fix, if the removal fails due to an exception from e.g. SM, the datapath is recorded as leaked in both memory and the state file, so that in case 2 it can be identified and removed again. Furthermore, if the removal fails again in case 2, the pending VDI operation fails too and the error is eventually exposed. I think this is expected, as it would be bad to ignore an error that might cause more issues (a toy sketch of this two-phase flow follows below).
After all, from a user's perspective: if case 1 is a VM.reboot or VM.shutdown, the failure leaves the VM in the halted state, and a VM.start brings it back up; if case 1 is a VM.pool_migrate or VDI.pool_migrate, the VM can be started on the destination again after a toolstack restart on the source host.
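A minimal, self-contained toy of this two-phase flow; apart from the role played by remove_datapaths_andthen_nolock, every name here is made up for illustration:

```ocaml
exception Backend_failure

(* dps currently recorded as leaked, per VDI *)
let leaked : (string, string list) Hashtbl.t = Hashtbl.create 16

let get vdi = Option.value ~default:[] (Hashtbl.find_opt leaked vdi)

(* a failing destroy records the dp as leaked and re-raises;
   a succeeding destroy clears the record *)
let destroy_datapath ~backend_ok vdi dp =
  let rest = List.filter (( <> ) dp) (get vdi) in
  if backend_ok then Hashtbl.replace leaked vdi rest
  else (
    Hashtbl.replace leaked vdi (dp :: rest) ;
    raise Backend_failure
  )

(* analogue of remove_datapaths_andthen_nolock: retry any leaked dps
   first, then run the VDI operation; a second failure propagates *)
let remove_leaked_andthen ~backend_ok vdi f =
  List.iter (destroy_datapath ~backend_ok vdi) (get vdi) ;
  f ()

let () =
  (* case 1: the deliberate destroy fails, and the dp is recorded *)
  (try destroy_datapath ~backend_ok:false "vdi1" "dp1"
   with Backend_failure -> ()) ;
  assert (get "vdi1" = [ "dp1" ]) ;
  (* case 2: the next VDI operation re-destroys the leaked dp first *)
  remove_leaked_andthen ~backend_ok:true "vdi1" (fun () -> ()) ;
  assert (get "vdi1" = [])
```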

Member:
But in the VM.pool_migrate or VDI.pool_migrate case, the toolstack restart only removes the DPs used in migrations; would it remove the leaked DPs of the VM? The VM doesn't actually reside on the source host, but the leaked DPs would still block attaching the VM on the destination host.

Member:

> This is only for SMAPIv1.

On master (XS 8), yes, as Storage_smapiv1_wrapper covers only SMAPIv1 SRs. Previously the module was called Storage_impl and covered both SMAPIv1 and SMAPIv3; the change was made following an update to how qemu-dp is used inside SMAPIv3 plugins.

Member:

> But in the VM.pool_migrate or VDI.pool_migrate case, the toolstack restart only removes the DPs used in migrations; would it remove the leaked DPs of the VM? The VM doesn't actually reside on the source host, but the leaked DPs would still block attaching the VM on the destination host.

If a dp was leaked during a VM.reboot, it will now be properly recorded as such. A subsequent VM.pool_migrate includes a VDI.deactivate call on the source host, where the wrapper will now notice the leaked dp and get rid of it before proceeding. This happens before VDI.activate is called on the remote host.

@changlei-li force-pushed the private/changleli/CA-402901 branch from 17e66ff to 9ad4626 on December 11, 2024 09:45
@robhoes added this pull request to the merge queue Dec 11, 2024
Merged via the queue into xapi-project:master with commit 309e7f6 Dec 11, 2024
15 checks passed
@changlei-li deleted the private/changleli/CA-402901 branch December 12, 2024 01:01