Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh cached ConfigFS data after deleting a backstore #171

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

iammattcoleman
Copy link
Contributor

@iammattcoleman iammattcoleman commented Apr 18, 2020

Overview

As described in Red Hat Bugzilla #1571728, some targetcli commands fail when the cached target configuration data becomes stale, producing errors such as...

  • ValueError: Name 'lun0' already used by a sibling.
  • This LUN does not exist in configFS

Changes from my previous pull request

  1. supports all fabrics (My first attempt at fixing this only worked for iSCSI targets.)
  2. attempts to refresh LUNs for the specific TPG, falling back to target-level LUNs if the fabric does not have TPGs, and falling back to the entire target if it doesn't contain LUNs (for safety - this block should never execute)
  3. if any errors occur while locating the cached config nodes to refresh, it falls back to refreshing the root node
  4. failed config node lookups will not crash the program

Steps to reproduce the error and verify the fix

The following links go to a gist with commands to reproduce the conditions, along with the output from running the command on my machine: the "before" output is what the current release will do; the "after" output is from the code in this pull request.

iscsi

loopback

vhost

all three (iscsi, loopback, vhost)

xen-pvscsi

I do not have a Xen machine to test on. Could someone run this command and post the before/after results?

targetcli <<EOF && rm /tmp/test1.img
ls
/backstores/fileio create test1 /tmp/test1.img 10M
/xen-pvscsi create naa.5001405ef6e5af1a
/xen-pvscsi/naa.5001405ef6e5af1a/tpg1/luns create /backstores/fileio/test1
ls
/backstores/fileio delete test1
ls
/xen-pvscsi delete naa.5001405ef6e5af1a
ls
EOF

@iammattcoleman iammattcoleman changed the title Refresh TPGs after deleting a backstore Refresh cached target data after deleting a backstore Apr 21, 2020
@iammattcoleman iammattcoleman changed the title Refresh cached target data after deleting a backstore Refresh cached target information after deleting a backstore Apr 21, 2020
@iammattcoleman iammattcoleman changed the title Refresh cached target information after deleting a backstore Refresh cached UI nodes after deleting a backstore Apr 21, 2020
@iammattcoleman iammattcoleman changed the title Refresh cached UI nodes after deleting a backstore Refresh UI nodes after deleting a backstore Apr 21, 2020
@iammattcoleman iammattcoleman force-pushed the fix-lun-does-not-exist-in-configfs branch 2 times, most recently from 62ea500 to 41f3108 Compare April 24, 2020 07:16
@iammattcoleman iammattcoleman changed the title Refresh UI nodes after deleting a backstore Refresh cached ConfigFS data after deleting a backstore Apr 24, 2020
@iammattcoleman iammattcoleman force-pushed the fix-lun-does-not-exist-in-configfs branch from 41f3108 to 0cf3b27 Compare April 24, 2020 07:17
@iammattcoleman
Copy link
Contributor Author

iammattcoleman commented Jun 18, 2020

@maurizio-lombardi On IRC, you mentioned some concerns about performance when the system has thousands of targets. Without this change, the daemon is only usable if you use batch mode to make all of your changes. Otherwise, you'll frequently see This LUN does not exist in configFS. If any processes attempt to parse targetcli output, they'll intermittently fail.

The daemon is appealing because it reduces targetcli execution time significantly, but our systems work fine without it. So, this change isn't urgent for me but I still think something (possibly this change) should be done to address the issue. The daemon currently seems like it was entirely created for @pkalever's application, and other use-cases haven't been considered.

@maurizio-lombardi
Copy link
Collaborator

@pkalever I think it's worth trying a test with 1000s backstores and targets and measure the performance impact, if any.

@iammattcoleman
Copy link
Contributor Author

@maurizio-lombardi @pkalever I'd also be interested in a test of this approach (only refreshing the items that need to be refreshed) versus @pkalever's PR #170 that refreshes everything.

If the performance difference is negligible on @pkalever's systems with 1000s of backstores, I think the easier to read code in #170 should be used.

@maurizio-lombardi
Copy link
Collaborator

maurizio-lombardi commented Jun 29, 2020

I did a simple test: I created 1000 iscsi targets and 2000 backstores (1000 of which unused)
then I measured the time it takes to delete the 1000 unused backstores.

Without the patch:
real 6m49.836s
user 2m9.891s
sys 0m16.323s

With the patch:
real 10m58.201s
user 2m7.529s
sys 0m16.418s

@iammattcoleman
Copy link
Contributor Author

Well, that's a huge impact. Can you try @pkalever's one-line change from #170?

@maurizio-lombardi
Copy link
Collaborator

maurizio-lombardi commented Jun 30, 2020

Well, that's a huge impact. Can you try @pkalever's one-line change from #170?

I tried, it's so slow that it makes little sense to wait until the test ends (it will take hours)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants