
Data Library's "Import from User Directory" interface's display is very slow for full directories #19209

Open
vladvisan opened this issue Nov 26, 2024 · 2 comments

@vladvisan

Description of the problem

(Tested on 23.2.2.dev0)

I'm referring to the Data Library's "Import from User Directory" interface.

Before showing anything to the user, the page loads the whole folder structure recursively (it doesn't expand the tree automatically, but it does load all of it).
In our case (a large sub-directory structure with many files, on a network mount), this leads to:

  • the whole Galaxy instance being unresponsive for a few minutes, for all users
  • the listing eventually aborting because it reaches "MAX_WALK_DIRS" (10K by default)

Relevant issues

Workaround
As a workaround, I did 2 things:

  1. Lower "MAX_WALK_DIRS" drastically, so that the abort at least happens sooner and other users are less impacted
  • Closing the browser tab might also stop the walk, but I haven't tested that, and either way we can't assume users will do it.
  2. Switch to the "Upload" interface, which only loads one depth level at a time and is a lot more efficient.
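The difference between the two listing strategies above can be sketched with the Python standard library. This is a hypothetical illustration, not Galaxy's code: `list_one_level` and `count_dir_reads_recursive` are made-up names. Listing one level costs a single directory read per user click, whereas an eager recursive walk reads every directory in the tree before anything is displayed.

```python
import os
import tempfile
from typing import Dict, List


def list_one_level(path: str) -> List[Dict[str, object]]:
    """Hypothetical lazy listing: read only the immediate children of `path`.

    A UI would call this again for a sub-directory only when the user expands it.
    """
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            entries.append({
                "name": entry.name,
                # follow_symlinks=False avoids following symlinks into other trees
                "is_dir": entry.is_dir(follow_symlinks=False),
            })
    # Stable display order: directories first, then alphabetical by name
    entries.sort(key=lambda e: (not e["is_dir"], e["name"]))
    return entries


def count_dir_reads_recursive(path: str) -> int:
    """Number of directories an eager os.walk reads to load the whole tree."""
    return sum(1 for _ in os.walk(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Small tree: root/a/b/c plus one file at the top level
        os.makedirs(os.path.join(root, "a", "b", "c"))
        open(os.path.join(root, "top.txt"), "w").close()

        print(list_one_level(root))
        # [{'name': 'a', 'is_dir': True}, {'name': 'top.txt', 'is_dir': False}]
        print(count_dir_reads_recursive(root))  # 4 (root, a, b, c)
```

With a tree of thousands of directories on a network mount, the lazy variant's cost per click stays at one directory read, while the eager variant's cost grows with the whole tree.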

Remaining problems after the workaround

  • Even the Upload interface still takes a long time for folders that contain a lot of files directly (as opposed to in sub-folders)
  • The Upload interface does not support linking files instead of copying

Potential solutions

  • reuse the Upload tool's interface in the Data Library's "Import from User Directory" interface?
    • This would at least narrow the timeout problem from many files anywhere in the tree (much more common) to many files in a single directory level.
  • modify the "recursive" option here? https://github.com/galaxyproject/galaxy/blob/release_23.2/lib/galaxy/managers/remote_files.py#L90
    • which calls lib/galaxy/files/sources/__init__.py (list) -> lib/galaxy/files/sources/posix.py (_list) -> lib/galaxy/util/path/__init__.py (safe_walk)
    • NB: the file has been modified since 23.2; I'm referencing 23.2 because it is the version I tested.
  • start with a call to the OS to get the number of files (potentially recursively)
    • to be able to abort pre-emptively instead of doing all the work for nothing
    • to be able to display a progress bar
  • a maximum amount of RAM per user, so that one user's request cannot affect other users
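The "count first" idea can be sketched as a capped pre-count. This is a minimal, hypothetical sketch (`count_dirs_capped` and `TooManyDirectories` are made-up names, not Galaxy APIs): it only reads entry names and types, stops as soon as the limit is exceeded, and can report progress that a UI could turn into a progress bar. It assumes the count is used to decide whether to run the full listing at all; it still has to read each directory it counts, so on a network mount it is cheaper than building the full listing but not free.

```python
import os
import tempfile
from typing import Callable, Optional


class TooManyDirectories(Exception):
    """Hypothetical error raised when the pre-count exceeds the limit."""


def count_dirs_capped(
    root: str,
    limit: int,
    on_progress: Optional[Callable[[int], None]] = None,
    report_every: int = 100,
) -> int:
    """Count directories under `root` (root included), aborting once `limit` is exceeded.

    Uses an explicit stack instead of recursion, so deep trees can't overflow the call stack.
    """
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        count += 1
        if count > limit:
            # Fail fast, before reading any further directories
            raise TooManyDirectories(f"more than {limit} directories under {root!r}")
        if on_progress is not None and count % report_every == 0:
            on_progress(count)
        try:
            with os.scandir(current) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except PermissionError:
            # Skip unreadable directories instead of failing the whole count
            pass
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            os.makedirs(os.path.join(root, f"d{i}"))
        print(count_dirs_capped(root, limit=10))  # 6 (root + 5 sub-directories)
        try:
            count_dirs_capped(root, limit=3)
        except TooManyDirectories as exc:
            print("aborted:", exc)
```

The returned count could also serve as the denominator for a progress bar during the real listing.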
@itisAliRH
Member

  • the whole Galaxy instance being unresponsive for a few minutes, for all users

Most of these performance issues have been fixed in #18638.

@vladvisan
Author

Thank you all for taking a look at this issue.

I'm currently updating my Galaxy instance so I can test #18638 and #19132

I will post the results this week or next (the upgrade is taking a while because I'm also fully migrating to Ansible).
