-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eLabFTW integration via Galaxy file source #19319
base: dev
Are you sure you want to change the base?
Conversation
The method `write_from()` of `SingleFileSource` and `BaseFilesSource` reads a local file from `native_path` and saves it to `target_path` on a file source. This commit allows the service backing the file source to choose which will be the path of the saved file, meaning that `target_path` and the actual path where the file can be recovered later do not have to match. The latter is the return value of `write_from()`. Therefore, all usages of `write_from()` have also been refactored to consider the paths chosen by the file source's backing service. In addition, when exporting a history, the URI that the service backing the file source assigns to it will be saved to the history export result metadata object.
Change the implementation so that the definitions of `_write_from` in classes that inherit from `BaseFilesSource` do not need to change.
Define a new class `FileSourceModelExportStore` that abstracts the common details of `BcoModelExportStore`, `ROCrateArchiveModelExportStore`, `TarModelExportStore` and `BagArchiveModelExportStore`. This new class manages exports to file sources, from where data can be retrieved later on via a URI. It takes the responsibility of creating a temporary directory to set up the file to export and uploading it to the file source.
The method `list()` from `galaxy.files.sources.BaseFilesSource` lists the directories and files within a file source. An optional keyword argument `recursive` (`False` by default) lets it recursively retrieve directories and files within a specific directory. This operation is very cheap in terms of CPU and expensive in IO terms, be it network or filesystem IO. Depending on how the underlying system is built, it may support retrieving directories and files recursively or not. If it does not, then every time a directory is listed, it is necessary to make another request to list each subdirectory. This may end up involving hundreds of requests. Done sequentially, this can be extremely slow, especially if each one involves network access. This commit makes the `list()` method asynchronous, which enables Galaxy to wait for the underlying system to complete the requests concurrently, resulting in a massive speedup. The price to pay is the extra complexity of using the async primitives. Since this change implies that all functions in the chain up to the API endpoints and the test functions must also be made asynchronous, this commit also takes care of it.
Eventually I can add automated tests, I would rather ask for your feedback first though. |
eLabFTW [1] revolves around the concepts of experiment [2] and resource [3]. Experiments and resources can have files attached to them. To get a quick overview, try out the live demo [4]. The scope of this implementation is exporting data from and importing data to eLabFTW as file attachments of already existing experiments and resources. Each user can configure their preferred eLabFTW instance entering its URL and an API Key. File sources reference files via a URI, while eLabFTW uses auto-incrementing positive integers. For more details read galaxyproject#18665 [5]. This leads to the need to declare a mapping between said identifiers and Galaxy URIs. Those take the form `elabftw://demo.elabftw.net/entity_type/entity_id/attachment_id`, where: - `entity_type` is either 'experiments' or 'resources' - `entity_id` is the id (an integer in string form) of an experiment or resource - `attachment_id` is the id (an integer in string form) of an attachment This implementation uses both `aiohttp` and the `requests` libraries as underlying mechanisms to communicate with eLabFTW via its REST API [6]. A significant limitation of the implementation is that, due to the fact that the API does not have an endpoint that can list attachments for several experiments and/or resources with a single request, when listing the root directory or an entity type _recursively_, a list of entities has to be fetched first, then to fetch the information on their attachments, a separate request has to be sent _for each one_ of them. The `aiohttp` library makes it bearable to recursively browse instances with up to ~500 experiments or resources with attachments by sending them concurrently, but ultimately solving the problem would require changes to the API from the eLabFTW side. References: - [1] https://www.elabftw.net/ - [2] https://doc.elabftw.net/user-guide.html#experiments - [3] https://doc.elabftw.net/user-guide.html#resources - [4] https://demo.elabftw.net - [5] galaxyproject#18665 - [6] https://doc.elabftw.net/api/v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in person, the implementation looks amazing! especially because of the care and attention to detail! Thank you very much!
I also think that the recursive
option that has motivated many of these changes (and increased complexity) is a use case that we should probably not support. Being able to recursively list or select a potentially huge number of experiments without strict pagination will likely be unusable or cause more trouble than it is worth... 😓
@@ -296,7 +299,7 @@ def get_uri_root(self) -> str: | |||
"""Return a prefix for the root (e.g. gxfiles://prefix/).""" | |||
|
|||
@abc.abstractmethod | |||
def list( | |||
async def list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't change this for existing interfaces. You can add a <operation>_async
method where needed. We can't do blocking IO in async calls, but every plugin apart from yours will make blocking calls.
Closes #18665, requires #19154 and #19256 (and also includes them, sorry; to review have a look at c00d250). Implements a file source to integrate Galaxy with eLabFTW.
eLabFTW revolves around the concepts of experiment and resource. Experiments and resources can have files attached to them. To get a quick overview, try out the live demo at demo.elabftw.net. The scope of this implementation is exporting data from and importing data to eLabFTW as file attachments of already existing experiments and resources. Each user can configure their preferred eLabFTW instance entering its URL and an API Key.
File sources reference files via a URI, while eLabFTW uses auto-incrementing positive integers. For more details read #18665. This leads to the need to declare a mapping between said identifiers and Galaxy URIs.
Those take the form
elabftw://demo.elabftw.net/entity_type/entity_id/attachment_id
, where:entity_type
is either 'experiments' or 'resources'entity_id
is the id (an integer in string form) of an experiment or resourceattachment_id
is the id (an integer in string form) of an attachmentThis implementation uses both
aiohttp
and therequests
libraries as underlying mechanisms to communicate with eLabFTW via its REST API. A significant limitation of the implementation is that, due to the fact that the API doesnot have an endpoint that can list attachments for several experiments and/or resources with a single request, when
listing the root directory or an entity type recursively, a list of entities has to be fetched first, then to fetch
the information on their attachments, a separate request has to be sent for each one of them. The
aiohttp
library makes it bearable to recursively browse instances with up to ~500 experiments or resources with attachments by sending themconcurrently, but ultimately solving the problem would require changes to the API from the eLabFTW side.
This is the third and last PR of a series of PRs that integrate eLabFTW with Galaxy via a file source (together they address issue #18665):
How to test the changes?
(Select all options that apply)
License