Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using glob() in Artifactory backend #132

Open
hagenw opened this issue May 15, 2023 · 7 comments
Open

Avoid using glob() in Artifactory backend #132

hagenw opened this issue May 15, 2023 · 7 comments

Comments

@hagenw
Copy link
Member

hagenw commented May 15, 2023

For the implementtion of audbackend.Artifactory.ls() we switched to use glob('*/**') in the dev branch, which is very slow. In main we avoided using glob() in ls(), which we should also do in dev.

@frankenjoe
Copy link
Collaborator

frankenjoe commented May 15, 2023

Yes, I hope there is a good way to do so. Cause in the old implementation we were only returning the files in the folder, but since we decided that the concept of folders must not exist on a backend, we now return all files that have a backend path that starts with the requested sub-path.

@frankenjoe
Copy link
Collaborator

frankenjoe commented May 15, 2023

Ok, seems we should simply do:

backend = audbackend.access(
    'artifactory',
    'https://audeering.jfrog.io/artifactory',
    'data-public',
)

for p in backend._repo:
    print(p)

It returns all files in the repository and is really fast.

@frankenjoe
Copy link
Collaborator

frankenjoe commented May 15, 2023

Unfortunately, it seems the solution I proposed is not working with anonymous access at the moment. They might change it in the future, though:

https://jfrog.atlassian.net/browse/RTFACT-7384

For the record, this is the commit where I replaced glob() with simply listing everything in the repository:

5f2ccc5

Under the hood it does the following:

import audbackend


backend = audbackend.access(
    'artifactory',
    'https://audeering.jfrog.io/artifactory',
    'data-public',
)
backend._use_legacy_file_structure()

query = [
    'items.find',
    {
        'repo': {'$eq': 'data-public'},
        'type': 'file',
    },
    '.include',
    ['name'],
]
backend._repo.aql(*query)
[{'name': '005d2b91-5317-0c80-d602-6d55f0323f8c-1.1.0.zip'}, {'name': '014f82d8-3491-fd00-7397-c3b2ac3b2875-1.1.0.zip'}, {'name': '0241da63-dc03-ccb3-623c-80020b913b0d-1.1.0.zip'}, {'name': '02ae4ee2-063a-b720-897e-12400ea54f89-1.1.0.zip'}, {'name': '03ac9ce7-461e-9991-b925-89c2697d7f38-1.1.0.zip'}, {'name': '040df0a2-bdbf-6590-f86a-947f94ae7e12-1.1.0.zip'}, {'name': '0415d83a-9d8e-2cc6-86b0-2b0649c4e67d-1.1.0.zip'},`...]

But returns 403 for an anonymous user (even if the repository allows anonymous access).

@hagenw
Copy link
Member Author

hagenw commented May 22, 2023

As can be seen in audeering/audb@1e5a0cf the following works:

backend = audbackend.access(
    'artifactory',
    'https://audeering.jfrog.io/artifactory',
    'data-public',
)

for p in backend._repo.path:
    print(p)

@frankenjoe
Copy link
Collaborator

It's not recursive.

@hagenw
Copy link
Member Author

hagenw commented May 22, 2023

You can do this recursive by looping as long as it finds paths. But maybe that is as slow as glob() then?

@hagenw
Copy link
Member Author

hagenw commented Sep 19, 2023

It's also mentioned at devopshq/artifactory#423 that glob() is slow, especially compared to the underlying API call. So maybe we could also fix it by fixing the original code inside dohq-artifactory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants