Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Store.getsize #2426
Added Store.getsize #2426
Changes from 15 commits
5e0ffe8
1926e19
12963ab
384d323
c39e03c
87d2a9e
8ba85ec
1cdfd6d
7cbc500
ade17d2
7231d7c
81c4b7e
ce548e2
4350e53
5f1d036
a688296
783cfe3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we offer implementers the following in documentation?:
This function will be called by zarr using a
prefix
that is the path of a group, an array, or the root. Implementations can choose to do undefined behavior when that is not the case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... I was hoping we could somehow ensure that we don't call it with anything other than a group / array / root path, but users can directly use
Store.getsize_prefix
and they can do whatever.LMK if you want any more specific guidance on what to do (e.g. raise a
ValueError
). I'm hesitant about trying to force required exceptions into an ABC / interface.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And now I'm noticing that I've done exactly that in
getsize
, with requiring implementations to raiseFileNotFoundError
if the key isn't found :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This materializes the full list of keys in memory, can we maintain the generator longer to avoid that?
Also, this has unlimited concurrency, for a potentially very large number of keys. It could easily create millions of async tasks. We should probably run in chunks limited by the value of the
concurrency
setting.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See
concurrent_map
for an exampleThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't immediately see how that's possible.
The best I'm coming up with is a fold-like function that asynchronously iterates through keys from
list_prefix
and (asynchronously) callsself.getsize
to update the size. Sounds kinda complicated.FWIW, it looks like
concurrent_map
wants aniterable
of items:In 7cbc500 I've hacked in some support for AsyncIterable there. I haven't had enough coffee to figure out what the flow of
is. I'm a bit worried the
async for item in items
is happening immediately, so we end up building that list of keys in memory anyway.Fixed. We should probably replace all instances of
asyncio.gather
with a concurrency-limited version. I'll make a separate issue for that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5f1d036 removed support for
AsyncIterable
inconcurrent_map
, replacing it with a TODO.I think there's some discussion around improving our use of asyncio to handle cases like this (using queues to mediate task producers like list_prefix and consumers like
getsize
) that will address this.The unbounded concurrency issue you raised, is still fixed. It's just the loading of keys into memory that's not yet addressed.