`object-store`-based Store implementation #1661
base: main
Amazing @kylebarron! I'll spend some time playing with this today.
With roeap/object-store-python#9 it should be possible to fetch multiple ranges within a file concurrently with range coalescing (using That PR also adds a |
src/zarr/v3/store/object_store.py
async def get_partial_values(
    self, key_ranges: List[Tuple[str, Tuple[int, int]]]
) -> List[bytes]:
    # TODO: use rust-based concurrency inside object-store
How I did it in rfsspec: https://github.com/martindurant/rfsspec/blob/main/src/lib.rs#L141
object-store has a built-in function for this: `get_ranges`, with the caveat that it only manages multiple ranges in a single file.
`get_ranges` also automatically handles request merging for nearby ranges in a file.
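To make "request merging for nearby ranges" concrete, here is a minimal pure-Python sketch of the idea. It is not object-store's implementation: `coalesce_ranges`, `fetch_ranges`, and the `max_gap` threshold are hypothetical names for illustration, ranges are assumed to be `(start, end)` with an exclusive end, and an in-memory `bytes` object stands in for the remote file.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end), end exclusive (an assumption of this sketch)


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge ranges whose gap to the previous range is at most ``max_gap`` bytes."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def fetch_ranges(blob: bytes, ranges: List[Range], max_gap: int) -> Tuple[List[bytes], int]:
    """Return the requested slices (in input order) and the number of merged requests."""
    merged = coalesce_ranges(ranges, max_gap)
    # One simulated request per merged range; ``blob`` stands in for a remote file.
    fetched: Dict[Range, bytes] = {(s, e): blob[s:e] for s, e in merged}
    out: List[bytes] = []
    for start, end in ranges:
        for m_start, m_end in merged:
            if m_start <= start and end <= m_end:
                # Slice the caller's range back out of the larger merged response.
                out.append(fetched[(m_start, m_end)][start - m_start : end - m_start])
                break
    return out, len(merged)
```

The trade-off is a few wasted bytes between nearby ranges in exchange for fewer round trips; the right `max_gap` depends on request latency versus bandwidth.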
Yes I know, but mine already did the whole thing, so I am showing how I did that.
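Since a per-file range call covers only one file, a `get_partial_values` that accepts ranges across many keys would need to group them by key first. The sketch below shows one way that could look with `asyncio`. It is an illustration only, not the code in this PR: `fetch_file_ranges` is a hypothetical coroutine standing in for whatever per-file range call the backend offers, and ranges are assumed to be `(start, end)` with an exclusive end.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Tuple

ByteRange = Tuple[int, int]
KeyRange = Tuple[str, ByteRange]
# Hypothetical per-file fetcher: (key, ranges) -> one bytes object per range.
FetchFn = Callable[[str, List[ByteRange]], Awaitable[List[bytes]]]


async def get_partial_values(
    fetch_file_ranges: FetchFn, key_ranges: List[KeyRange]
) -> List[bytes]:
    # Group requests by key, remembering each request's position in the input.
    per_key: Dict[str, List[Tuple[int, ByteRange]]] = defaultdict(list)
    for index, (key, byte_range) in enumerate(key_ranges):
        per_key[key].append((index, byte_range))

    async def fetch_one(key: str, indexed: List[Tuple[int, ByteRange]]):
        chunks = await fetch_file_ranges(key, [r for _, r in indexed])
        return [(i, chunk) for (i, _), chunk in zip(indexed, chunks)]

    # One per-file call per key, all awaited concurrently.
    groups = await asyncio.gather(*(fetch_one(k, v) for k, v in per_key.items()))

    # Restore the caller's original ordering.
    out: List[bytes] = [b""] * len(key_ranges)
    for group in groups:
        for index, chunk in group:
            out[index] = chunk
    return out
```

With this shape, any range merging stays inside the per-file call, and the Python side only handles grouping and ordering.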
Great work @kylebarron!
I suggest we see whether it makes any improvements first, so it's author's choice for now.
While @rabernat has seen some impressive perf improvements in some settings when making many requests with Rust's tokio runtime, which would possibly also trickle down to a Python binding, the biggest advantage I see is improved ease of use in installation. A common hurdle I've seen is dependency management, especially around boto3, aioboto3, and similar dependencies. Versions need to be compatible at runtime with any other libraries the user also has in their environment, and Python doesn't allow multiple versions of the same dependency in one environment. With a Python library wrapping a statically-linked Rust binary, you can remove all Python dependencies and remove this class of hardship. The underlying Rust object-store crate is stable and under open governance via the Apache Arrow project. We'll just have to wait on some discussion in object-store-python for exactly where that should live. I don't have an opinion myself on where this should live, but it should be on the order of 100 lines of code wherever it is (unless the v3 store API changes dramatically).
I want to keep an open mind about what the core stores provided by Zarr-Python are. My current thinking is that we should just do a |
This is no longer an issue; s3fs has much more relaxed deps than it used to. Furthermore, it's very likely to already be part of an installation environment.
I agree with that. I think it is beneficial to keep the number of dependencies of core zarr-python small. But I am open to discussion.
Sure! That is certainly useful.
This is awesome work, thank you all!!!
Co-authored-by: Deepak Cherian <[email protected]>
The I'd like to update this PR soonish to use that library instead. |
If the zarr group prefers object-store-rs, we can move it into the zarr-developers org, if you like. I would like to be involved in developing it, particularly if it can grow more explicit fsspec compatible functionality. |
I have a few questions because the
I like that |
This came up in the discussion at https://github.com/zarr-developers/zarr-python/pull/2426/files/5e0ffe80d039d9261517d96ce87220ce8d48e4f2#diff-bb6bb03f87fe9491ef78156256160d798369749b4b35c06d4f275425bdb6c4ad. By default, it's passed as Does it look compatible with what you need? |
There's probably a bug or two in It's particularly nice that we're able to match the |
I rewrote the implementation of The type hinting in this PR passes. I'd love to get someone excited about trying this out. I figure this may not be desired for merge into |
Is there an obstore release with all of your latest changes?
You can use https://pypi.org/project/obstore/0.3.0b2/ for now
Run Store tests on ObjectStore
I'm not familiar enough with hatch to know what's going wrong with the CI
https://github.com/zarr-developers/zarr-python/actions/runs/11931641418/job/33291263276?pr=1661, which failed with
seems to be because there isn't a wheel for 3.13 at https://pypi.org/project/obstore/0.3.0b5/#files. I think that https://github.com/zarr-developers/zarr-python/actions/runs/11931641418/job/33291259870?pr=1661 has just our required dependencies. Your test file will need something like a |
I updated this to obstore 0.3.0-beta.8, which includes wheels for Python 3.13 (except for Windows, which only has wheels up through 3.12).
@kylebarron - thanks for the continued work on this. We discussed this PR in our dev call today. Here's a brief summary of the outcome:
How does this sound to you? |
# Yield this item if "/" does not exist after the prefix.
if "/" not in item["path"][prefix_len:]:
    yield item["path"]
It was brought to my attention in developmentseed/obstore#101 that in the current `object_store` implementation a final `/` is stripped from directories. So this `_transform_list_dir` will include the names of any sub directories.
Does zarr expect zero-byte files in normal usage? Or we could assume that a zero-byte file may be a directory.
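A small self-contained sketch of the behaviour described above, using made-up listing entries rather than real `obstore` output. The function `direct_children`, the `skip_zero_byte` flag, the assumption that each entry carries a `size` field, and the assumption that `prefix` ends with `/` are all hypothetical; the zero-byte heuristic is only the idea floated in the question, not a confirmed fix.

```python
from typing import Any, Dict, Iterable, Iterator


def direct_children(
    items: Iterable[Dict[str, Any]], prefix: str, skip_zero_byte: bool = False
) -> Iterator[str]:
    """Yield paths with no "/" after ``prefix`` (assumed to end with "/")."""
    prefix_len = len(prefix)
    for item in items:
        if "/" in item["path"][prefix_len:]:
            continue  # nested deeper than one level below the prefix
        if skip_zero_byte and item.get("size") == 0:
            # Heuristic from the question above: treat a zero-byte entry as a
            # directory marker. This would also hide genuinely empty files.
            continue
        yield item["path"]


# Made-up listing: "data/sub" is a directory whose trailing "/" was stripped.
listing = [
    {"path": "data/a", "size": 3},
    {"path": "data/sub", "size": 0},
    {"path": "data/sub/b", "size": 5},
]

# Without the heuristic, the sub-directory name passes the "/" check.
children = list(direct_children(listing, "data/"))
# With the heuristic, only "data/a" remains.
files_only = list(direct_children(listing, "data/", skip_zero_byte=True))
```

Whether the heuristic is acceptable depends on the answer to the question: if zarr can legitimately write zero-byte objects, they would be dropped from the listing.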
I agree it should be an optional dependency. Happy to document it as experimental. Also happy for users to construct I think we should get some benchmarks before deciding how to further integrate obstore. |
A Zarr store based on `obstore`, which is a Python library that uses the Rust `object_store` crate under the hood.
object-store is a Rust crate for interoperating with remote object stores like S3, GCS, Azure, etc. See the highlights section of its docs.
`obstore` maps async Rust functions to async Python functions, and is able to stream `GET` and `LIST` requests, which all make it a good candidate for use with the Zarr v3 Store protocol.
You should be able to test this branch with the latest pre-release version of `obstore`:
TODO: