Rust API: `Store` lacks a method for querying the size of values
#277
Comments
Thanks for the feedback @LDeakin! I assume you're interested in sharding with this. Are there other use cases? Icechunk has the ability to do sharding in a different way: by packing multiple chunks into the same object, without Zarr really even knowing about it. This is also potentially more flexible, because the store can decide at runtime how to pack the chunks, or they can be repacked retroactively. I'm curious about the tradeoffs between this (currently unimplemented) approach to sharding and the current Zarr spec one. TBH I have never really understood the whole "sharding as a codec" concept. I think it makes sense for sharding to be an implementation detail of the store. As for chunk-level metadata like a checksum, with Icechunk we have the option of putting that in the chunk manifest rather than the chunk itself! This could be a lot more efficient to query.
When I first scanned over Icechunk, I wondered how it would work with a shard written incrementally (chunk-by-chunk). But that sounds much better. Delegating sharding-like functionality to Icechunk could give history at chunk granularity, and array producers/consumers would not need to concern themselves with shards 👍.
Not currently. But a Zarr store needs to support either reading from the end of a value or querying its size (ideally both) to support partial decoding with all current Zarr V3 codecs.
Size definitely can and should be implemented! It's already in the chunk manifest.
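To make the manifest idea concrete, here is a minimal sketch of how a size query could be answered from a manifest alone when several chunks are packed into one object. Everything in it is an assumption for illustration: `ChunkRef`, `ManifestSketch`, `size_of`, and `suffix_in_object` are hypothetical names, not Icechunk's actual types or API.

```rust
use std::collections::HashMap;

/// Hypothetical manifest entry: where one chunk lives inside a packed object.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkRef {
    pub object_id: String,
    pub offset: u64,
    pub length: u64,
    /// Optional chunk-level checksum kept in the manifest, not in the chunk.
    pub checksum: Option<u32>,
}

/// Hypothetical manifest mapping chunk keys to their locations.
#[derive(Debug, Default)]
pub struct ManifestSketch {
    entries: HashMap<String, ChunkRef>,
}

impl ManifestSketch {
    pub fn insert(&mut self, key: &str, chunk: ChunkRef) {
        self.entries.insert(key.to_string(), chunk);
    }

    /// Size of a chunk, answered without touching the underlying object.
    pub fn size_of(&self, key: &str) -> Option<u64> {
        self.entries.get(key).map(|c| c.length)
    }

    /// Half-open range `[start, end)` within the packed object that holds
    /// the last `n` bytes of the chunk, or `None` if `n` exceeds its length.
    pub fn suffix_in_object(&self, key: &str, n: u64) -> Option<(u64, u64)> {
        let c = self.entries.get(key)?;
        if n > c.length {
            return None;
        }
        let end = c.offset + c.length;
        Some((end - n, end))
    }
}
```

With a layout like this, a trailing checksum or shard index could be located from the manifest entry's offset and length, without a separate size request against the object store.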
@LDeakin what's the issue with
That represents the 42nd byte onwards, right? What I am after is the last 42 bytes, for example. I think I would need to know the size of the value to construct such a Note that many stores support requesting the last N bytes from an object.
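The difference between "from byte 42 onwards" and "the last 42 bytes" can be sketched as follows. This is an illustration under stated assumptions: `RangeSketch` and `resolve` are hypothetical names, not the `ByteRange` type from Icechunk or any other crate.

```rust
/// Hypothetical byte-range request, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RangeSketch {
    /// Bytes from `offset` to the end of the value.
    FromOffset(u64),
    /// The last `len` bytes of the value.
    Suffix(u64),
}

/// Resolve a request to an absolute half-open range `[start, end)`,
/// given the total size of the value. Returns `None` when the request
/// does not fit inside the value.
pub fn resolve(range: RangeSketch, size: u64) -> Option<(u64, u64)> {
    match range {
        RangeSketch::FromOffset(offset) if offset <= size => Some((offset, size)),
        RangeSketch::Suffix(len) if len <= size => Some((size - len, size)),
        _ => None,
    }
}
```

For a 100-byte value, `resolve(RangeSketch::FromOffset(42), 100)` yields `Some((42, 100))`, while `resolve(RangeSketch::Suffix(42), 100)` yields `Some((58, 100))`. Without a suffix variant, the caller has to learn the size first in order to compute the start offset `58` itself.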
@LDeakin I'll change
This provides an approach to deal with #277
Looks good!
We have given Lachlan a way around this, but I'll keep the ticket open until we offer a way to retrieve the size of a chunk using the
@LDeakin we have released
Sure did!
Unbelievable @LDeakin!
Related conversation in zarr-python: zarr-developers/zarr-python#2420
As far as I can see, retrieving trailing bytes (e.g. CRC32C checksum, shard index) from a chunk with `Store::get` or `Store::get_partial_values` is not possible (with the `ByteRange` abstraction) without knowing the size of a value.