Contributing Golang and Rust implementations #5
Comments
I am a total novice with Rust, but have been looking for a project to teach myself on. Since I know something about the Python implementation (although my main experience in this area is with pythonic file-systems), this might be the perfect entry point. As for need, I simply don't know; you would have to look for other science or big-array workflows in the Rust world. Where some have been speaking of C/C++ implementations of the zarr spec, there is certainly an argument for trying to get the same performance, but with more modern code, via Rust. |
@martindurant Great! I'm thinking I can use my Rust serialization experience to start the project, then would you be interested in helping to maintain it? |
There is a rust implementation for n5 already: It should be easy to port this to zarr, given how close the specs are. |
Thanks @constantinpape. Let’s see if @aschampion has thoughts. 🙂 |
I don't want to promise too much, given how much I (don't) know...
There is a chance of merging the libraries or spec eventually, right? (z5, whatever) |
@constantinpape I'm not an expert yet in Zarr or N5, but after reading that code base, I think there are a lot of opportunities to improve its usage of the heap. The way it is written leads me to believe that it is not designed to minimize heap allocations. |
Yes, hopefully zarr spec v3 will merge zarr and n5. (I couldn't make the call today so I am not quite up-to-date on the progress regarding this).
That's probably something @aschampion could comment more on. |
cc @j6k4m8 who might be interested in a Go implementation |
Hi @rw, just to say thanks for proposing this, it would be very cool to have implementations in these languages. The current version of the underlying specification is version 2. We are just getting started on work towards a version 3.0 of the core protocol, which will hopefully provide a common implementation target for both the zarr and n5 communities. Current vision for the 3.0 core protocol is that it will be quite minimal and so may be a slightly easier implementation target than the current version 2 spec. In any case, you'd have a choice about whether to target the v2 or v3 spec. It would be nice if you could target the v3 spec while it's in development, as that would give us some early feedback on implementation experiences and pain points. However it may take some time to fully flesh out the spec, and there may be some to-and-fro on some decision points, so it would be a moving target to a certain extent. So if you'd rather target v2 initially (or a subset of v2) to get some interoperability with existing implementations and data then I'd perfectly understand. |
There are opportunities for optimization. As is, it's at least a bit faster than the Java reference implementation, which it started as a rather direct translation of. I had a branch which reduced some allocations with The kludge for composing block reads/writes from ndarrays is a mess of needless allocations, but that's effectively an outer loop around block IO. I don't have an immediate need or time to develop zarr support myself, but would happily accept PRs, including major restructuring, so long as the downstream wasm and conda-less pip installable python packages are still possible. But I also understand if you'd rather start from scratch. |
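Editor's sketch of the allocation point discussed above: one common way to avoid a fresh heap allocation per block is to read every block into a single reusable buffer. This is not rust-n5 code; `read_block_into` and `block_checksums` are hypothetical names, and fixed-size, uncompressed blocks are assumed purely for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads one fixed-size block from `src` into `buf`,
/// reusing the buffer's existing allocation instead of creating a new Vec.
fn read_block_into<R: Read>(src: &mut R, block_len: usize, buf: &mut Vec<u8>) -> io::Result<()> {
    buf.clear();
    // Reuses capacity already held by `buf`; only grows if block_len exceeds it.
    buf.resize(block_len, 0);
    src.read_exact(buf)
}

/// Sums the bytes of each block while reusing one buffer across the whole loop.
/// The per-block sum stands in for whatever real work a caller would do.
fn block_checksums<R: Read>(src: &mut R, block_len: usize, n_blocks: usize) -> io::Result<Vec<u64>> {
    let mut buf = Vec::with_capacity(block_len);
    let mut sums = Vec::with_capacity(n_blocks);
    for _ in 0..n_blocks {
        read_block_into(src, block_len, &mut buf)?;
        sums.push(buf.iter().map(|&b| u64::from(b)).sum());
    }
    Ok(sums)
}
```

The buffer is allocated once before the loop, so the outer loop around block IO does not allocate per block in this sketch.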
@alimanfoo I'm happy to start with the V3 spec, and give feedback as I go. Would you be able to fill in the TODOs in the v3 spec with your best guesses, so that I have a concrete description upon which to work? Or, perhaps, provide links to the relevant parts of the v2 spec to fill in the gaps? |
Sure, working on it, hoping to have a complete sketch by the end of the
week. Please bear in mind it will be just a straw man for the moment, will
need some time for others to digest and comment. Follow along here for
progress and discussion:
zarr-developers/zarr-specs#16
|
@alimanfoo Hmm, should I just start with v2? I can still offer feedback on the v3 spec. |
That would be fine I'm sure, lots of value in having something that would interoperate with current Zarr implementations. |
Hi, I would be interested in a rust native version of zarr. I've created a native HDF5 reader and streamer for rust, hidefix, in order to be able to concurrently and simultaneously read HDF5 files for a rust OPeNDAP server dars. The reader performs quite well, especially for concurrent reads. In some ways it works in similar ways to zarr as far as I can see, by creating an index of the hdf5 file. For it to work with the DAP server I implemented zero-copy serialization/deserialization of the index (cannot keep indexes of datasets in memory), originally using flatbuffers, but gave up on that in favor of bincode because of performance and alignment issues, think it should work though. It would be very interesting to support zarr in this server, but a must for this is concurrent reads, otherwise performance will be very poor. HDF5 is starting async work upstream, but it will be a very big job to make that fast and correct. It seems this should be possible to do more safely with zarr + rust. |
Thanks for re-raising this conversation, @gauteh. I recently had a conversation with @clbarnes in a conference slack (i.e. I'm considering it largely public):
see: https://github.com/aschampion/rust-n5 cc: @aschampion |
To be specific, we made a prototype implementation of zarr v3 in rust about a year ago, and now that the spec seems to be stabilizing and that we have the time, just this week picked it up again. When the crate is available I'll link it here. |
Great! Will this implementation support concurrent/parallel reads? Is the development version available somewhere already? |
If it's available I would be happy to test as well while working on the Python impl. |
The implementation is thread safe like the current rust-n5 crate, but I'm guessing based on how you're asking I should clarify: our Zarr implementation exposes a minimal API that attempts to somewhat faithfully match the Zarr spec for doing low-level chunk-based operations. This means the expectation is one builds concurrent/parallel access in libraries on top of that, rather than it being done implicitly. For example pyn5, built on top of rust-n5, threads chunk-wise access if requested. So in that sense the Zarr implementation itself doesn't do any parallel reads, because it is at a layer below parallelism. The filesystem implementation itself does N5-style advisory file locking, although because of Zarr's KV-store approach that can't prevent some data races N5 can, but file locking is only an extra safety anyway, not a concurrency coordinator. We are also making an interface crate on top of the Zarr crate that provides more h5py-like, ergonomic, rust-idiomatic use of the zarr rust backend with Since there's interest in poking at the Zarr impl and it can now read hierarchies output from zarrita, I made the repo public here. Caveats:
|
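Editor's sketch of the layering described above, where the low-level API reads one chunk at a time and a library on top adds parallelism. This is not the API of the Zarr crate or of pyn5: the `ChunkStore` trait, `MemStore`, `read_chunks_parallel`, and the chunk keys are all hypothetical, and an in-memory map stands in for a real key-value store.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical low-level interface: one call reads one chunk by key.
/// It is thread safe (Send + Sync) but does no parallelism itself.
trait ChunkStore: Send + Sync {
    fn read_chunk(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for a key-value store, for illustration only.
struct MemStore(HashMap<String, Vec<u8>>);

impl ChunkStore for MemStore {
    fn read_chunk(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

/// Higher-level layer: reads each requested chunk on its own thread and
/// returns the results in the same order as `keys`.
fn read_chunks_parallel(store: Arc<dyn ChunkStore>, keys: &[&str]) -> Vec<Option<Vec<u8>>> {
    let handles: Vec<_> = keys
        .iter()
        .map(|k| {
            let store = Arc::clone(&store);
            let key = k.to_string();
            thread::spawn(move || store.read_chunk(&key))
        })
        .collect();
    // Joining in spawn order keeps the output aligned with the input keys.
    handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .collect()
}
```

One thread per chunk keeps the sketch short; a real library would more likely use a bounded thread pool or an async runtime on top of the same chunk-level calls.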
Community call suggestion
The subject of Rust and particularly rust-n5 came up during the 2021-01-27 community call. @aschampion @pattonw @clbarnes, would any of you be free/interested in joining the next call, a week from today Wed. 10.02 at 2000 CET? If that's not an ideal slot, would you care to suggest another? cc @WardF @DennisHeimbigner in case there are any points of discussion re: libnczarr. |
Sure, I can join the call next week. |
That sounds great, this would certainly keep it general enough that we can build something async on top. |
Definitely -- |
Ran across a Go implementation recently. Opened issue ( #50 ) with more details. |
Hi there!
Via @ryan-williams, I'm posting here because I'm interested in making two ports of Zarr: to native Golang and to native Rust. (To me, "native" means "no FFI to an existing Zarr library".)
For background:
I wrote and maintain the official Golang, Python, and Rust ports of Google's FlatBuffers serialization library. I received a Google Open Source Contributors Award for my volunteer efforts on FlatBuffers.
Here are some relevant links:
All of that is to say that I have a background in writing high-performance serialization code in open-source projects.
So, is there a need from the community for Golang and/or Rust ports? I'm happy to spearhead/lead those initiatives, if so.
I'm interested in getting involved with Zarr, because I like both your technical solutions, as well as the community-friendly group dynamics that I've seen.
Best,
Robert