Support reading remote zarrs via authenticated HTTP calls #9
Hi @tcompa, Thanks for the example. I have played a bit with it, and it seems that fully supporting It took only a few minor changes #10
Should produce:
The only part where support would require a bit more work is the tables. Concretely, if I try:
I won't find the subgroups and sub-arrays.
I can see two strategies:
This is very encouraging! Without any deep knowledge of ngio on my side, it seems nice to be able to "inject" some fsspec-native object without ngio knowing too much about fsspec itself.
If you look at https://zarr.readthedocs.io/en/stable/_modules/zarr/hierarchy.html#Group.groups, you'll see that there is a different behavior for zarr v2 and v3.
Just to make sure: is this another instance of zarr-developers/zarr-python#1568? In that case, there is no obvious way out - see these quotes from that issue:
To rephrase it as a more concrete comment:
Thanks for the resources! I did not know about consolidated metadata, but it is a great way to group all metadata in a single place. We should have ngio calling consolidate every time we create a new element in the Zarr hierarchy. This would make large plate metadata parsing much more efficient. I think, for now, it's ok just to avoid relying on Zarr internals to discover groups and arrays. This logic will be heavily refactored when we switch to v3 anyway. I have only a small additional question: should ngio be agnostic to auth?
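On the consolidated-metadata point, here is a minimal sketch of how children could be discovered without listing the store. It assumes the zarr v2 consolidated format, i.e. a `.zmetadata` JSON document whose `metadata` mapping is keyed by paths ending in `.zgroup`, `.zarray` or `.zattrs`. The helper name `list_children` and the example hierarchy are hypothetical and not part of ngio or zarr.

```python
from typing import Any, Mapping


def list_children(zmetadata: Mapping[str, Any], parent: str = "") -> dict[str, list[str]]:
    """Return the direct subgroups and sub-arrays of `parent`.

    `zmetadata` is the parsed content of a zarr v2 `.zmetadata` file
    (assumed layout: {"zarr_consolidated_format": 1, "metadata": {...}}).
    """
    prefix = parent.strip("/")
    prefix = f"{prefix}/" if prefix else ""
    groups, arrays = set(), set()
    for key in zmetadata["metadata"]:
        if not key.startswith(prefix):
            continue
        # A direct child looks like "<name>/.zgroup" or "<name>/.zarray"
        parts = key[len(prefix):].split("/")
        if len(parts) != 2:
            continue
        name, marker = parts
        if marker == ".zgroup":
            groups.add(name)
        elif marker == ".zarray":
            arrays.add(name)
    return {"groups": sorted(groups), "arrays": sorted(arrays)}


# Hypothetical consolidated metadata for a small image with one table
example = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "0/.zarray": {},
        "tables/.zgroup": {"zarr_format": 2},
        "tables/nuclei/.zgroup": {"zarr_format": 2},
        "tables/nuclei/X/.zarray": {},
    },
}

print(list_children(example))
print(list_children(example, "tables"))
```

With an fsspec mapper as the store, the document could presumably be fetched with a single request, e.g. `json.loads(store[".zmetadata"])`, provided the hierarchy was consolidated when it was written.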
In my opinion, at first I would stick with option 1 (ngio knows nothing about authentication, but it can use an arbitrary fsspec store). The complex part of option 2, in my view, would be the following: To put the question in a broader context: where will it be relevant for ngio to use specific fsspec objects (e.g. the HTTPFileSystem one)? This question is independent of the specific case of auth-related additional parameters, as other configuration parameters could exist as well. Relevant use cases:
Understanding these use cases better would help you decide whether it's relevant for ngio to integrate the creation of fsspec objects.
The main goal of this exploratory issue is to read remote zarrs over HTTP, when these HTTP calls require some authentication/authorization. I would postpone thinking about supporting write operations, especially because I cannot say whether it's a relevant use case (would someone really operate over HTTP, apart from the use case of reading existing datasets?)
The simplest example I can come up with is inspired by e.g. zarr-developers/zarr-python#1568, zarr-developers/zarr-python#993, pangeo-forge/pangeo-forge-recipes#222 (and the pangeo one is an interesting view into more integrated use cases of globus).
Starting from fsspec.implementations.http.HTTPFileSystem, we can include a client_kwargs argument which is then passed to the underlying aiohttp.ClientSession calls. An example from the fsspec docs is
To use HTTPFileSystem for a zarr array either via zarr-python or dask.array, we can proceed as in
with output
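As a rough sketch of the pattern described above (not the original snippet), the following builds the `client_kwargs` for `HTTPFileSystem` and opens a remote zarr through an fsspec mapper. The Bearer-token scheme, the helper names `build_client_kwargs` and `open_remote_zarr`, and the use of the zarr-python v2 API are all assumptions; a real server may expect different headers or cookies.

```python
from typing import Any, Mapping, Optional


def build_client_kwargs(
    token: str,
    extra_headers: Optional[Mapping[str, str]] = None,
) -> dict[str, Any]:
    """Build kwargs that fsspec forwards to aiohttp.ClientSession.

    Assumes the server accepts an "Authorization: Bearer <token>" header.
    """
    headers = {"Authorization": f"Bearer {token}"}
    if extra_headers:
        headers.update(extra_headers)
    return {"headers": headers}


def open_remote_zarr(url: str, token: str):
    """Open a remote zarr array or group read-only over authenticated HTTP.

    Imports are local so that the helper above stays usable without
    fsspec, aiohttp and zarr installed. Assumes zarr-python v2.
    """
    from fsspec.implementations.http import HTTPFileSystem
    import zarr

    fs = HTTPFileSystem(client_kwargs=build_client_kwargs(token))
    store = fs.get_mapper(url)  # MutableMapping view of the remote URL
    return zarr.open(store, mode="r")
```

Under option 1 discussed in this thread, only the `store` object would be handed to ngio, so the token handling stays entirely on the caller's side.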
Given such a minimal example, the question is whether this could fit anywhere in ngio. To phrase it differently: is it relevant/worthwhile for ngio to integrate fsspec? I do not know ngio well enough to answer.
Next steps, in my understanding: