(Geo)Zarr sub-resources in /coverage, and CIS JSON #175
Comments
How Zarr is internally organized should not be important. We should use the common parameters to extract information from the coverage that are agnostic of the file format. I'm not able to understand why the first sentence is related to the rest of the discussion and the resurrection of "/coverage/". In case we request Zarr as a format, can we only use a MIME multipart response or a Zip file (with or without compression) to support the retrieval of several files in one? Both support file folders and byte arrays.
@joanma747 In general, I think multipart responses are a pain to deal with. Zip could be one option (though we would need a Zarr+zip media type), but I also think the whole idea of Zarr being cloud-friendly normally means mapping it to separate files (resources) that allow serving it efficiently from object storage and accessing individual parts of it (somewhat similar to COG range requests). This might also be necessary to access it with e.g. Python XArray. Although the standard itself seems to say that the key/value store (file name / file content) can be implemented any which way, from https://wiki.earthdata.nasa.gov/display/ESO/Zarr+Format:
So based on all of this, I am wondering whether it should mean that our (Geo)Zarr conformance class in Coverages is not a single file, unlike other representations like GeoTIFF or netCDF, but individual files (as defined by Zarr) inside /coverage. And I am drawing the parallel to CIS JSON: if we already introduce encoding-specific sub-resources, there is the desire to access only the "domainset" or "rangetype" property, which we had before and which I currently changed to a profile= query parameter (of course it could get messy in terms of describing those encoding-specific sub-resources in an OpenAPI definition, especially if there end up being conflicting paths for different encodings).
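To make the "individual files" point above concrete, here is a minimal, stdlib-only sketch of the key/value layout a Zarr v2 store maps onto a file system: a `.zarray` JSON metadata document plus one file per chunk, each independently retrievable. The function name and the uncompressed, float64-only layout are illustrative assumptions; a real implementation would use the zarr-python library and a compressor.

```python
import json
import struct
from pathlib import Path
from tempfile import TemporaryDirectory

def write_minimal_zarr_v2(root: Path, data: list, chunks=(2, 2)) -> list:
    """Write a tiny uncompressed Zarr v2 array as plain files; return the keys created."""
    rows, cols = len(data), len(data[0])
    root.mkdir(parents=True, exist_ok=True)
    # Array metadata lives in a '.zarray' JSON document at the store root.
    meta = {
        "zarr_format": 2,
        "shape": [rows, cols],
        "chunks": list(chunks),
        "dtype": "<f8",      # little-endian float64
        "compressor": None,  # uncompressed, for simplicity
        "fill_value": 0.0,
        "order": "C",
        "filters": None,
    }
    (root / ".zarray").write_text(json.dumps(meta))
    keys = [".zarray"]
    # Each chunk is a separate file named '<row>.<col>' holding raw C-order bytes.
    for ci in range(0, rows, chunks[0]):
        for cj in range(0, cols, chunks[1]):
            values = [
                data[i][j] if i < rows and j < cols else 0.0  # pad edge chunks
                for i in range(ci, ci + chunks[0])
                for j in range(cj, cj + chunks[1])
            ]
            key = f"{ci // chunks[0]}.{cj // chunks[1]}"
            (root / key).write_bytes(struct.pack(f"<{len(values)}d", *values))
            keys.append(key)
    return keys

with TemporaryDirectory() as tmp:
    keys = write_minimal_zarr_v2(Path(tmp) / "coverage.zarr",
                                 [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    # Every key is a separately addressable resource, e.g. .zarray, 0.0, 0.1
    print(sorted(keys))
```

Each of these keys could in principle become a sub-resource under /coverage, which is exactly what makes the mapping to object storage (and to an OpenAPI description) non-trivial.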
I see.
If you want to implement Zarr as "files in a folder", then you should define an OGC Zarr API (which is not OGC API - Coverages): /collections/{collectionId}/Zarr/file. I have the same opinion about COG. We should not serve a COG in an OGC API - Coverages. COG uses HTTP range requests and the client is in control of the byte traffic. It does not require an API, and adding an API only messes with the original idea.
@joanma747 About COG, see #93 (comment) :) For COG, it's rather easy to add support for HTTP range requests to a /coverage resource. For Zarr as files in a folder, I am not convinced one way or the other, if this is what could allow pointing a Python XArray client to an OGC API - Coverages /coverage resource.
The whole purpose of OGC API - Coverages is to forget about the internal structure of the coverage and request the data based on geospatial and other filters. If I have to build a client that transforms all that into byte positions before I do an HTTP Range request, where is the value of OGC API - Coverages? It is simply better to forget about it and consider that HTTP Range is your protocol. No API needed.
@joanma747 The value / sense in that is being able to support both typical OGC API - Coverages clients that implement parameters like subset, and clients that access the same resource directly via HTTP range requests. It would be the same idea for supporting a Zarr directory to which you can point a Python XArray client.
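To illustrate why the same /coverage resource can serve both client styles, here is a hedged sketch (not any existing implementation) of the server-side behavior for the range-request path: an inclusive `bytes=a-b` Range header is honored with a 206 partial response, while clients that send no Range header, such as a typical Coverages client after subsetting, get the full 200 response.

```python
import re

def apply_byte_range(payload: bytes, range_header: str):
    """Simulate a server honoring HTTP Range on a COG-style /coverage resource.

    Returns (status, body): 206 with the requested slice, 416 if the range
    starts past the end, or 200 with the whole payload when no valid
    Range header is given.
    """
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header or "")
    if not m:
        return 200, payload              # no/unsupported range: full resource
    start, end = int(m.group(1)), int(m.group(2))
    if start >= len(payload):
        return 416, b""                  # Range Not Satisfiable
    return 206, payload[start:end + 1]   # byte ranges are inclusive (RFC 9110)

payload = bytes(range(100))              # stand-in for a COG's bytes
status, body = apply_byte_range(payload, "bytes=10-19")
print(status, len(body))  # 206 10
```

The point of contention in the thread is whether offering both access modes on one URL adds value or just duplicates what plain HTTP already provides.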
@joanma747 I am not sure having separate APIs for specific encodings/formats is a good idea. For example, a CDB data store could use a UNIX file system (the traditional CDB approach) in which content is structured in a hierarchy of file folders (based on tiling/LoD rules). A variety of formats/encodings are used and more will be added. As a developer defining clients that access a CDB data store, I want to simply have one API that accesses any vector data type and one API that accesses any coverage type. Actually, I would really like one API that rules them all :-) so that I can simply say "Give me content in this geographic area with appropriate metadata so I can then process further". Unfortunately, the OGC API design/architecture does not support this!
@jerstlouis, @joanma747, @cnreediii The key issue should be not the detailed format and structure in the cloud or on disk, but the metadata that is exposed for processing: this should be the same and consistent across OGC APIs. PS: NetCDF3 and NetCDF4/HDF5 have completely different internal structures; the first is a multidimensional array, the latter a hierarchy of objects (which may be multidimensional arrays). The OGC APIs should hide this.
What is the mechanism to know the internal file structure in the first place? Without knowing it, you cannot request individual chunk files by name. And then the client has to be "aware" of this structure and of which "chunk" corresponds to which spatial area. The Zarr structure should not change when you do subsetting (a new Zarr should not be created), so in practice, when requesting individual sub-files you would not use subsetting or scaling parameters.
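To show what "aware of which chunk corresponds to which spatial area" entails for a client, here is a hypothetical helper (all names and the simple north-up geotransform are assumptions, not anything from a spec) that maps a geographic bounding box to the Zarr chunk keys a structure-aware client would have to fetch by name:

```python
def chunks_for_bbox(shape, chunks, geotransform, bbox):
    """Return the Zarr chunk keys ('row.col') intersecting a geographic bbox.

    geotransform = (x_min, y_max, dx, dy): origin at the top-left corner,
    pixel sizes dx (eastward) and dy (southward), as in a north-up raster.
    bbox = (min_x, min_y, max_x, max_y) in the same CRS.
    """
    x_min, y_max, dx, dy = geotransform
    bx0, by0, bx1, by1 = bbox
    # Convert the geographic extent to pixel indices, clamped to the array shape.
    col0 = max(0, int((bx0 - x_min) / dx))
    col1 = min(shape[1] - 1, int((bx1 - x_min) / dx))
    row0 = max(0, int((y_max - by1) / dy))
    row1 = min(shape[0] - 1, int((y_max - by0) / dy))
    # Every chunk overlapping those pixels must be requested individually.
    keys = []
    for cr in range(row0 // chunks[0], row1 // chunks[0] + 1):
        for cc in range(col0 // chunks[1], col1 // chunks[1] + 1):
            keys.append(f"{cr}.{cc}")
    return keys

# A 100x100 grid covering lon 0..10, lat 0..10, stored as 25x25 chunks:
print(chunks_for_bbox((100, 100), (25, 25), (0.0, 10.0, 0.1, 0.1),
                      (1.0, 1.0, 3.0, 3.0)))
```

This logic is exactly what subset= hides from the client, which is the trade-off being debated: the server resolves it once, versus every client reimplementing it.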
@joanma747 Indeed, that is the question! Back in the day when I helped design and implement a GIS API, we had a simple call to the server that asked what formats/encodings the server supported for output. We used a controlled vocabulary, so the server provided a list of one or more formats and the client then "knew" what could be returned. Sort of sounds like some of the W*S Standards :-) Whether the returned format/encoding conformed to the rules of a given format, such as IGES or SIF, that was a different question :-)
SWG 2024-01-24: Since we do not have enough feedback / experience on how best to implement Zarr as a coverage representation in OGC API - Coverages, and there is uncertainty about the usefulness of space-partitioning blocks (whether internal or as multiple web resources) potentially conflicting with, or being redundant to, the subsetting mechanisms, I propose to remove the Zarr requirement class from Part 1, with the option to add it later if we receive more feedback. Alternatively, if anyone has input on Zarr and would like to propose how best to go about it, please discuss this in this issue or at the upcoming Code Sprint, February 13-15: https://github.com/opengeospatial/ogcapi-coverages/wiki/February-2024-OGC-API-%E2%80%90-Coverages-Virtual-Code-Sprint .
SWG 2024-08-21: We now have a draft Zarr requirement class using a Zip container file for the /coverage resource. Please provide feedback on this approach if you plan on implementing Zarr in your Coverages implementation.
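As a sketch of the Zip-container approach (function name and entry names are illustrative, not from the draft requirement class), the whole Zarr key/value hierarchy can be packaged into a single response body using only the standard library; storing members uncompressed keeps them range-addressable within the archive:

```python
import io
import json
import zipfile

def zip_store(entries: dict) -> bytes:
    """Package a key/value store (e.g. a Zarr hierarchy) as one Zip payload,
    so it can be returned as a single /coverage response body."""
    buf = io.BytesIO()
    # ZIP_STORED writes members uncompressed, preserving byte-addressability.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for key, value in entries.items():
            zf.writestr(key, value)
    return buf.getvalue()

store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode(),
    "temp/.zarray": b"{}",     # placeholder array metadata
    "temp/0.0": b"\x00" * 32,  # one raw chunk
}
payload = zip_store(store)
with zipfile.ZipFile(io.BytesIO(payload)) as zf:
    print(zf.namelist())  # ['.zgroup', 'temp/.zarray', 'temp/0.0']
```

This keeps /coverage a single resource (like the GeoTIFF or netCDF representations) while retaining Zarr's internal key structure, at the cost of a client needing to unpack or index the archive before use.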
Zarr is typically organized in the cloud as a directory structure of resource files, as opposed to a single file.
What does this mean for implementing this as a representation of /coverage? Would the content then be sub-directories of /coverage?
What does an application/x-zarr negotiated media type typically return?
Does this sub-resource pattern mean that we should resurrect sub-resources in the context of CIS JSON, to require, in addition to /coverage: /coverage/domainset, /coverage/rangetype, /coverage/rangeset and /coverage/metadata, instead of the current profile= query parameter in that requirement class, but ONLY when CIS JSON support is declared in /conformance?