Peter Baumann edited this page Jan 1, 2024 · 8 revisions

Data Requests Wiki

Contents: Finding data | Current Limitations | How-to | Use Cases

Finding data

Listing contents

Services support direct listing of their contents, though not necessarily with the convenience of the planned catalog:

  • rasdaman datacubes: get a list of datacubes (requires authentication). Beware: this is an OGC-compliant XML document; search for the element "CoverageSummary".
  • EOX: (tbd)
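If you process the rasdaman listing programmatically, a minimal Python sketch like the following can extract the datacube names. It assumes the listing is a WCS 2.x GetCapabilities document in which each "CoverageSummary" element carries a "CoverageId" child; matching on local element names avoids hard-coding the namespace URI. The sample XML and the helper names are illustrative only, not a real server response.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def local_name(tag: str) -> str:
    """Strip a namespace of the form '{uri}' from an element tag."""
    return tag.rsplit("}", 1)[-1]


def coverage_ids(capabilities_xml: Union[str, bytes]) -> List[str]:
    """Return the CoverageId of every CoverageSummary element in a capabilities document."""
    root = ET.fromstring(capabilities_xml)
    ids = []
    for element in root.iter():
        if local_name(element.tag) != "CoverageSummary":
            continue
        for child in element:
            if local_name(child.tag) == "CoverageId" and child.text:
                ids.append(child.text.strip())
    return ids


# Illustrative, heavily trimmed capabilities document (hypothetical content)
SAMPLE = """<wcs:Capabilities xmlns:wcs="http://www.opengis.net/wcs/2.0">
  <wcs:Contents>
    <wcs:CoverageSummary><wcs:CoverageId>S2_L2A_32633_B04_60m</wcs:CoverageId></wcs:CoverageSummary>
    <wcs:CoverageSummary><wcs:CoverageId>S2_L2A_32633_B05_60m</wcs:CoverageId></wcs:CoverageSummary>
  </wcs:Contents>
</wcs:Capabilities>"""

print(coverage_ids(SAMPLE))  # ['S2_L2A_32633_B04_60m', 'S2_L2A_32633_B05_60m']
```

Since the listing requires authentication, the raw XML has to be fetched with your credentials first; presumably a GetCapabilities request (SERVICE=WCS & VERSION=2.1.0 & REQUEST=GetCapabilities) against the endpoint used in the How-to section returns such a document.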

Catalog

The catalog offers an easy way to browse the available datasets. Note that it is still under development and does not yet list all datasets available.

Limitations

  • Time stamps on several datacubes follow peculiar conventions that rasdaman does not yet support. Therefore, the time axis has for now been modelled as an index (Cartesian) axis, meaning that temporal access (such as with the TIME parameter in WMS 1.3) is not yet possible. Full temporal support is expected to become available within 2023.
  • Due to minor misalignments between OGC standards, some facets of the XML schemas do not validate. With most tools, however, this is not an issue when using the data.

How-to

In this section we give a brief introduction to datacube wrangling. First, some terminology: in the standardization world, datacubes are modeled as "coverages". Most relevant are the OGC Coverage Implementation Schema (CIS) as the data model and the Web Coverage Service (WCS) as the processing model, which contains the Web Coverage Processing Service (WCPS) datacube analytics language. So don't be surprised to see "coverages" mentioned below.

We first present a general overview of standards-based datacube access, and then provide some use-case-specific examples. If you want to see further examples added, contact us!

Coverages

Coverages are designed to be self-describing. While more metadata can always be added to an object, the coverage itself contains the essentials for understanding its pixels. The canonical structure of a coverage consists of:

  • domain set: where can I find values?
  • range set: the values.
  • range type: what do the values mean?
  • metadata: what else should I know about these data?
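To make the four components concrete, here is a deliberately simplified Python model. It is a hypothetical illustration, not the CIS schema: the domain set is reduced to integer grid bounds per axis, the range set to a flat row-major list, and the range type and metadata to plain dictionaries (the field name, unit, and nil value shown are made up). The lookup method shows how the domain set answers "where can I find values?" by mapping grid coordinates to a position in the range set.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Coverage:
    """Hypothetical, simplified stand-in for the four coverage components (not CIS)."""

    # domain set: axis name -> inclusive integer grid bounds ("where can I find values?")
    domain: Dict[str, Tuple[int, int]]
    # range set: the values, stored flat in row-major order over the domain axes
    values: List[float]
    # range type: what the values mean, e.g. field name, unit of measure, nil value
    range_type: Dict[str, Any]
    # metadata: anything else worth knowing about the data
    metadata: Dict[str, Any] = field(default_factory=dict)

    def value_at(self, **coords: int) -> float:
        """Look up the range set value at the given grid coordinates, one per axis."""
        offset = 0
        for axis, (lo, hi) in self.domain.items():
            c = coords[axis]
            if not lo <= c <= hi:
                raise IndexError(f"{axis}={c} outside domain [{lo}, {hi}]")
            offset = offset * (hi - lo + 1) + (c - lo)
        return self.values[offset]


cov = Coverage(
    domain={"i": (0, 1), "j": (0, 2)},
    values=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    range_type={"field": "temperature", "uom": "K", "nil": -9999.0},
)
print(cov.value_at(i=1, j=2))  # 5.0
```

Real coverages additionally georeference the grid (CRS, axis offsets) and may use irregular axes; the sketch keeps only the bare division of labour between the four parts.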

Coverages can be encoded in a variety of data formats. Text formats include XML, JSON, and RDF; binary formats include GeoTIFF, NetCDF, and Grib2.

See this tutorial for more details on CIS and these Fairicube encoding examples.

Coverage Access

The Web Coverage Service (WCS), currently in version 2.1, defines access in a user-selected encoding, spatio-temporal subsetting, scaling, and reprojection, as well as processing (see next section). Such Web requests are expressed as HTTP GET or POST requests, as this example using the fairicube rasdaman service shows, where A stands for the name of the coverage to be accessed (whitespace is added only for readability and is not part of the request):

https://fairicube.rasdaman.com/rasdaman/ows
    ? SERVICE=WCS & VERSION=2.1.0 & REQUEST=GetCoverage
    & COVERAGEID=A
    & SUBSET=date( "2018-05-22" )
    & SUBSET=E( 332796 : 380817 )
    & SUBSET=N( 6029000 : 6055000 )
    & FORMAT=image/png

As per OGC syntax, date/time strings need to be quoted.

Note that HTTP requires certain characters to be "URL-encoded" before submission; browsers often do this automatically, but programmatically generated requests must do it explicitly.
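In Python, for example, the standard library's urllib.parse.urlencode takes care of the encoding. The sketch below rebuilds the GetCoverage request as a list of key/value pairs, which allows SUBSET to appear several times. COVERAGEID=A is a placeholder for the name of the coverage to be accessed, since WCS GetCoverage requires a coverage identifier; the helper name is hypothetical, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

ENDPOINT = "https://fairicube.rasdaman.com/rasdaman/ows"


def get_coverage_url(coverage_id, subsets, fmt="image/png"):
    """Build a URL-encoded WCS 2.1 GetCoverage request.

    subsets is a list of strings such as 'E(332796:380817)'; date/time
    values must already carry their double quotes, as OGC syntax requires.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.1.0"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),  # placeholder coverage name
    ]
    params += [("SUBSET", s) for s in subsets]  # repeated key, one per axis
    params.append(("FORMAT", fmt))
    # urlencode percent-encodes characters such as ( ) " : and /
    return ENDPOINT + "?" + urlencode(params)


url = get_coverage_url(
    "A",
    ['date("2018-05-22")', "E(332796:380817)", "N(6029000:6055000)"],
)
print(url)
```

The resulting URL can then be fetched with any HTTP client, for instance urllib.request.urlopen(url), adding authentication where the service requires it.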

See this tutorial for more details on WCS.

Coverage Processing

WCPS allows processing, aggregation, fusion, and more on datacubes through a high-level, easy-to-use language that does not require programming skills in languages like Python. The following example inspects coverage A and returns a cutout whose spatial extent is expressed in Easting and Northing (assuming this is the native coordinate reference system of the coverage), sliced at a single time point and encoded as PNG:

for $c in ( A )
return
    encode( $c [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ], "png" )

Such a query can be sent through the WCS Processing request:

https://fairicube.rasdaman.com/rasdaman/ows
    ? SERVICE=WCS & VERSION=2.1.0 & REQUEST=ProcessCoverages
    & QUERY=for $c in ( A ) return encode( $c [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ], "png" )

Again, remember that HTTP URL-encoding needs to be applied before sending.
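As a sketch, the following Python snippet assembles the WCPS cutout query from parameters and wraps it into a ProcessCoverages request. The helper names are hypothetical, and coverage A remains a placeholder. Spaces are encoded as %20 rather than "+", a conservative choice because %20 means a space in any part of a URL.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

ENDPOINT = "https://fairicube.rasdaman.com/rasdaman/ows"


def cutout_query(coverage, date, e, n, fmt="png"):
    """Build the WCPS cutout query for a coverage, a time point and E/N bounds."""
    return (
        f"for $c in ( {coverage} ) return encode( $c [ "
        f'date( "{date}" ), E( {e[0]} : {e[1]} ), N( {n[0]} : {n[1]} ) ], "{fmt}" )'
    )


def process_coverages_url(query):
    """Wrap a WCPS query into a URL-encoded WCS ProcessCoverages request."""
    params = {
        "SERVICE": "WCS",
        "VERSION": "2.1.0",
        "REQUEST": "ProcessCoverages",
        "QUERY": query,
    }
    # quote (instead of urlencode's default quote_plus) encodes spaces as %20
    return ENDPOINT + "?" + urlencode(params, quote_via=quote)


q = cutout_query("A", "2018-05-22", (332796, 380817), (6029000, 6055000))
url = process_coverages_url(q)
print(url)
```

Decoding the QUERY parameter of the resulting URL gives back exactly the query text, which is a handy self-check before sending.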

So far, each coverage has been processed in isolation. Data fusion is possible through “nested loops”:

for $a in ( A ), $b in ( B )
return encode( $a + $b, "png" )
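The iteration semantics can be mimicked locally in Python: each for variable ranges over its list of coverages, and the return clause is evaluated once per combination. In this hypothetical sketch, small nested lists stand in for coverages and cell-wise addition stands in for "$a + $b"; the actual evaluation of course happens on the server.

```python
from itertools import product


def add(a, b):
    """Cell-wise sum of two equally shaped 2D grids, i.e. '$a + $b'."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def wcps_for(bindings, body):
    """Evaluate body once for every combination of the bound coverages."""
    names = list(bindings)
    return [
        body(**dict(zip(names, combo)))
        for combo in product(*(bindings[name] for name in names))
    ]


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

# for $a in ( A ), $b in ( B ) return $a + $b
print(wcps_for({"a": [A], "b": [B]}, lambda a, b: add(a, b)))
# [[[11, 22], [33, 44]]]
```

With more than one coverage in a list, e.g. for $a in ( A1, A2 ), the same sketch yields one result per combination, which is how the WCPS loop model treats such lists.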

Aggregation plays an important role in reducing the amount of data transported to the client. With the common aggregation operators, called “condensers” in WCPS, queries like the following are possible (note that no format encoding is needed, as numbers are returned in ASCII):

for $a in ( A )
return max( $a )
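As a mental model, a condenser folds all cells of a coverage into a single value. The Python sketch below imitates a few common condensers (max, add, avg, count) on a small grid; the function names are hypothetical and only loosely mirror the WCPS operators of the same names. Because the server replies with the number as plain ASCII text, a client can turn it back into a value with float(...).

```python
import operator
from functools import reduce


def cells(grid):
    """Iterate over all cells of a 2D grid."""
    for row in grid:
        yield from row


def condense(op, grid):
    """Fold all cells with a binary operator, the idea behind every condenser."""
    return reduce(op, cells(grid))


def wcps_max(grid):
    return condense(max, grid)


def wcps_add(grid):
    return condense(operator.add, grid)


def wcps_avg(grid):
    n = sum(1 for _ in cells(grid))
    return wcps_add(grid) / n


def wcps_count(mask):
    """Number of true cells, as for a boolean coverage."""
    return sum(1 for v in cells(mask) if v)


A = [[1, 5], [3, 2]]
print(wcps_max(A), wcps_add(A), wcps_avg(A))  # 5 11 2.75
print(wcps_count([[v > 2 for v in row] for row in A]))  # 2

# A scalar query result arrives as ASCII text and can be parsed like this
print(float("5\n"))  # 5.0
```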

As a final example, the following WCPS query computes the Inverted Red-Edge Chlorophyll Index (IRECI) on a selected space/time region, performs contrast reduction for visualization (division by 50), and delivers the result reprojected to EPSG:4326:

for $c in ( S2_L2A_32633_B07_60m ),
    $d in ( S2_L2A_32633_B04_60m ),
    $e in ( S2_L2A_32633_B05_60m ),
    $f in ( S2_L2A_32633_B06_60m )
let $sub := [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ]
return
    encode(
        crsTransform(
            ( $c[ $sub ] - $d[ $sub ] ) / ( $e[ $sub ] / $f[ $sub ] ),
            { E: "EPSG:4326", N: "EPSG:4326" }
        ) / 50,
        "png"
    )
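For checking results locally, the per-pixel arithmetic of the query can be written out in plain Python. The sketch assumes the band-to-variable mapping given by the coverage names ($c = B07, $d = B04, $e = B05, $f = B06) and applies the same division by 50; the function names are hypothetical, and returning NaN where a denominator is zero is a choice of this sketch (the server may treat such pixels differently).

```python
import math


def ireci(b7, b4, b5, b6):
    """Inverted Red-Edge Chlorophyll Index for one pixel: (B7 - B4) / (B5 / B6)."""
    if b5 == 0 or b6 == 0:
        return math.nan  # avoid division by zero (sketch-specific choice)
    return (b7 - b4) / (b5 / b6)


def ireci_scaled(b7s, b4s, b5s, b6s, scale=50.0):
    """Apply ireci cell by cell and divide by scale, as the '/ 50' in the query does."""
    return [ireci(*px) / scale for px in zip(b7s, b4s, b5s, b6s)]


print(ireci(300, 100, 200, 400))  # 400.0
print(ireci_scaled([300], [100], [200], [400]))  # [8.0]
```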

See this tutorial for more details on WCPS.

User-Defined Functions (UDF)

A rasdaman UDF is external code stored on the rasdaman server machine, invokable from within WCPS or rasql queries. During query evaluation, the rasdaman server dynamically loads and invokes the external code, passes the query parameters indicated, and receives back the result. This way, the external code is seamlessly integrated in the query orchestration.

For Python, which is the most relevant language in this project, see the detailed documentation on Python UDFs.

Use Cases

ML Use Case

tbd

Drosophila Use Case

Genome Data

Corresponding data request issue: Genomic data of Drosophila

Occurrence Cube

Corresponding data request issue: Distribution data of Drosophila. GBIF data are described in this issue. Our sister project B-Cubed, with GBIF as partner, will provide selected data.

Datacube Structure:

  • Domain dimensions:
    • Lat, Long:
      • Extent (RD, EPSG:28992): Xmin: 168280, Xmax: 223880, Ymin: 512055, Ymax: 535555
      • Extent (lat/lon): Xmin: 5.5831989141242966, Xmax: 6.4086515407429623, Ymin: 52.5917375949509562, Ymax: 52.8070852699905871
      • Resolution: 10m
    • Time (year): 2018
    • Taxon: 7-digit taxon id, categorical
  • Range type:
    • Count: float, no-data: -1
    • Maximum Uncertainty: float, no-data: -1
  • Metadata:
  • input format: CSV with columns Year, EEA Grid Cell, TaxonID, Count, Uncertainty
    • EEA reference grid cell identifiers, e.g. 1kmE5432N4321 or 250mE1025N22000
    • Unlike the datasets so far, GBIF provides the location (Lat/Long) through a grid cell id of the LAEA 10m grid
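A minimal Python sketch for reading the input CSV described above, using only the standard library. It assumes a header row with exactly the listed column names and maps empty Count or Uncertainty fields to the no-data value -1 (an assumption about how gaps appear in the file). The easting and northing numbers of the grid cell id are kept as raw integers, since their conversion to metres depends on the grid's naming convention. The grid_shape helper shows the arithmetic for the RD extent at 10m resolution, assuming the extent gives outer cell boundaries. All helper names and sample rows are hypothetical.

```python
import csv
import io
import re

# Cell size with unit, then easting and northing numbers, e.g. 1kmE5432N4321
CELL_RE = re.compile(r"^(\d+)(km|m)E(\d+)N(\d+)$")
NO_DATA = -1.0  # no-data value of the Count and Maximum Uncertainty fields


def parse_cell(code):
    """Split an EEA grid cell id into (cell size in metres, easting number, northing number)."""
    match = CELL_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not an EEA grid cell id: {code!r}")
    size, unit, east, north = match.groups()
    size_m = int(size) * (1000 if unit == "km" else 1)
    # east/north stay raw: their scale depends on the grid's naming convention
    return size_m, int(east), int(north)


def to_value(text):
    """Parse a float field; empty fields become the no-data value (assumption)."""
    text = text.strip()
    return float(text) if text else NO_DATA


def read_occurrences(stream):
    """Yield one dict per CSV row with the columns listed above."""
    for row in csv.DictReader(stream):
        size_m, east, north = parse_cell(row["EEA Grid Cell"])
        yield {
            "year": int(row["Year"]),
            "cell_size_m": size_m,
            "east": east,
            "north": north,
            "taxon": row["TaxonID"].strip(),  # categorical, kept as text
            "count": to_value(row["Count"]),
            "uncertainty": to_value(row["Uncertainty"]),
        }


def grid_shape(xmin, xmax, ymin, ymax, resolution):
    """Number of cells along X and Y, assuming the extent gives outer cell boundaries."""
    return round((xmax - xmin) / resolution), round((ymax - ymin) / resolution)


SAMPLE = (
    "Year,EEA Grid Cell,TaxonID,Count,Uncertainty\n"
    "2018,1kmE5432N4321,1234567,3,10\n"
    "2018,250mE1025N22000,1234567,,\n"
)
rows = list(read_occurrences(io.StringIO(SAMPLE)))
print(rows[0]["cell_size_m"], rows[1]["count"])  # 1000 -1.0
print(grid_shape(168280, 223880, 512055, 535555, 10))  # (5560, 2350)
```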

EOX-based use cases
