Merge branch 'main' into HEAD
baloola committed Jun 13, 2024
2 parents 23b56a4 + b7d6ca8 commit 9792fe8
Showing 14 changed files with 137 additions and 1,213 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/tests.yml
@@ -4,7 +4,10 @@ on:
pull_request:
branches:
- main

push:
branches:
- '**'
- '!master'
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

@@ -20,4 +23,4 @@ jobs:
- name: validate stac items

run: |
pytest --verbosity=1 ./stac/stac-generator/test/validator.py
pytest --tb=no ./stac/stac-generator/test/validator.py
10 changes: 6 additions & 4 deletions CoverageEncoding/rangeType.md
@@ -11,7 +11,7 @@ For CIS encodings, the following requirements classes are of relevance:
- Basic Types and Simple Components Schemas: `XML Schema elements and types defined in the “basic_types.xsd” and “simple_components.xsd” schema files implement all classes defined respectively in the “Basic Types” and “Simple Components” UML packages.`

From the Basic Types and Simple Components, we rely on the Quantity and Category Elements.
While the Count Element would also be applicable, for the moment we will handle Counts as Quantities.
While the Count Element would also be applicable, for the moment we will handle Counts as Quantities with a uom of "1" for "unitless".

## SWE:DataRecord
SWE:DataRecord, derived from the SWE Common AbstractDataComponent, can be used to group multiple components via the `field` attribute.
@@ -72,7 +72,8 @@ Note: UCUM 1.8 has been deprecated, current version is [UCUM 2.1](https://ucum.o

The following shows an example of a Quantity rangeType taken from the Demography dataset. Note that ideally we would use the swe:Count type for this purpose.

```<cis11:RangeType>
```
<cis11:RangeType>
<swe:DataRecord>
<swe:field name="Population_total">
<swe:Quantity definition="https://ec.europa.eu/eurostat/web/gisco/geodata/population-distribution/geostat">
@@ -83,7 +84,7 @@ The following shows an example of a Quantity rangeType taken from the Demography
<swe:nilValue reason="">65535</swe:nilValue>
</swe:NilValues>
</swe:nilValues>
<swe:uom code="{tot}"/>
<swe:uom code="1"/>
</swe:Quantity>
</swe:field>
</swe:DataRecord>
@@ -99,7 +100,8 @@ When working from a data request, the `Category List` field in the Bands section

The following shows an example of a Category rangeType taken from the DominantLeafType dataset

```<cis11:RangeType>
```
<cis11:RangeType>
<swe:DataRecord>
<swe:field name="DominantLeafType">
<swe:Category definition="https://land.copernicus.eu/en/products/high-resolution-layer-dominant-leaf-type">
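
The uom change in this file (`{tot}` replaced by `1` for a unitless count) can also be produced programmatically. The following is a minimal sketch, not the project's tooling: it builds a `swe:field` holding a `swe:Quantity` with Python's standard `xml.etree.ElementTree`, using only the elements visible in the diff. The helper name `quantity_field` is hypothetical, and the SWE Common 2.0 namespace URI is an assumption about the schema version in use.

```python
import xml.etree.ElementTree as ET

# Assumption: SWE Common 2.0 namespace; adjust if the encoding uses another version.
SWE_NS = "http://www.opengis.net/swe/2.0"
ET.register_namespace("swe", SWE_NS)


def quantity_field(name: str, definition: str, nil_value: str, uom_code: str = "1") -> ET.Element:
    """Build a swe:field wrapping a swe:Quantity (hypothetical helper)."""
    field = ET.Element(f"{{{SWE_NS}}}field", {"name": name})
    quantity = ET.SubElement(field, f"{{{SWE_NS}}}Quantity", {"definition": definition})
    # nilValues > NilValues > nilValue mirrors the nesting shown in the example.
    outer = ET.SubElement(quantity, f"{{{SWE_NS}}}nilValues")
    inner = ET.SubElement(outer, f"{{{SWE_NS}}}NilValues")
    nil = ET.SubElement(inner, f"{{{SWE_NS}}}nilValue", {"reason": ""})
    nil.text = nil_value
    # uom code "1" marks the value as unitless (a count handled as a Quantity).
    ET.SubElement(quantity, f"{{{SWE_NS}}}uom", {"code": uom_code})
    return field


field = quantity_field(
    "Population_total",
    "https://ec.europa.eu/eurostat/web/gisco/geodata/population-distribution/geostat",
    "65535",
)
print(ET.tostring(field, encoding="unicode"))
```

Defaulting `uom_code` to `"1"` keeps the unitless convention in one place, so a later switch to `swe:Count` would only touch the helper.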
3 changes: 3 additions & 0 deletions README.md
@@ -8,13 +8,16 @@ Contents of this space:

- [How to Get Data Added](https://github.com/FAIRiCUBE/data-requests/wiki/How-to-Add-Data)
- [Choosing the right pixel type](https://github.com/FAIRiCUBE/data-requests/wiki/Choosing-the-Right-Pixel-Type)
- [Details on the Coverage range type, as inherited from SWE Common](https://github.com/FAIRiCUBE/data-requests/blob/main/CoverageEncoding/rangeType.md)
- [Connecting catalog with datacubes](https://github.com/FAIRiCUBE/data-requests/wiki/Connection-Catalog-Datacubes)
- [Finding data ingested, datacube access how-to](https://github.com/FAIRiCUBE/data-requests/wiki)
- [Use case specific modeling and access](https://github.com/FAIRiCUBE/data-requests/wiki/Data-Overview)
- [complete rasdaman documentation](https://doc.rasdaman.com)

As data ingest is tightly connected with metadata management, use of data, etc., consider also these related spaces:

- [metadata-editor WebGUI](https://catalog-editor.eoxhub.fairicube.eu/): to provide and edit metadata to be shown in the [data catalog (STAC-fastapi)](https://catalog.eoxhub.fairicube.eu/?.language=en)

- [resource-metadata](https://github.com/FAIRiCUBE/resource-metadata): in addition to the issues providing metadata for resources, also used to discuss technical details on resource metadata
- [Fairicube Hub](https://github.com/FAIRiCUBE/FAIRiCUBE-Hub-issue-tracker): for general FAIRiCUBE topics

1 change: 0 additions & 1 deletion stac/stac-generator/requirements.txt
@@ -1,6 +1,5 @@
pystac
pytest
gql
shapely
stactools
requests-toolbelt
149 changes: 123 additions & 26 deletions stac/stac-generator/test/validator.py
@@ -1,23 +1,137 @@
import pystac
import pytest
import os
import json

from typing import Any


def validate_item(item: pystac.item.Item):
item_is_EDC: bool = False
# for now exempt edc items from the inventory required fields
for link in item.links:
if link.rel == "about" and link.target.startswith("https://collections.eurodatacube.com"):
item_is_EDC = True
break

properties: dict[str, Any] = item.properties

if not item_is_EDC:
# validate Data Source
assert "dataSource" in properties.keys(), "No dataSource in the stac item"
assert isinstance(properties["dataSource"], str), "dataSource must be a string"
assert len(properties["dataSource"]) > 0, "dataSource string must not be empty"

# validate Owner/Organisation
assert "providers" in properties, "No providers in the stac item"
assert isinstance(properties["providers"], list), "providers must be a list"
for provider in properties["providers"]:
assert "organization" in provider.keys() or "name" in provider.keys()
if "organization" in provider.keys():
assert isinstance(provider["organization"], str), "provider's organization must be a string"
assert len(provider["organization"]) > 0, "provider's organization must not be empty"
if "name" in provider.keys():
assert isinstance(provider["name"], str), "provider name must be a string"
assert len(provider["name"]) > 0, "provider name string must not be empty"

# validate Horizontal section
assert isinstance(item.bbox, list), "bbox must be a list"
assert len(item.bbox) == 4, "bbox must contain 4 coordinates"
assert isinstance(item.geometry, dict), "geometry must be an object"

# Resolution of Horizontal Axis
assert isinstance(item.properties["cube:dimensions"], dict), "No dimensions in the stac item"
assert "x" in item.properties["cube:dimensions"].keys(), "No x dimension in the stac item"
assert "y" in item.properties["cube:dimensions"].keys(), "No y dimension in the stac item"
x = item.properties["cube:dimensions"]["x"]
y = item.properties["cube:dimensions"]["y"]
assert "step" in x.keys() and x["step"] is not None, "No step in the x dimension"
assert isinstance(float(x["step"]), float), "x step must be float"
assert "step" in y.keys() and y["step"] is not None, "No step in the y dimension"
assert isinstance(float(y["step"]), float), "y step must be float"

# Units of Measurement
assert "unit" in x.keys(), "No unit in x dimensions"
assert isinstance(x["unit"], str), "x dimension unit must be a string"
assert "unit" in y.keys(), "No unit in y dimensions"
assert isinstance(y["unit"], str), "y dimension unit must be a string"

# Horizontal CRS
assert "reference_system" in x.keys(), "No reference_system in x dimensions"
assert isinstance(x["reference_system"], str), "x dimension reference_system must be a string"
assert "reference_system" in y.keys(), "No reference_system in y dimensions"
assert isinstance(y["reference_system"], str), "y dimension reference_system must be a string"

# Temporal
assert "t" in item.properties["cube:dimensions"].keys() or "time" in item.properties["cube:dimensions"].keys()
time = dict()
# use "t" when present, otherwise fall back to "time" (guaranteed by the assert above)
if "t" in item.properties["cube:dimensions"].keys():
time = item.properties["cube:dimensions"]["t"]
else:
time = item.properties["cube:dimensions"]["time"]

# Time (Begin/End)
assert "extent" in time.keys() or "values" in time.keys()
# Resolution of Time Axis (Interval)
if "values" in time.keys():
assert "step" in time.keys(), "No step in time dimensions"
assert isinstance(time["step"], str), "time's step must be a string"
# Unit of measure
assert isinstance(time["unit"], str), "time's unit must be a string"


# Range Data validation
assert "raster:bands" in item.properties.keys() or "bands" in item.properties.keys()

# TODO: figure out a way to validate EDC items, i.e. the ones with "bands"

if "raster:bands" in item.properties.keys():
bands = item.properties["raster:bands"]
for band in bands:
# Range Data Type
assert "data_type" in band.keys(), "No data_type in band"
assert isinstance(band["data_type"], str), "band's data_type must be a string"
assert len(band["data_type"]) > 0, "band's data_type string must not be empty"

# Range Definition
assert "definition" in band.keys(), "No definition in band"
assert isinstance(band["definition"], str), "band's definition must be a string"
assert len(band["definition"]) > 0, "band's definition string must not be empty"

# Range Description
assert "description" in band.keys(), "No description in band"
assert isinstance(band["description"], str), "band's description must be a string"
assert len(band["description"]) > 0, "band's description string must not be empty"

# Null values

assert "nodata" in band.keys() and band["nodata"] is not None, "No nodata in band"
assert isinstance(float(band["nodata"]), float), "band's nodata must be float"



# validate ID
assert isinstance(item.id, str)
assert isinstance(item.id, str), "id must be a string"

assert len(item.id) > 0, "item id string must not be empty"

assert len(item.id) > 0

# validate Description
assert "description" in properties.keys()
assert "description" in properties.keys(), "No description in the stac item"
assert isinstance(properties["description"], str), "description must be a string"
assert len(properties["description"]) > 0, "description string must not be empty"

# assert isinstance(properties["description"], str)
# assert len(properties["description"]) > 0

# Legal - License
assert "license" in item.properties.keys(), "No license in the stac item"
assert isinstance(item.properties["license"], str), "license must be a string"

#TODO: keywords must be a list
# Keywords
assert "keywords" in item.properties.keys(), "No keywords in the stac item"
keywords = item.properties["keywords"]
assert isinstance(keywords, list) or (isinstance(keywords, str) and isinstance(keywords.split(","), list)), "keywords is not a valid list"
assert len(item.properties["keywords"]) > 0, "keywords must not be empty"

@pytest.mark.parametrize("dir", [
os.path.join('stac_dist', f) for f in os.listdir(
@@ -29,31 +143,14 @@ def test_items(dir):
'catalog.json') and f.endswith('.json')]

for item in items:
stac_item = pystac.Item.from_file(os.path.join(dir, item))
item_path = os.path.join(dir, item)

stac_item = pystac.Item.from_file(item_path)

validate_item(stac_item)



# Mandatory Columns (additions in bold)
# ID [Column C]
# Description [D]
# Data Source [Column E]
# Owner/Organisation [G]
# Horizontal
# Horizontal CRS [Column P]
# Bounding Box (Horizontal) [Column Q-T]
# Resolution of Horizontal Axis (ie. Pixel Size) [Column W]
# Units of Measurement [Column U]
# Temporal
# Time (Begin/End) [Column AD-AE]
# Resolution of Time Axis (Intervall) [Column AH]
# Unit of measure [Column AF]
# Range Data Type [Column AR]
# Range Definition [AS]
# Range Description [AT]
# Null values [Column AQ]
# Legal - License [BA]
# Keywords - Keywords [BJ]
# In addition, some fields can be filled with defaults, e.g.
# Metadata Standard: STAC
# Provision Date: Date being provided
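
The validator repeats the same three assertions (key present, value is a string, value is non-empty) for `dataSource`, `description`, and each band field. As a sketch only, and not part of this commit, that pattern could be factored into one helper. The name `require_non_empty_str` and the sample dictionaries are hypothetical.

```python
from typing import Any, Mapping


def require_non_empty_str(container: Mapping[str, Any], key: str, owner: str = "the stac item") -> str:
    """Assert that container[key] exists and is a non-empty string, then return it."""
    assert key in container, f"No {key} in {owner}"
    value = container[key]
    assert isinstance(value, str), f"{key} must be a string"
    assert len(value) > 0, f"{key} string must not be empty"
    return value


# Plain dictionaries standing in for item.properties and one raster band.
properties = {"dataSource": "example source", "description": "example description"}
band = {"data_type": "uint16", "definition": "example definition", "description": "example band"}

require_non_empty_str(properties, "dataSource")
require_non_empty_str(properties, "description")
for band_key in ("data_type", "definition", "description"):
    require_non_empty_str(band, band_key, owner="band")
```

Keeping the messages inside the helper also avoids copy-paste slips in the assertion text, since the key name is interpolated rather than typed by hand.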