automated extraction workflow for new RDM datasets #4

Open
jdries opened this issue Dec 4, 2023 · 8 comments

jdries commented Dec 4, 2023

For every reference dataset, we want to create a mirrored set of extracted features. This would typically be a GeoParquet file.

OpenEO can compute this Parquet file, and we can store it online (object storage, Artifactory, ...). The RDM could then be updated with a link to the extractions for a given reference file.
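For illustration, a minimal sketch of what such an extraction job could look like with the openEO Python client; the backend URL, collection id, bands, reference geometries and the Parquet output format are assumptions here, not the actual WorldCereal setup:

```python
import json
import openeo

# Backend URL, collection and bands are assumptions for illustration only.
connection = openeo.connect("openeo.vito.be").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2021-01-01", "2021-12-31"],
    bands=["B02", "B03", "B04", "B08"],
)

# 'reference.geojson' stands in for the geometries of one RDM reference dataset.
with open("reference.geojson") as f:
    geometries = json.load(f)

# Per-geometry time series: one row per sample, a mirror of the reference dataset.
samples = cube.aggregate_spatial(geometries=geometries, reducer="mean")

# Assumes the backend can write vector cubes as (Geo)Parquet.
job = samples.execute_batch(out_format="Parquet")
job.get_results().download_files("extractions/")
```

The resulting Parquet file(s) could then be uploaded to object storage or Artifactory and linked from the RDM record.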

For user-trained models, we propose adding a constraint that the reference dataset should cover a limited area, to keep the extraction workflow manageable. This could however be extended later with other, public extractions.

Use of DuckDB:
DuckDB could be interesting for us because it is an in-process (embedded) analytical database, so we don't need to set up a server. If we grow to the point where we do need one, we can still switch then.
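To illustrate the no-server aspect, a hedged sketch of querying such Parquet extractions directly with DuckDB from Python; the file paths and column names (sample_id, crop_type, B04, B08) are made up:

```python
import duckdb

# DuckDB runs in-process: nothing to deploy, just a library call.
con = duckdb.connect()  # in-memory database by default

# Query the GeoParquet extraction files directly from disk (or object storage).
df = con.execute(
    """
    SELECT sample_id, crop_type, AVG(B08 - B04) AS mean_diff
    FROM read_parquet('extractions/*.parquet')
    GROUP BY sample_id, crop_type
    """
).fetchdf()
print(df.head())
```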

Detailed design:
https://confluence.vito.be/display/EP/WorldCereal

jdries assigned jdries and VincentVerelst and unassigned jdries on Dec 4, 2023
kvantricht added the enhancement (New feature or request) label on Jan 23, 2024
@kvantricht

Depends on #18

@kvantricht

@VincentVerelst please define subtasks in this issue to split up the work.

kvantricht changed the title from "automated feature extraction" to "automated feature computation" on Feb 26, 2024
kvantricht commented Mar 4, 2024

  • start from the (overarching) STAC catalogue with raw extractions + rasterized ground truth
  • load all datacubes into OpenEO (extraction NetCDF cubes) + merge with the DEM collection
  • compute features: start from an example UDF (applying the cloud mask, monthly compositing, apply_dimension to "compute" features from timestamps and channels, sampling ground truth pixels based on this old code, writing the result to GeoParquet); see the sketch below
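A hedged sketch of these steps with the openEO Python client; the STAC URL, DEM collection id, UDF body and sampling geometries are placeholders that would need to be replaced by the real catalogue and feature logic:

```python
import json
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# 1. Raw extractions (NetCDF patches) via the overarching STAC catalogue (placeholder URL).
extractions = connection.load_stac(
    "https://example.org/stac/worldcereal-extractions",
    temporal_extent=["2021-01-01", "2021-12-31"],
)

# 2. Merge with a DEM collection (collection id is an assumption);
#    reduce its time dimension first so the cubes are compatible.
dem = connection.load_collection("COPERNICUS_30").max_time()
cube = extractions.merge_cubes(dem)

# 3. Compute features along the time dimension with a UDF (body is a stub;
#    cloud masking, monthly compositing and feature computation would go here).
feature_udf = openeo.UDF(
    """
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    return cube
""",
    runtime="Python",
)
features = cube.apply_dimension(process=feature_udf, dimension="t")

# 4. Sample the ground-truth pixels and write the result as (Geo)Parquet.
with open("ground_truth_points.geojson") as f:
    points = json.load(f)
samples = features.aggregate_spatial(geometries=points, reducer="mean")
samples.execute_batch(outputfile="features.parquet", out_format="Parquet")
```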

@VincentVerelst

load_stac currently results in a cube without metadata in the Python client. Working on at least getting the band names: Open-EO/openeo-python-client#527
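Until that is resolved, a hedged sketch of the client-side workaround of spelling out the band names explicitly (the URL and band names below are placeholders):

```python
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# Pass the band names explicitly, since they are not picked up from the
# STAC metadata yet (placeholder URL and band names).
cube = connection.load_stac(
    "https://example.org/stac/worldcereal-extractions",
    bands=["B02", "B03", "B04", "B08"],
)

# With explicit band names, band-based operations such as filter_bands
# have something to work with (assuming the fix from #527 is available).
red_nir = cube.filter_bands(["B04", "B08"])
```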

@kvantricht

We might want to proceed by first defining a patch using OpenEO collections. Once everything works, we can replace the collection loaders with load_stac.
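A hedged illustration of that swap; the collection id, STAC URL, extents and bands are placeholders:

```python
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# First iteration: build and debug the workflow on a regular openEO collection.
cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2021-01-01", "2021-12-31"],
    bands=["B04", "B08"],
)

# Later: only the loader changes, the rest of the workflow stays identical.
# cube = connection.load_stac(
#     "https://example.org/stac/worldcereal-extractions",  # placeholder
#     temporal_extent=["2021-01-01", "2021-12-31"],
#     bands=["B04", "B08"],
# )
```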

@JeroenVerstraelen

This can be split up into smaller issues.

@VincentVerelst

Moving this issue a few sprints further. Split up into sub-issues for basic (non-automated) feature computation.

kvantricht changed the title from "automated feature computation" to "automated extraction workflow for new RDM datasets" on Jul 10, 2024