automated extraction workflow for new RDM datasets #4

Open
jdries opened this issue Dec 4, 2023 · 8 comments

jdries commented Dec 4, 2023

For every reference dataset, we want to create a mirrored set of extracted features. This would typically be a GeoParquet file.

OpenEO can compute this Parquet file, and we can store it online (object storage, Artifactory, ...). The RDM could then be updated with a link to the extractions for a given reference file.
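For illustration, a minimal sketch of what such an extraction job could look like with the openEO Python client; the backend URL, collection id, bands, reference geometries and the Parquet output format are assumptions here, not the actual WorldCereal setup:

```python
import json
import openeo

# Backend URL, collection and bands are assumptions for illustration only.
connection = openeo.connect("openeo.vito.be").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2021-01-01", "2021-12-31"],
    bands=["B02", "B03", "B04", "B08"],
)

# 'reference.geojson' stands in for the geometries of one RDM reference dataset.
with open("reference.geojson") as f:
    geometries = json.load(f)

# Per-geometry time series: one row per sample, a mirror of the reference dataset.
samples = cube.aggregate_spatial(geometries=geometries, reducer="mean")

# Assumes the backend can write vector cubes as (Geo)Parquet.
job = samples.execute_batch(out_format="Parquet")
job.get_results().download_files("extractions/")
```

The resulting Parquet file(s) could then be uploaded to object storage or Artifactory and linked from the RDM record.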

For user-trained models, we propose adding a constraint that the reference dataset should cover a limited area, to keep the extraction workflow manageable. This could however be extended later with other, public extractions.

Use of DuckDB:
DuckDB could be interesting for us because it is an in-process (embedded) analytical database, so we don't need to set up a server. If we grow to the point where we do need one, we can still switch then.
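To illustrate the no-server aspect, a hedged sketch of querying such Parquet extractions directly with DuckDB from Python; the file paths and column names (sample_id, crop_type, B04, B08) are made up:

```python
import duckdb

# DuckDB runs in-process: nothing to deploy, just a library call.
con = duckdb.connect()  # in-memory database by default

# Query the GeoParquet extraction files directly from disk (or object storage).
df = con.execute(
    """
    SELECT sample_id, crop_type, AVG(B08 - B04) AS mean_diff
    FROM read_parquet('extractions/*.parquet')
    GROUP BY sample_id, crop_type
    """
).fetchdf()
print(df.head())
```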

Detailed design:
https://confluence.vito.be/display/EP/WorldCereal

jdries assigned jdries and VincentVerelst and unassigned jdries on Dec 4, 2023
kvantricht added the enhancement (New feature or request) label on Jan 23, 2024
@kvantricht

Depends on #18

@kvantricht

@VincentVerelst please define subtasks in this issue to split up the work.

kvantricht changed the title from "automated feature extraction" to "automated feature computation" on Feb 26, 2024
kvantricht commented Mar 4, 2024

  • start from the (overarching) STAC catalogue with raw extractions + rasterized ground truth
  • load all datacubes into OpenEO (extraction NetCDF cubes) + merge with the DEM collection
  • compute features: start from an example UDF (applying the cloud mask, monthly compositing, apply_dimension to "compute" features from timestamps and channels, sampling ground truth pixels based on this old code, writing the result to GeoParquet); see the sketch below
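A hedged sketch of these steps with the openEO Python client; the STAC URL, DEM collection id, UDF body and sampling geometries are placeholders that would need to be replaced by the real catalogue and feature logic:

```python
import json
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# 1. Raw extractions (NetCDF patches) via the overarching STAC catalogue (placeholder URL).
extractions = connection.load_stac(
    "https://example.org/stac/worldcereal-extractions",
    temporal_extent=["2021-01-01", "2021-12-31"],
)

# 2. Merge with a DEM collection (collection id is an assumption);
#    reduce its time dimension first so the cubes are compatible.
dem = connection.load_collection("COPERNICUS_30").max_time()
cube = extractions.merge_cubes(dem)

# 3. Compute features along the time dimension with a UDF (body is a stub;
#    cloud masking, monthly compositing and feature computation would go here).
feature_udf = openeo.UDF(
    """
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    return cube
""",
    runtime="Python",
)
features = cube.apply_dimension(process=feature_udf, dimension="t")

# 4. Sample the ground-truth pixels and write the result as (Geo)Parquet.
with open("ground_truth_points.geojson") as f:
    points = json.load(f)
samples = features.aggregate_spatial(geometries=points, reducer="mean")
samples.execute_batch(outputfile="features.parquet", out_format="Parquet")
```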

@VincentVerelst

load_stac currently results in a cube without metadata in the Python client. Working on at least getting the band names: Open-EO/openeo-python-client#527
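Until that is resolved, a hedged sketch of the client-side workaround of spelling out the band names explicitly (the URL and band names below are placeholders):

```python
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# Pass the band names explicitly, since they are not picked up from the
# STAC metadata yet (placeholder URL and band names).
cube = connection.load_stac(
    "https://example.org/stac/worldcereal-extractions",
    bands=["B02", "B03", "B04", "B08"],
)

# With explicit band names, band-based operations such as filter_bands
# have something to work with (assuming the fix from #527 is available).
red_nir = cube.filter_bands(["B04", "B08"])
```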

@kvantricht

We might want to proceed by first defining a patch using OpenEO collections. Once everything works, we can replace the collection loaders with load_stac.
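A hedged illustration of that swap; the collection id, STAC URL, extents and bands are placeholders:

```python
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# First iteration: build and debug the workflow on a regular openEO collection.
cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2021-01-01", "2021-12-31"],
    bands=["B04", "B08"],
)

# Later: only the loader changes, the rest of the workflow stays identical.
# cube = connection.load_stac(
#     "https://example.org/stac/worldcereal-extractions",  # placeholder
#     temporal_extent=["2021-01-01", "2021-12-31"],
#     bands=["B04", "B08"],
# )
```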

@JeroenVerstraelen

This can be split up into smaller issues.

@VincentVerelst

Moving this issue a few sprints further. Split up into sub-issues for basic (non-automated) feature computation.

kvantricht changed the title from "automated feature computation" to "automated extraction workflow for new RDM datasets" on Jul 10, 2024