automated extraction workflow for new RDM datasets #4
Depends on #18

@VincentVerelst please define subtasks in this issue to split up the work.

Feature computation example UDF (see the sketch below): https://git.vito.be/projects/APPL/repos/cropclass/browse/src/cropclass/classification.py?at=refs%2Fheads%2Fmain#311
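The linked repository is not publicly accessible, so as a rough illustration only: a minimal openEO feature-computation UDF could look like the sketch below. The temporal mean/std features are placeholders, not the actual cropclass logic.

```python
# Minimal sketch of an openEO feature-computation UDF.
# The feature logic below is a placeholder, not the cropclass implementation.
import xarray
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    """Compute simple temporal features (mean, std) per band and pixel."""
    array: xarray.DataArray = cube.get_array()  # dims: (t, bands, y, x)
    mean = array.mean(dim="t")
    std = array.std(dim="t")
    # Stack the per-band statistics into a single feature dimension.
    features = xarray.concat([mean, std], dim="bands")
    return XarrayDataCube(features)
```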
We might want to proceed by first defining a patch using openEO collections. Once everything works, we can replace the collection loaders by …

This can be split up into smaller issues.

Moving this issue a few sprints further. Split up into sub-issues for basic (non-automated) feature computation.
For every reference dataset, we want to create a mirrored set of extracted features. This would typically be a GeoParquet file.
openEO can compute this Parquet file, and we can store it online (object storage, Artifactory, ...). We can then update the RDM with a link to the extractions for a given reference file.
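A rough sketch of that flow, assuming the Copernicus Data Space backend, a hypothetical collection/band selection and reference geometry file, and backend support for Parquet output:

```python
# Sketch: compute per-sample features with openEO and export as (Geo)Parquet.
# Backend URL, collection id, band names, file names and Parquet support
# are assumptions for illustration.
import json
import openeo

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2023-01-01", "2023-12-31"],
    bands=["B04", "B08"],
)

# Aggregate the time series over the reference geometries (e.g. parcels from the RDM).
with open("reference_parcels.geojson") as f:
    geometries = json.load(f)
timeseries = cube.aggregate_spatial(geometries=geometries, reducer="mean")

# Run as a batch job; Parquet output depends on backend support.
job = timeseries.execute_batch(out_format="Parquet")
job.get_results().download_files("extractions/")
```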
For user-trained models, we propose adding a constraint that the reference dataset should cover a limited area, to keep extraction simple and fast. It would, however, be possible to extend this with other, public, extractions.
Use of DuckDB:
DuckDB could be interesting for us, because it is an in-process database, which means we don't need to set up a server. If we grow to the point where we do need one, we can still migrate then.
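For example, the extracted Parquet files could be queried directly, with no server involved. A minimal sketch; the file path and column names are assumptions:

```python
# Sketch: query extraction Parquet files with DuckDB, no database server needed.
# File path and column names are assumptions for illustration.
import duckdb

con = duckdb.connect()  # in-process, in-memory connection

# DuckDB can scan Parquet files directly, including over HTTP/S3
# via the httpfs extension.
con.execute("INSTALL httpfs; LOAD httpfs;")

df = con.execute("""
    SELECT sample_id, AVG(ndvi_mean) AS ndvi
    FROM 'extractions/*.parquet'
    GROUP BY sample_id
""").df()
print(df.head())
```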
Detailed design:
https://confluence.vito.be/display/EP/WorldCereal