load_result: Align with load_collection #220
We already have processes like
Abstractly speaking yes, but a back-end can work in a lazy-loading approach and limit the actual loading to the constraints provided by subsequent processes. The same question can be raised about |
Yes, but the argument for adding those parameters to load_collection was that otherwise you'd have to load all the data first just to discard a lot of it afterwards. Thus we added them, and I think this argument is still valid, although I'm aware that VITO optimizes filter_ operations more than other back-ends.
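The trade-off above can be made concrete with a small sketch. This is not openEO code; the catalogue, loaders, and tile counts are invented purely to illustrate why pushing filter constraints into the load step saves work compared to loading everything and filtering afterwards:

```python
# Hypothetical sketch (not the openEO API): an eager loader reads every
# tile and discards most of them, while a lazy loader that honours the
# temporal extent only reads the tiles it actually needs.
from datetime import date

# Fake catalogue: one "tile" per month of 2020.
TILES = [date(2020, m, 1) for m in range(1, 13)]

def load_then_filter(start, end):
    """Eager: read all tiles, then discard those outside the extent."""
    loaded = list(TILES)  # 12 reads
    kept = [t for t in loaded if start <= t < end]
    return len(loaded), kept

def load_with_extent(start, end):
    """Lazy: only read tiles that intersect the temporal extent."""
    loaded = [t for t in TILES if start <= t < end]  # 3 reads
    return len(loaded), loaded

eager_reads, a = load_then_filter(date(2020, 3, 1), date(2020, 6, 1))
lazy_reads, b = load_with_extent(date(2020, 3, 1), date(2020, 6, 1))
assert a == b  # same result either way...
assert eager_reads == 12 and lazy_reads == 3  # ...with far fewer reads
```

Whether a back-end implements the filters eagerly or pushes them down is an implementation detail; exposing them on the load process at least makes the pushdown possible.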
Indeed, added above. |
Related issue: Open-EO/openeo-api#376 |
With the recent discussions around #241 and read/get_vector in the Python driver, I'm wondering whether we should aim for something generic that doesn't care about the actual place the data is stored and can load cubes from different sources. I'm not sure we actually need specific functions for all of them, and it would at least partly avoid the issue in #241. Something like load_collection, but instead of specifying a collection ID, you specify either a URL, a file on the back-end user workspace, or a result (which is basically also just a URL, where you check whether it's accessible by the current user).
|
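To make the "generic load" idea above more tangible, here is a hedged sketch of how a single entry point could dispatch on what a given location string looks like. None of these function names or classification rules are part of the openEO spec; they are invented for illustration:

```python
# Hypothetical dispatcher for a generic load process: one entry point
# that guesses the kind of source from the shape of the location string.
import re

def classify_location(location):
    """Guess what kind of source a location string refers to."""
    if re.match(r"^(http|https|s3)://", location):
        return "url"
    if "/" in location or location.endswith((".tif", ".nc")):
        return "workspace-file"
    if re.match(r"^[\w\-.~]+$", location):
        return "job-id-or-collection"
    raise ValueError(f"Unrecognised location: {location}")

assert classify_location("s3://bucket/cube.nc") == "url"
assert classify_location("uploads/scene.tif") == "workspace-file"
assert classify_location("job-1234") == "job-id-or-collection"
```

The ambiguity visible in the last branch (a bare identifier could be a job ID or a collection ID) is exactly why the discussion below suggests keeping load_collection separate.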
That sounds like an interesting solution, but then the "collection_id" (or "location" in your snippet) probably has to become a more complex argument: a string that's either a traditional collection_id or some kind of URL, or optionally also an object that allows setting additional load options (e.g. filename globs, file-type whitelists or blacklists, ...). The next problem is then probably that you need a new "capabilities endpoint" where a back-end can declare which kinds of "locations" are supported. |
I'm not sure that is clear, but I'd leave load_collection untouched and not allow (internal) collection IDs in import. The location could be defined as follows in the schema:

```json
{
  "name": "location",
  "description": "...",
  "schema": [
    {
      "title": "Multiple files on server-side user workspace",
      "type": "array",
      "subtype": "file-paths",
      "items": {
        "type": "string",
        "subtype": "file-path",
        "pattern": "^[^\r\n\\:'\"]+$"
      }
    },
    {
      "title": "Single file on server-side user workspace (we may want to remove this for simplicity)",
      "type": "string",
      "subtype": "file-path",
      "pattern": "^[^\r\n\\:'\"]+$"
    },
    {
      "title": "Remote files (Absolute URL)",
      "type": "string",
      "subtype": "uri",
      "pattern": "^(http|https|s3)://"
    },
    {
      "title": "Batch Job ID",
      "type": "string",
      "subtype": "job-id",
      "description": "A batch job id, either one of the jobs a user has stored or a publicly available job.",
      "pattern": "^[\\w\\-\\.~]+$"
    }
  ]
}
```

Yes, we have the issue that we overload some data types a bit in a number of processes. If we want to have very specific things like globs, then this idea doesn't work very well and I'd say we need to stick with individual processes. |
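The multi-alternative schema above behaves like a JSON-Schema `anyOf`: a value is a valid location if it matches any of the sub-schemas. A minimal sketch of that check, using plain `re` instead of a JSON-Schema library (the `subtype` annotations are not enforced here):

```python
# Hedged sketch: check a candidate "location" value against the
# alternatives from the schema above (file path(s), URL, job id).
import re

FILE_PATH = r"""^[^\r\n\\:'"]+$"""
URI = r"^(http|https|s3)://"
JOB_ID = r"^[\w\-.~]+$"

def matches_location_schema(value):
    """True if value matches any of the schema's alternatives."""
    if isinstance(value, list):  # multiple workspace files
        return all(isinstance(v, str) and re.match(FILE_PATH, v) for v in value)
    if isinstance(value, str):  # single path, URL, or job id
        return any(re.match(p, value) for p in (FILE_PATH, URI, JOB_ID))
    return False

assert matches_location_schema(["a.tif", "b.tif"])
assert matches_location_schema("https://example.com/result")
assert not matches_location_schema(42)
```

Note that the alternatives overlap (a plain file name also matches the job-id pattern), which is the data-type overloading concern raised above.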
@aljacob @lforesta @sophieherrmann I heard this (load_result) being discussed today as being required by a use-case. I didn't have that on my list as being required for UC3 or 6 in openEO Platform. Could you please clarify? |
I've got a PR up, although it covers just load_result for now. Reasoning: see PR #292. |
* Improve load_result #220 and other minor alignments
load_result (and load_uploaded_files) can only load a full result, but cannot filter extents like load_collection does. Should load_result be aligned (i.e. add spatial/temporal extents and metadata filters)?
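For illustration, here is a hedged sketch of what an "aligned" load_result node in a process graph could look like, mirroring load_collection's filter parameters. The extra parameters (spatial_extent, temporal_extent, bands) reflect the proposal under discussion, not a released version of the spec, and the helper function is invented:

```python
# Hypothetical builder for a process-graph node of an aligned load_result,
# with the same optional filters as load_collection.
def load_result_node(job_id, spatial_extent=None, temporal_extent=None, bands=None):
    args = {"id": job_id}
    if spatial_extent is not None:
        args["spatial_extent"] = spatial_extent
    if temporal_extent is not None:
        args["temporal_extent"] = temporal_extent
    if bands is not None:
        args["bands"] = bands
    return {"process_id": "load_result", "arguments": args}

node = load_result_node(
    "job-1234",
    spatial_extent={"west": 16.1, "east": 16.6, "south": 47.2, "north": 48.6},
    temporal_extent=["2020-01-01", "2021-01-01"],
)
assert node["process_id"] == "load_result"
assert "bands" not in node["arguments"]  # omitted filters are simply absent
```

Keeping the filters optional preserves backwards compatibility: a call with only the job ID behaves exactly like today's load_result.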