load_result: Align with load_collection #220

m-mohr · 2021-01-08T09:22:50Z

load_result (and load_uploaded_files) can only load a full result; unlike load_collection, they cannot filter by extent while loading. Should load_result be aligned, i.e. should we add spatial and temporal extents and metadata filters?


soxofaan · 2021-01-11T11:46:25Z

We already have processes like filter_temporal, filter_bbox and filter_bands to cover that, no?

> load_result can only load a full result,

Abstractly speaking, yes, but a backend can take a lazy-loading approach and limit the actual loading to the constraints provided by subsequent filter_ processes. Lazy loading is of course less straightforward to implement than handling direct load_result arguments.
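A minimal sketch of the lazy-loading idea, not taken from any actual back-end: the `LazyResult` class and its `read_plan` method are hypothetical names. The load step only records the job ID, each filter_ step narrows the recorded constraints, and data would be read only once the final plan is known. It assumes timestamps are ISO 8601 strings in a single consistent format, so that plain string comparison orders them correctly.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


@dataclass(frozen=True)
class LazyResult:
    """Hypothetical deferred load_result: records constraints, reads nothing yet."""
    job_id: str
    bbox: Optional[BBox] = None
    interval: Optional[Tuple[str, str]] = None

    def filter_bbox(self, bbox: BBox) -> "LazyResult":
        if self.bbox is not None:
            # Intersect with a constraint recorded by an earlier filter.
            w, s, e, n = self.bbox
            bbox = (max(w, bbox[0]), max(s, bbox[1]), min(e, bbox[2]), min(n, bbox[3]))
        return replace(self, bbox=bbox)

    def filter_temporal(self, start: str, end: str) -> "LazyResult":
        if self.interval is not None:
            # ISO 8601 strings of equal format sort chronologically.
            start = max(start, self.interval[0])
            end = min(end, self.interval[1])
        return replace(self, interval=(start, end))

    def read_plan(self) -> dict:
        """What a back-end would actually have to read once evaluation starts."""
        return {"job_id": self.job_id, "bbox": self.bbox, "interval": self.interval}


cube = (
    LazyResult("job-123")  # hypothetical job ID
    .filter_bbox((5.0, 50.0, 6.0, 51.0))
    .filter_temporal("2021-01-01", "2021-02-01")
)
print(cube.read_plan())
```

The extra bookkeeping in this sketch is the implementation cost mentioned above: every filter has to know how to merge itself into the pending load.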

The same question can be raised about load_uploaded_files, FYI

m-mohr · 2021-01-11T12:02:07Z

> We already have processes like filter_temporal, filter_bbox and filter_bands to cover that, no?

Yes, but the argument for adding those parameters to load_collection was that otherwise you have to load all the data first just to discard much of it afterwards. That is why we added them, and I think the argument is still valid, although I'm aware that VITO optimizes filter_ operations more than other back-ends.

> The same question can be raised about load_uploaded_files, FYI

Indeed, added above.

m-mohr · 2021-04-09T14:12:41Z

Related issue: Open-EO/openeo-api#376

m-mohr · 2021-04-27T15:15:54Z

With the recent discussions around #241 and read/get_vector in the Python driver, I'm wondering whether we should aim for something generic that doesn't care about the actual place the data is stored and can load cubes from different sources. I'm not sure we actually need to have specific functions for all of them. It would avoid the issue in #241 a bit at least.

Something like load_collection, but instead of specifying a collection ID, you specify either a URL, a file in the user workspace on the back-end, or a result (which is basically also just a URL, where you check whether it is accessible to the current user).

import(location, ?spatial_extent, ?temporal_extent, ?bands, ?filter) -> raster/vector cube

soxofaan · 2021-04-28T08:08:39Z

that sounds like an interesting solution

but then the "collection_id" (or "location" in your snippet) probably has to become a more complex argument: a string that's a traditional collection_id or some kind of URL. Or optionally also an object that allows setting additional load options (e.g. filename globs, file type whitelists or blacklists, ...)

The next problem is then probably that you need a new "capabilities endpoint" where a backend can declare which kind of "locations" are supported

m-mohr · 2021-04-28T08:22:08Z

I'm not sure that is clear, but I'd leave load_collection untouched and not allow (internal) collection IDs in import.

The location parameter could be defined as follows in the schema:

{
	"name": "location",
	"description": "...",
	"schema": [
		{
			"title": "Multiple files on server-side user workspace",
			"type": "array",
			"subtype": "file-paths",
			"items": {
				"type": "string",
				"subtype": "file-path",
				"pattern": "^[^\r\n\\:'\"]+$"
			}
		},
		{
			"title": "Single file on server-side user workspace (we may want to remove this for simplicity)",
			"type": "string",
			"subtype": "file-path",
			"pattern": "^[^\r\n\\:'\"]+$"
		},
		{
			"title": "Remote files (Absolute URL)",
			"type": "string",
			"subtype": "uri",
			"pattern": "^(http|https|s3)://"
		},
		{
			"title": "Batch Job ID",
			"type": "string",
			"subtype": "job-id",
			"description": "A batch job id, either one of the jobs a user has stored or a publicly available job.",
			"pattern": "^[\\w\\-\\.~]+$"
		}
	]
}
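As a rough illustration of how a back-end might tell these variants apart, here is a sketch that applies the three patterns from the schema above to a given value. The function name `classify_location` and the example values are hypothetical; the regular expressions are copied verbatim from the schema. It returns every variant that matches, because a plain string such as a job ID can also satisfy the file-path pattern.

```python
import re
from typing import List, Union

# Patterns copied verbatim from the proposed schema (JSON escapes kept as-is).
FILE_PATH = re.compile("^[^\r\n\\:'\"]+$")
REMOTE_URL = re.compile("^(http|https|s3)://")
JOB_ID = re.compile("^[\\w\\-\\.~]+$")


def classify_location(location: Union[str, List[str]]) -> List[str]:
    """Return the subtypes of the proposed schema that `location` satisfies."""
    if isinstance(location, list):
        # Array variant: every item must look like a workspace file path.
        ok = all(isinstance(p, str) and FILE_PATH.search(p) for p in location)
        return ["file-paths"] if ok else []
    kinds = []
    if REMOTE_URL.search(location):
        kinds.append("uri")
    if FILE_PATH.search(location):
        kinds.append("file-path")
    if JOB_ID.search(location):
        kinds.append("job-id")
    return kinds


print(classify_location("https://example.com/result.tif"))
print(classify_location("folder/data.tif"))
print(classify_location("job-123"))
print(classify_location(["a.tif", "b.tif"]))
```

In this sketch a value without slashes or colons matches both the file-path and the job-id pattern, so the patterns alone cannot decide which one was meant.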

Yes, we have the issue that we overload some data types a bit in a number of processes.

If we want to have very specific things like globs, then this idea doesn't work very well and I'd say we need to stick with individual processes.

m-mohr · 2021-06-04T11:48:05Z

@aljacob @lforesta @sophieherrmann I heard this (load_result) being discussed today as being required by a use-case. I didn't have that on my list as being required for UC3 or 6 in openEO Platform. Could you please clarify?

m-mohr · 2021-10-26T09:40:12Z

I've got a PR up, although this is just load_result now. Reasoning:

For load_uploaded_files, I now think we don't need additional parameters: I'd expect that someone only uploads what is needed for a use case and therefore doesn't need additional filtering in the process. If that rare use case is still required, use the filter functions.
As such, I've simply added spatial_extent, temporal_extent and bands to the load_result function, and it now allows loading by URL. These additions depend heavily on the underlying data structure: e.g. I'd think you could easily filter remotely by spatial_extent on a COG, but for bands and temporal_extent you need proper metadata, and for other file formats this may not work at all. Also, loading by URL is not available on all back-ends (it was only introduced in API v1.1.0). I've excluded property filtering for now, as this would require an API to be present; see "STAC API for batch job results?" (openeo-api#398) or "User-generated Collections" (openeo-api#376).
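To make the extended load_result concrete, here is a sketch of a process graph that uses the new parameters, built as a plain Python dictionary. The helper `load_result_node`, the URL, the band names and the output format are illustrative assumptions, as is the parameter name `id` for the job ID or URL; only spatial_extent, temporal_extent and bands are named in this thread.

```python
import json
from typing import List, Optional


def load_result_node(
    source: str,
    spatial_extent: Optional[dict] = None,
    temporal_extent: Optional[List[str]] = None,
    bands: Optional[List[str]] = None,
) -> dict:
    """Build a load_result node; optional filters are omitted when not given."""
    arguments = {"id": source}  # assumed parameter name for the job ID or URL
    if spatial_extent is not None:
        arguments["spatial_extent"] = spatial_extent
    if temporal_extent is not None:
        arguments["temporal_extent"] = temporal_extent
    if bands is not None:
        arguments["bands"] = bands
    return {"process_id": "load_result", "arguments": arguments}


process_graph = {
    "load": load_result_node(
        "https://example.com/results/job-123",  # hypothetical result URL
        spatial_extent={"west": 5.0, "south": 50.0, "east": 6.0, "north": 51.0},
        temporal_extent=["2021-01-01", "2021-02-01"],
        bands=["B04", "B08"],  # hypothetical band names
    ),
    "save": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "load"}, "format": "GTiff"},
        "result": True,
    },
}
print(json.dumps(process_graph, indent=2))
```

Whether a back-end can honour each filter without reading the full result still depends on the file format and metadata, as noted above.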

See PR #292.


m-mohr added the question Further information is requested label Jan 8, 2021

m-mohr changed the title ~~load_result: Align with load_collection~~ load_result/load_uploaded_files: Align with load_collection Jan 11, 2021

m-mohr added this to the 1.1.0 milestone Apr 12, 2021

m-mohr added the vector label Apr 21, 2021

m-mohr mentioned this issue Apr 28, 2021

Make it easier to distinguish strings (paths/urls/code/...) #245

Merged

m-mohr modified the milestones: 1.1.0, 1.2.0 May 18, 2021

m-mohr self-assigned this Jun 4, 2021

m-mohr added the platform label Jun 4, 2021

m-mohr modified the milestones: 1.2.0, 1.3.0 Oct 25, 2021

m-mohr added a commit that referenced this issue Oct 26, 2021

Improve load_result #220 and other minor alignments

f8ee247

m-mohr mentioned this issue Oct 26, 2021

load_result: Load by URL and filter by extents and bands #292

Merged

m-mohr linked a pull request Oct 26, 2021 that will close this issue

load_result: Load by URL and filter by extents and bands #292

Merged

m-mohr modified the milestones: 1.3.0, 1.2.0 Oct 26, 2021

sophieherrmann mentioned this issue Oct 27, 2021

Implement load_result from signed url Open-EO/openeo-processes-python#63

Open

m-mohr changed the title ~~load_result/load_uploaded_files: Align with load_collection~~ load_result: Align with load_collection Nov 16, 2021

m-mohr added a commit that referenced this issue Dec 1, 2021

load_result: Load by URL and filter by extents and bands (#292)

5abad4b

* Improve load_result #220 and other minor alignments

m-mohr closed this as completed Dec 1, 2021

m-mohr mentioned this issue Dec 1, 2021

Release openEO processes v1.2.0 Open-EO/PSC#13

Closed
