add load_stac process #402
This is a v2.x feature: Open-EO/openeo-processes#439 |
Differences with
|
From the process description:
@jdries does this imply that the logged-in user should be able to load their own batch job results by providing a /jobs/{id}/results URL that is not signed? |
Good question, because I thought the canonical link was the only way to retrieve the correct url, but I think that this is indeed implied here. Note that for this ticket, implementing the remote url option is actually the main goal. |
Indeed, I pushed for that because the signed URLs have an expiry, could be invalidated/rotated, ..., while the |
@soxofaan ok. I was thinking about the implications of this. In In the case of Thoughts? |
Indeed, that was also my thought: if we detect a static (non-signed) job result URL, we just have to check that the current user owns the referenced job (which is equivalent to going the full HTTP route with auth headers). |
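A minimal sketch of that ownership check, not the driver's actual code. It assumes unsigned job result URLs end in /jobs/{id}/results; `extract_job_id`, `user_may_load` and the `get_job_owner` lookup are hypothetical names used only for illustration.

```python
import re
from typing import Callable, Optional
from urllib.parse import urlparse

# Assumption: an unsigned job results URL has a path ending in /jobs/{job_id}/results.
# A signed URL carries extra path segments after "results", so it does not match.
_JOB_RESULTS_PATH = re.compile(r"/jobs/(?P<job_id>[^/]+)/results/?$")


def extract_job_id(url: str) -> Optional[str]:
    """Return the job id if `url` looks like an unsigned job results URL, else None."""
    match = _JOB_RESULTS_PATH.search(urlparse(url).path)
    return match.group("job_id") if match else None


def user_may_load(
    url: str,
    user_id: str,
    get_job_owner: Callable[[str], Optional[str]],  # hypothetical job registry lookup
) -> bool:
    """Allow non-job URLs; for job result URLs require that `user_id` owns the job."""
    job_id = extract_job_id(url)
    if job_id is None:
        return True  # not a job results URL: the ownership check does not apply
    return get_job_owner(job_id) == user_id
```

A real implementation would presumably also verify that the URL's host belongs to this backend before trusting the job id.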
It looks like that, at some point in the code, we have to determine whether the provided URL represents a static STAC catalog (incl. batch job results) or a dynamic STAC API Collection (which supports search requests). A possible way to accomplish this seems to consist of:
@m-mohr does this approach seem right to you? |
Sounds reasonable. |
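One possible heuristic for that distinction, sketched under assumptions (it is not necessarily the approach agreed on above): treat the URL as a STAC API Collection when the document is a Collection and its root document advertises the STAC API item-search conformance class in `conformsTo`. The function names are hypothetical, and the caller is assumed to have already fetched both JSON documents (the root via the `rel="root"` link).

```python
from typing import Any, Mapping


def supports_item_search(root_document: Mapping[str, Any]) -> bool:
    """True if the root document declares a STAC API item-search conformance class."""
    conforms_to = root_document.get("conformsTo") or []
    # Match on the suffix so that different STAC API versions are accepted.
    return any(str(uri).rstrip("/").endswith("/item-search") for uri in conforms_to)


def classify_stac_source(
    document: Mapping[str, Any], root_document: Mapping[str, Any]
) -> str:
    """Return "api-collection" for a searchable STAC API Collection, else "static"."""
    if document.get("type") == "Collection" and supports_item_search(root_document):
        return "api-collection"
    # Static catalogs, plain collections and batch job results end up here.
    return "static"
```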
Download this cube to test it:
data_cube = (connection
.load_stac(url="https://tamn.snapplanet.io/collections/S2",
spatial_extent={"west": -87.83465281740789, "south": 42.57836607418331,
"east": -87.80890361086492, "north": 42.59100512331456},
temporal_extent=["2022-05-10", "2022-05-10"],
bands=["B04", "B03", "B02"])
.save_result("GTiff"))
Otherwise you get a 404 on e.g. https://landsatlook.usgs.gov/stac-server//search?collections=landsat-c2l2-sr&limit=100&bbox=-87.83488652056919%2C42.57816424833807%2C-87.80864005698832%2C42.59122817379843&page=1&datetime=2022-05-20T00%3A00%3A00Z%2F2022-05-20T23%3A59%3A59.999999999Z Open-EO/openeo-geopyspark-driver#402
ERROR openeo_driver.views.error:views.py:268 Py4JJavaError('An error occurred while calling None.org.openeo.geotrellis.file.PyramidFactory.\n', JavaObject id=o115)
Traceback (most recent call last):
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/views.py", line 624, in result
    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 271, in evaluate
    return evaluate(process_graph=process_graph, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 342, in evaluate
    result = convert_node(result_node, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 362, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1578, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1578, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 374, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 362, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1673, in apply_process
    return process_function(args=args, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 2191, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-geopyspark-driver/openeogeotrellis/backend.py", line 836, in load_stac
    pyramid_factory = jvm.org.openeo.geotrellis.file.PyramidFactory(stac_api_client,
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/py4j/java_gateway.py", line 1585, in __call__
    return_value = get_return_value(
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.openeo.geotrellis.file.PyramidFactory.
: java.lang.NullPointerException
    at org.openeo.geotrellis.file.PyramidFactory.<init>(PyramidFactory.scala:42)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
… URL #402
In a batch job started from an async_task (think: SHub batch process), there's no access_token in the user's internal_auth_data; this was erroneously passed as the string "None" to the batch job. Instead of an expected KeyError upon reconstructing the Bearer token, the Bearer token was actually "basic//None", failing as well but with an unexpected 403 Forbidden instead.
  File "batch_job.py", line 1298, in <module>
    main(sys.argv)
  File "batch_job.py", line 1034, in main
    run_driver()
  File "batch_job.py", line 1004, in run_driver
    run_job(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 52, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 1099, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 348, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 380, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 380, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1512, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2088, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 937, in load_stac
    url = signed_results_url() # FIXME: remove HTTP workaround, load job results directly (~ load_result)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 858, in signed_results_url
    resp.raise_for_status()
  File "/opt/venv/lib64/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://openeo-dev.vito.be/openeo/1.1/jobs/j-2802b19806a84c3eb8772c97e4abb2b7/results
Removes workaround where a Bearer token was reconstructed to be able to obtain a canonical URL and load STAC from there. Fixes the combination of load_stac and SHub batch processes.
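For context, the "basic//None" failure mode is easy to reproduce in isolation. The sketch below is purely illustrative and not the removed workaround itself: both function names are hypothetical, and only the `access_token` key and the `basic//` prefix come from the commit message.

```python
from typing import Mapping, Optional


def bearer_token_lenient(internal_auth_data: Mapping[str, Optional[str]]) -> str:
    """Illustrates the bug: a missing token is silently stringified to "None"."""
    return f"basic//{internal_auth_data.get('access_token')}"


def bearer_token_strict(internal_auth_data: Mapping[str, Optional[str]]) -> str:
    """Fails early with a KeyError instead of producing a bogus token."""
    token = internal_auth_data.get("access_token")
    if not token or token == "None":  # also catch an already stringified None
        raise KeyError("access_token")
    return f"basic//{token}"
```

The lenient variant only fails later, at the HTTP layer, which is why the symptom was a 403 Forbidden rather than the expected KeyError.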
Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/views.py", line 619, in result
    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 277, in evaluate
    return evaluate(process_graph=process_graph, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 348, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1480, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 380, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1512, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2088, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1031, in load_stac
    itm.properties.get("datetime") or itm.properties["start_datetime"],
KeyError: 'start_datetime'
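The KeyError above occurs when an Item has neither a usable `datetime` nor a `start_datetime` property. A more defensive lookup could look like the sketch below; the helper names are hypothetical and this is not necessarily how the backend resolved it.

```python
from typing import Any, Mapping, Optional


def item_start_datetime(properties: Mapping[str, Any]) -> Optional[str]:
    """Prefer `datetime`, fall back to `start_datetime`, return None if both are absent or null."""
    return properties.get("datetime") or properties.get("start_datetime")


def require_item_start_datetime(item_id: str, properties: Mapping[str, Any]) -> str:
    """Same lookup, but raise an error that names the offending Item."""
    value = item_start_datetime(properties)
    if value is None:
        raise ValueError(
            f"STAC Item {item_id!r} has neither 'datetime' nor 'start_datetime'"
        )
    return value
```

Raising a ValueError that mentions the Item id makes the failure easier to trace back to the catalog than a bare KeyError.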
With the temporary catalog workarounds applied, this batch job runs successfully and produces 3 GeoTIFFs with 3 bands each:
import openeo

connection = openeo.connect("openeo-3-1.openeo-vlcc-prod").authenticate_oidc()
data_cube = (connection
.load_stac(url="https://geoville/resto/collections/BVLPROBA_v1",
spatial_extent={"west": 20.579494542018466, "south": 54.31120577537291,
"east": 20.631426035739594, "north": 54.33263361375995},
temporal_extent=["2019-01-01", "2021-12-31"],
bands=["band1", "band2", "band3"])
.save_result("GTiff"))
job = data_cube.execute_batch()
job.download_results("/tmp")
# openEO_2019-01-01Z.tif
# openEO_2020-01-01Z.tif
# openEO_2021-01-01Z.tif |
Open-EO/openeo-geopyspark-driver#402
Traceback (most recent call last):
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/views.py", line 619, in result
    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 277, in evaluate
    return evaluate(process_graph=process_graph, env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 342, in evaluate
    convert_node(result_node, env=env.push({ENV_DRY_RUN_TRACER: dry_run_tracer, ENV_SAVE_RESULT:[], "node_caching":False}))
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1480, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1480, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 380, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 368, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 1512, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 752, in reduce_dimension
    dimension = args.get_required(
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/processes.py", line 309, in get_required
    self._check_value(name=name, value=value, expected_type=expected_type, validator=validator)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/processes.py", line 336, in _check_value
    raise ProcessParameterInvalidException(parameter=name, process=self.process_id, reason=reason)
openeo_driver.errors.ProcessParameterInvalidException: The value passed for parameter 'dimension' in process 'reduce_dimension' is invalid: Must be one of [] but got 't'.
From https://github.com/stac-api-extensions/query: "It is recommended to implement the Filter Extension instead of the Query Extension. Filter Extension is more well-defined, more expressive, and uses the standardized CQL2 query language instead of the proprietary language defined here. There is no plan to deprecate this extension, but it is also unlikely to see any further refinement or changes."
As discussed: seeing as client-side filtering on Item properties tends to work better than the STAC API Filter extension (400 "Unknown property in filter" most of the time) it was decided to postpone this TODO. |
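As a rough illustration of what client-side filtering on Item properties can look like (a sketch only; `filter_items` and the predicate mapping are hypothetical, and Items are assumed to be plain dicts as returned by a STAC search):

```python
from typing import Any, Callable, Iterable, Iterator, Mapping

PropertyPredicate = Callable[[Any], bool]


def filter_items(
    items: Iterable[Mapping[str, Any]],
    predicates: Mapping[str, PropertyPredicate],
) -> Iterator[Mapping[str, Any]]:
    """Yield only Items whose properties satisfy every predicate.

    An Item that lacks one of the filtered properties is dropped.
    """
    for item in items:
        props = item.get("properties") or {}
        if all(
            name in props and predicate(props[name])
            for name, predicate in predicates.items()
        ):
            yield item
```

For example, `filter_items(items, {"eo:cloud_cover": lambda v: v < 20})` keeps only Items with a cloud cover below 20. The trade-off is that all candidate Items still have to be fetched from the API before they are discarded.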
Are you aware of the queryables endpoint, which you can use to retrieve the available properties for the filters? |
This particular STAC API returns "additionalProperties": true but still rejects most properties. |
Is it a public API? If yes, which? |
The API is not public. |
done! |
https://processes.openeo.org/draft/#load_stac
This probably has a lot in common with load_result.
There is one important change: we want to use FileLayerProvider, with a STAC client.
Specific case:
A STAC API Collection that allows filtering items and downloading assets.