Central job submission endpoint (also: POST /processes/{processID}/execution is not RESTful and is confusing for workflows) #419
I will repeat my answer from the Testbed meeting, just for the sake of sharing it openly with everyone.
Given that no job would be created in this case (which is technically the default/minimal requirement of OAP), the inverse non-RESTful argument can be made as well. I agree that sync/async and job creation imply opposite ways of thinking about it, and neither camp will be fully satisfied with either approach. My preference is to reduce the number of endpoints doing the same thing, even if the result feels odd for one approach or the other. That being said, I would most probably support either way regardless, for historical/compatibility reasons.
If a particular "root" process really does not make sense for some workflow definition (although there are workarounds for that, like a simple process gathering outputs from multiple processes, whether as separate outputs, as an array of outputs, or as an actual combining operation like merging multiple bands into a single GeoTIFF), then we could probably agree on some other endpoint at which to execute a workflow in an ad-hoc manner. For pre-deployment of a workflow, Part 2 should still be used (POST a new workflow definition to the deployment endpoint).
During the SWG meeting on 2024-07-08, I introduced the idea of defining a conformance class in OGC API - Processes - Part 3: Workflows to add POST on /jobs. With Part 1, it stays the same: POST /processes/{processID}/execution. The response contains a JSON object conforming to statusInfo.yaml and a header with the location of the created job (the Location header). With Part 3, you would also be able to use the following: POST /jobs. Here, there are options for what happens.
Using the second option, we may then imagine using POST on the created job resource to start its execution. I think adding this modification to the Part 3 draft specification would help align OGC API - Processes with openEO. If there is interest in adding this to the Part 3 draft, I volunteer to start the writing and work on a PR for this addition for further discussion.
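To make the two flows concrete, here is a minimal sketch in Python with the requests library; the server URL is hypothetical, and the central /jobs behaviour follows the proposal above rather than any published standard.

```python
import requests

ROOT = "https://example.org/ogcapi"  # hypothetical server

# Part 1 (unchanged): POST the execute request to the process-bound
# execution endpoint; the response body conforms to statusInfo.yaml
# and the Location header points at the created job.
r = requests.post(
    f"{ROOT}/processes/echo/execution",
    json={"inputs": {"message": "hello"}},
    headers={"Prefer": "respond-async"},
)
job_url = r.headers["Location"]

# Proposed Part 3 addition: POST the same kind of request to a central
# /jobs endpoint; the target process is named in the body (Part 3
# "process" key) instead of in the path. Depending on the option chosen,
# the job either starts right away or waits for a separate start request.
r = requests.post(
    f"{ROOT}/jobs",
    json={"process": f"{ROOT}/processes/echo",
          "inputs": {"message": "hello"}},
)
status_info = r.json()  # statusInfo for the created job
```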
Ideally, this should be handled in Part 1 as well.
Even if the execution starts right away, there is an advantage: it allows the definition of a graph (rather than a chain) that does not have a single root process, which POST /processes/{processID}/execution requires. The workaround is to deploy a process corresponding to that graph and invoke it as the root process, but this requires support for Part 2, as sketched below.
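To illustrate the chain-versus-graph point, here is a sketch of the two payload shapes; the nested form follows Part 3's "process" key, while the multi-root form is purely hypothetical, just to show what a process-bound endpoint cannot express.

```python
# Chain: POST /processes/{processID}/execution implicitly roots the
# workflow at {processID}; every other step must be nested under an
# input of that single root (Part 3 nested process).
chain = {
    "inputs": {
        "data": {
            "process": "https://example.org/ogcapi/processes/resample",
            "inputs": {"source": {"href": "https://example.org/data.tif"}},
        }
    }
}

# Graph: two terminal nodes with no single root. A central job endpoint
# could accept something like this (hypothetical shape):
graph = {
    "processes": {
        "ndvi": {
            "process": "https://example.org/ogcapi/processes/ndvi",
            "inputs": {"source": {"href": "https://example.org/data.tif"}},
        },
        "ndwi": {
            "process": "https://example.org/ogcapi/processes/ndwi",
            "inputs": {"source": {"href": "https://example.org/data.tif"}},
        },
    },
    "outputs": ["ndvi", "ndwi"],  # two roots; no "main" process to pick
}
```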
If this is the strategy that is taken, I would also like to have a way (query or body parameter?) to indicate whether the job should wait to be started or begin executing right away.
I agree with that point, but the schema that defines the process key is currently only available in the Part 3 draft specification. If we move it to the core, I don't see any objection.
I support the new endpoint. I thought returning a statusInfo made sense, as it corresponds to how we get information regarding a job. To summarize, this would result in the addition of a small set of new endpoints.
If there is interest in moving in that direction, I would be happy to volunteer to start a PR with the required updates in Part 1 or Part 3, depending on what we decide about the execute-workflow.yaml schema (moving it to core).
@gfenoy @fmigneau STOP! We were almost ready to submit Part 1 and Part 2 to the OAB for review and then RFC. Now we are proposing to change, or add, a completely NEW way to execute processes. Sorry, but this is not something we can just throw into Part 1 and Part 3 without (a) a lot of discussion in the SWG and (b) at least some testing in a code sprint or two.
My preference would be to leave Part 1 as-is except for bug fixes / clarifications of ambiguity, maintaining what has so far been a mostly backward-compatible path from 1.0. We can have the discussion about new endpoints and/or new methods at existing endpoints for Part 3: Workflows, since it is still at a relatively early stage compared to the Part 1 revision.
I second @pvretano's and @jerstlouis's sentiment. We should continue with Part 1 as-is and address these suggestions in Part 3 or as a future item. I'm happy to have a conversation about it at the next SWG meeting, but I will need a lot of convincing at this point.
@pvretano I would even be fine with defining an entirely separate "Part 4" for job management. That being said, I think it is worthwhile to have these discussions early on, since TB20 is working on it. That does not mean it has to be integrated right away. However, not discussing these issues now will lead to bifurcating implementations, and more problems down the road when we try to "realign" them.
@pvretano, @sptillma, @jerstlouis, @fmigneault, I agree with all of you. I'm as eager as you are to see the new releases of Part 1 and Part 2. I want to clarify that, initially, I only mentioned the integration within the Part 3 draft. I am also open to @fmigneault's proposal to start a new Part 4 of OGC API - Processes.
Proposal shared by @gfenoy
PR: #437. Review note: the following points have been updated according to the discussions below, to keep items grouped in a single comment.
Suggestions (meeting 2024-09-09)
Extra
Quotation should not be part of job management. It should be its own part.
I agree. A server could handle quotation/billing on its own, with only OGC API - Processes considerations and no intention to cross-walk with openEO and alternate job management/submissions.
Thanks a lot for your feedback.
Initially, I was thinking of using
I fully agree with that point, and we may investigate the RO-Crate specification; it seems to be linked to the Research Object manifest produced by cwlprov. What do you think?
I support this idea of a logs endpoint. I was initially thinking of something like this:
It would be an array of log entries. Do you have something else in mind? For example, we could instead have the main log file accessible at a dedicated location. We may also rely on the logs definition from openEO directly: https://api.openeo.org/#tag/Batch-Jobs/operation/debug-job.
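For concreteness, here are two of the candidate shapes mentioned above, written as Python literals; both are hypothetical, with (a) loosely following openEO's debug-job response and (b) exposing a main log file as a link.

```python
# (a) An array of structured log entries, roughly in the spirit of
# openEO's GET /jobs/{job_id}/logs:
logs_as_array = [
    {"level": "info", "message": "job started"},
    {"level": "error", "message": "input 'source' could not be resolved"},
]

# (b) A main log file exposed as a link on the job resource
# (the relation type and path are illustrative only):
log_link = {
    "rel": "log",
    "type": "text/plain",
    "href": "https://example.org/ogcapi/jobs/1234/log.txt",
}
```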
I agree with both of you and removed this part from the draft. Nevertheless, I would like to point out that this was related to alignment with openEO.
This was an issue with the copy/paste from DRU, from which I started this draft. It is now fixed.
Please take a look at the previous comment.
Right, the draft should now use PATCH to update the job definition instead of PUT.
In the current draft, the
The following schema was added: https://github.com/GeoLabs/ogcapi-processes/blob/proposal-part4-initial/openapi/schemas/processes-job-management/statusCode.yaml. I would have preferred extending the original statusCode.yaml rather than redefining it from scratch.
I would say that it would be possible to start the job except when its current status disallows it. The same applies to the update operation.
Doing so would lead to having two operations with the same
I am not sure I follow here, but I can read the following in the current draft: "The Deployable Workflows conformance class specifies how a workflow execution request as defined in OGC API — Processes — Part 1: Core, with or without the capabilities defined in other conformance classes of this Part 3 extension, can be used as an application package payload to deploy new processes using OGC API — Processes — Part 2: Deploy, Replace, Undeploy." Here, the idea was to POST an execute request body following the execute-workflow.yaml schema. In other words, a process chain would be sent to this endpoint to deploy a job. As currently defined, only the nested-processes requirement class should be a prerequisite. I agree that CWL can fit the definition of a job. To be continued...
I think what would make it a recommendation is to provide specific
Since CWL-Prov uses a Research Object under the hood (see: https://github.com/common-workflow-language/cwlprov#overview), it seems to be a logical choice. However, I don't quite understand yet what "more" the Research Object provides. Looking, for example, at https://www.researchobject.org/ro-crate/tlcmap, it seems all the metadata would be referenced one way or another by the PROV-JSON/XML/RDF definition. The RO-Crate seems relevant only if we want to export everything, with metadata, in a single ZIP. However, do we really want to (or can we even feasibly) export all source data, considering it might be massive? I'm not sure of all the implications regarding the RO, but the PROV definition seems relatively simple to implement and can be represented by any equivalent JSON/XML/RDF structure without depending explicitly on CWL.
Currently, CRIM's implementation supports
I think it is better to state the single status from which the job can be started, rather than listing the statuses that disallow it.
I'm slightly torn about this. On one hand,
You're right. A Workflow Execute Request is enough with Nested Processing. Thanks for the other draft fixes.
I agree about supporting PROV. For reference, here is the RELIANCE RO-Crate profile that may be of interest. Do we agree that this means we would then access the provenance from the job resource? We can use the Accept header to select the expected encoding.
I agree, so I propose adding a new endpoint for accessing the logs. I don't think any schema should be defined in the draft for such a log endpoint (each server implementation can then provide its own schema in the API definition). Do you agree?
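A minimal sketch of how a client might use such an endpoint, assuming a hypothetical /jobs/{jobId}/logs path and an openEO-like array response; as proposed above, the actual schema would be left to each server's API definition.

```python
import requests

ROOT = "https://example.org/ogcapi"  # hypothetical server
job_id = "1234"

# Path and response shape are assumptions, not draft requirements.
r = requests.get(f"{ROOT}/jobs/{job_id}/logs",
                 headers={"Accept": "application/json"})
for entry in r.json():  # assuming an array of {"level", "message"} entries
    print(entry.get("level"), entry.get("message"))
```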
As described here: [...] if the job status permits it, I think this can be handled the same way in Part 4.
Maybe we would need feedback from @m-mohr here. As said previously, I would like to add a way (query or body parameter) to indicate whether the job should start right away or wait.
Yes (2x)
OK.
I personally prefer to create a separate job, to avoid conflicting definitions and to keep the logs of the failing case. This can be used for monitoring, error reporting, and statistics. However, I am not against implementations using a job-replacement approach. Part 4 should most probably allow this flexibility, but describe what is expected in each case. For example, if a server decides to disallow it, which code/error should be returned? Otherwise, if it is allowed, what is the expected response (StatusInfo and the corresponding headers)?
I find this goes against the "Job Management" idea of Part 4. It would add an endpoint with no job reference and no retrievable logs/provenance/etc., and it doesn't give the option to easily switch between sync/async just by modifying the Prefer header.
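For illustration, switching modes at a central /jobs endpoint could look like this sketch (RFC 7240 Prefer header, as already used by Part 1 for async execution; the URL and payload are hypothetical):

```python
import requests

ROOT = "https://example.org/ogcapi"  # hypothetical server
body = {"process": f"{ROOT}/processes/echo", "inputs": {"message": "hi"}}

# Same body, two execution modes, selected only via the Prefer header:
sync_resp = requests.post(f"{ROOT}/jobs", json=body,
                          headers={"Prefer": "wait=60"})
async_resp = requests.post(f"{ROOT}/jobs", json=body,
                           headers={"Prefer": "respond-async"})
if async_resp.status_code == 201:
    job_url = async_resp.headers["Location"]  # created job resource
```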
We added the following requirement: /req/provenance/run/response. The current requirement states that PROV-JSON is expected by default if no content negotiation is used. Also, the following permission was added: /per/provenance/run/content-negotiation, to allow optional content types.
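As a sketch of the content negotiation just described (the /jobs/{jobId}/run path is inferred from the requirement identifier and should be treated as an assumption):

```python
import requests

ROOT = "https://example.org/ogcapi"  # hypothetical server
job_id = "1234"

# No Accept header: PROV-JSON is returned by default, per the
# /req/provenance/run/response requirement described above.
default = requests.get(f"{ROOT}/jobs/{job_id}/run")

# With the /per/provenance/run/content-negotiation permission, an
# alternate encoding (e.g., an XML/RDF PROV representation) could be
# requested via the Accept header, if the server supports it:
as_xml = requests.get(f"{ROOT}/jobs/{job_id}/run",
                      headers={"Accept": "application/xml"})
```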
We still need to add an endpoint to access logs. I noted this issue, in which you concluded that there may not be a requirement for such a log(s) endpoint. Adding this endpoint would make sense for the job-management extension.
I am okay with that. Maybe the standard can add a requirement class /req/job-management/replacement, meaning that the server supports the job-replacement approach.
I agree on that point. But can we imagine adding a new requirement class dedicated to it?
@gfenoy The URIs of the specification elements that you have listed above break OGC-NA policy. Please see chapters 6 and 8 of https://docs.ogc.org/pol/10-103r1.html. For requirements classes, there should only be one segment after the /req/ prefix. Similarly, for conformance classes, there should only be one segment after the /conf/ prefix. For requirements, there should only be two segments after the /req/ prefix. Similarly, for abstract tests, there should only be two segments after the /conf/ prefix. Please fix the segments.
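To illustrate the segment counts (the names below are made up; only the structure matters):

```
/req/job-management            requirements class (one segment after /req/)
/req/job-management/create     requirement        (two segments after /req/)
/conf/job-management           conformance class  (one segment after /conf/)
/conf/job-management/create    abstract test      (two segments after /conf/)
```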
Good point.
Thanks a lot, @ghobona, for pointing out this mistake. It should now be solved for the WIP Part 4 in 06656a8, part of #437. For Part 2, it should be solved here: #444.
In openEO we use the relation type |
This seems like another incompatibility with OGC API - Processes that uses |
…ger (relates to opengeospatial/ogcapi-processes#437, relates to opengeospatial/ogcapi-processes#419, relates to #716)
This should have been fixed in Part 4, and it is also applied here to Part 2, which contained the same issues.
The standard says:
In REST, everything should be centered around resources. The endpoint POST /processes/{processID}/execution is not a resource, and POSTing to it should create e.g. /processes/{processID}/execution/{executionId}, but it doesn't. Instead it may, for example, create /jobs/{jobId} (in async), or return a result directly (in sync).

To create a job, asynchronous processing requests should be sent to POST /jobs. This would also remove the weirdness that, for workflows, you need to pick a "main process" to work underneath. Singular processes could also be sent there, as a workflow that just consists of a single processing node.

If you just send async requests to that endpoint, the issues with the Prefer header would also be solved: #413

Synchronous processing for a single process could still be sent to POST /processes/{processID}/execution, but it would be weird for a workflow to be sent to that endpoint too. So maybe it should be a separate top-level endpoint?

PS: This came up in some discussions with @fmigneault and @aljacob recently, so I am posting it here for discussion.
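As a sketch of that last idea, a single process sent as a one-node workflow to a central endpoint might look like this (Python with requests; the URL and endpoint are assumptions based on the proposal, with the "process" key borrowed from the Part 3 draft):

```python
import requests

ROOT = "https://example.org/ogcapi"  # hypothetical server

# A "workflow" consisting of a single processing node: the process is
# named in the body, so no {processID} path segment and no designated
# "main process" is needed.
single_node = {
    "process": f"{ROOT}/processes/echo",
    "inputs": {"message": "hello"},
}
r = requests.post(f"{ROOT}/jobs", json=single_node,
                  headers={"Prefer": "respond-async"})
print(r.status_code, r.headers.get("Location"))
```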