
Add Argo WF conformance class #386

Open · wants to merge 7 commits into master
Conversation

@christophenoel (Contributor)

Over the years, our team has been gradually transitioning our implementation (including an operational PDGS) to the Argo Workflow language. This decision was based on the Argo Workflow language's superior suitability for container-based workflows and modules, particularly when interacting with Kubernetes-native environments. Additionally, the specification aligns well with the OpenAPI/JSON schemas that form the foundation of OGC API - Processes.

To facilitate this transition, we have prepared a pull request that incorporates the essential requirements and recommendations for integrating the newly adopted conformance class into the existing spec. We sincerely request your consideration and integration of this profile.

(see email)

@fmigneault (Contributor) left a comment

Really nice to see more alternatives being implemented!


part:: If a process can be described for the intended use as a <<rc_argo,Argo graph>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the replacement process.

part:: The media type `application/argo` shall be used to indicate that request body contains a processes description encoded as <<rc_ogcapppkg,Argo>>.

Is application/argo an official media-type? If not, the generic https://www.iana.org/assignments/media-types/application/vnd.oai.workflows+yaml with a contentSchema pointing at the Argo Workflow schema URL might be more appropriate (see the sketch below).

An alternative would be to push Argo maintainers to publish a media-type like CWL did:

https://www.iana.org/assignments/media-types/application/cwl

https://www.iana.org/assignments/media-types/application/cwl+json
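Meanwhile, to make the contentSchema suggestion concrete, here is a minimal sketch of what a deployment request could look like under the registered generic media type; the endpoint, file name, and the exact mechanism for conveying the Argo schema are assumptions, not settled spec text:

```python
import requests

# Sketch only: submit the Argo document under the registered generic
# workflows media type instead of an unregistered `application/argo`.
# The endpoint and file name are hypothetical; how the Argo Workflow
# schema would be attached (a contentSchema member, a media-type
# parameter, ...) is precisely the open question raised above.
with open("workflow.yaml", "rb") as f:  # hypothetical Argo Workflow manifest
    resp = requests.post(
        "https://example.org/ogcapi/processes",  # assumed Part 2 deploy endpoint
        data=f.read(),
        headers={"Content-Type": "application/vnd.oai.workflows+yaml"},
    )
print(resp.status_code, resp.headers.get("Location"))
```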

* `type` and `href` if passed by reference
* `value` and `mediaType` if passed by value

part:: The value of the `type` property shall be `application/argo`, when for `mediaType` it should be `application/argo+json`.

Why use distinct type values?
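For reference, the two forms being contrasted look roughly as follows; the field names come straight from the quoted requirement, while the surrounding values are invented:

```python
# By reference: a link object carrying `href` and `type`.
by_reference = {
    "href": "https://example.org/packages/my-workflow.argo.json",  # assumed URL
    "type": "application/argo",
}

# By value: a qualified value carrying `value` and `mediaType`.
by_value = {
    "value": {"apiVersion": "argoproj.io/v1alpha1", "kind": "Workflow"},  # truncated Argo document
    "mediaType": "application/argo+json",
}
```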



part:: The value of the `href` property shall be a reference to the Argo encoded file. The value of the `value` property shall be the Argo encoded in json format.

"json" should be uppercase here



part:: If the Argo contains more than a single workflow identifier, an addition `w` query parameter may be used to target a specific workflow id to be deployed.

Might be relevant to refer to a common parameter that can be reused across Workflow languages regardless of their specific implementation.
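For illustration, a deployment call using such a parameter might look like this (only the `w` parameter name comes from the quoted requirement; the endpoint, file, and workflow id are hypothetical):

```python
import requests

# Sketch: deploy one workflow out of a multi-workflow Argo document by id.
# Everything except the `w` parameter name is an assumption.
with open("pipelines.yaml", "rb") as f:
    resp = requests.post(
        "https://example.org/ogcapi/processes",
        params={"w": "ndvi-pipeline"},                 # hypothetical workflow id
        data=f.read(),
        headers={"Content-Type": "application/argo"},
    )
resp.raise_for_status()
```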



part:: The server should validate the Argo at the request time. In case, the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "argo-worflow-not-exist".

This seems to contradict the previous point, which is worded as if `w` were optional, while it is required here.
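For context, the 400 response implied by the quoted requirement would presumably follow the RFC 7807-style exception layout used elsewhere in OGC API - Processes; only the `type` value below comes from the PR text, the rest is illustrative:

```python
# Hypothetical exception body for a `w` identifier that cannot be resolved.
not_found_exception = {
    "type": "argo-worflow-not-exist",  # spelled as in the proposed requirement
    "title": "Workflow identifier not found",
    "status": 400,
    "detail": "No workflow with id 'ndvi-pipeline' exists in the submitted Argo document.",
}
```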

Comment on lines +7 to +13
part:: If a process can be represented for the intended use as a <<rc_argo,Argo Application>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the process to be deployed to the API.

part:: The media type `application/argo` shall be used to indicate that request body contains a processes description encoded as a <<rc_argo,Argo Application>>.

part:: If the Argo contains more than one workflow, an additional `w` query parameter may be used to reference the workflow id to be deployed.

part:: The server should validate the Argo at the request time. In case, the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "worflow-not-found".

Similar comments apply as in the other file.

gfenoy added a commit to GeoLabs/ogcapi-processes that referenced this pull request Jan 5, 2024
Update recommendations and add Requirements in the corresponding Requirements class

Make CWL depend on OGC Application Package to avoid having to add another conformance class such as

Define the Requirement for the w param in DRU directly to make it easier to extend, cf. opengeospatial#386

Move workflow-not-found exception Requirement to DRU Requirements class
@gfenoy mentioned this pull request Jan 6, 2024
@bpross-52n (Contributor)

SWG telecon from 8th January 2024: We would like to see this tested e.g. in a testbed before adding this to the standard.

@bpross-52n (Contributor)

SWG meeting from January 22nd: Move this to Part 3 project.

@fmigneault (Contributor)

@bpross-52n

SWG meeting from January 22nd: Move this to Part 3 project.

Sorry, I could not attend today's meeting due to a conflict.
Is it possible to get more details about this? The way Argo is being described here, each of its applications is a distinct process. This fits more into Part 2 than Part 3. Part 3 could chain the resulting processes without any knowledge of Argo (or CWL, for that matter).

@jerstlouis (Member)

jerstlouis commented Jan 22, 2024

@fmigneault Part 3 includes a "Deployable workflows" requirement class that allows a workflow to be deployed as a process using Part 2, as well as dedicated requirement classes for specific workflow definition languages.

Part 2 is about the generic idea that you can POST a process application package, regardless of what it contains.

But if the content of that package is a workflow, this is more about Part 3 (working in conjunction with Part 2).
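For concreteness, such a package POSTed under Part 2 might look roughly as follows; the `processDescription`/`executionUnit` layout reflects one reading of the Part 2 draft, and all identifiers and URLs are invented:

```python
import requests

# Sketch of a Part 2 OGC Application Package deploying an Argo workflow
# by reference. Field names follow one reading of the Part 2 ogcapppkg
# draft; treat the layout as an assumption.
apppkg = {
    "processDescription": {"id": "ndvi", "version": "1.0.0"},  # minimal description (assumed)
    "executionUnit": {
        "href": "https://example.org/packages/ndvi.argo.yaml",  # hypothetical package location
        "type": "application/argo",
    },
}

resp = requests.post("https://example.org/ogcapi/processes", json=apppkg)
print(resp.status_code)
```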

We could also apply this to CWL, but due to the long-standing association with previous Part 2 efforts, Part 2 includes a CWL requirement class which is focused on the ability to use CWL for process description, rather than its ability to define workflows.

There is still a CWL workflow requirement class in Part 3 about defining a workflow using CWL.

@fmigneault (Contributor)

@jerstlouis
I see.
I believe Argo would need a similar arrangement to CWL, where it lies in both Part 2 and Part 3 simultaneously, since both languages can represent either a workflow graph or a single application on their own.

@jerstlouis (Member)

@fmigneault Sure, but that applies to all workflow definition languages, and I don't think it requires a req. class in Part 2.

(I think that was even the case for CWL, but there were strong arguments in favor of including it)

@fmigneault (Contributor)

From what I see in the changed files, everything is relevant to Part 2, i.e., a process description represented in the Argo format, and how to distinguish it from other workflow encodings in order to deploy/replace/undeploy it, and later execute it after deployment.

I looked quickly at Part 3 Deployable Workflows and my impression is that it attempts to duplicate what Part 2 does, but with fewer details about the deployment itself (which makes sense, since Part 3 focuses more on execution). Because the execution endpoint is used, it generates some issues about conflicting {processId} locations, which need to reserve CWL, OpenEO (and now Argo as well, and any future workflow encoding...) identifiers due to how those endpoints are defined.

IMO, it would make more sense for "Deployable Workflows" to be considered just another "workflow process graph" representation POSTed to /processes. One could then deploy a CWL graph, an Argo graph, an openEO graph, an "OGC workflow definition expressed as an execution request" (as described in deployable-workflows), etc.

The strength of Part 3 is chaining multiple processes' inputs/outputs/collections "on the fly" at execution time. If one intends to deploy a workflow rather than execute it directly, going through a Part 3 approach seems to over-complicate the Part 3 definition. Delegating "Deployable Workflows" to Part 2, with a specific "OGC Execution Workflow" encoding, would simplify how the two parts collaborate.

@jerstlouis (Member)

jerstlouis commented Jan 22, 2024

I looked quickly at Part 3 Deployable Workflows and my impression is that it attempts to duplicate what Part 2 does,

The intent is not to duplicate anything, but to reference it normatively: a workflow defined with Part 3 can be deployed using Part 2, for implementations declaring support for this requirement class, which has a dependency on Part 2.

But you are right that currently "Deployable Workflows" is more about the "OGC workflow definitions defined as an execution request". It could, however, be broadened to cover workflows in any process graph definition language (CWL, openEO, Argo...).

The question really is just where the definition of that payload that gets POSTed belongs for each definition language.

Because they define workflows, I think the consensus was that it belongs to Part 3.

But of course the POST operation and the behavior is defined by Part 2.

In the end, it doesn't really matter in which document the req. classes are defined, as long as they can work together.

@fmigneault (Contributor)

I think that because they define a workflow (which can be queried as described after deployment) and can then be reused with other inputs without changing the process graph, it makes more sense to have them in Part 2. All the CWL, OpenEO, and Argo graphs work under the assumption that the workflow steps are defined first, and the submitted inputs are then chained through them.

The OGC Part 3 Workflow could be implemented using any of those representations, but its real power comes from bridging data/process sources into an execution pipeline that does not need deployment, at the cost of being provided inline each time in the execution request. This is what makes it distinct from Part 2. If a Part 3 workflow was deployed, it could then be called like any other atomic process, regardless of the workflow engine under it. The workflow definition would be abstracted away.

I am having discussions with other working groups, and the issue of handling multiple workflow formats and platform APIs often arises. I think it would be more useful for users if custom workflow encodings were deployed using Part 2 (as currently), while Part 3 limited itself to chaining standardized OGC API components. This way, Part 3 Workflows offer a truly interoperable way to call processes between servers. Otherwise, we somehow need to port OGC-native concepts such as collection I/O through CWL, OpenEO, etc. to use them with Part 3, and still remain stuck with platforms that cannot exchange those custom definitions.

@jerstlouis (Member)

All the CWL, OpenEO and Argo graphs work under the assumption that the workflow steps are defined first, and then chains the submitted inputs.

The same is also true for the "Nested Processes" workflows defined in Part 3 as an extension of Part 1 execution requests: they all operate on existing OGC API - Processes, either pre-existing in the implementation or deployed using Part 2.
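As a sketch, such an extended execution request nests one process invocation as an input to another; all URLs and identifiers below are invented:

```python
# Part 3 "Nested Processes": the `data` input of `ndvi` is produced by a
# nested invocation of another process, possibly on a different server.
nested_execution_request = {
    "process": "https://example.org/ogcapi/processes/ndvi",
    "inputs": {
        "data": {
            "process": "https://other.example.org/ogcapi/processes/atmospheric-correction",
            "inputs": {"scene": "S2A_MSIL1C_20240101"},  # hypothetical input
        }
    },
}
```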

If a Part 3 workflow was deployed, it could then be called like any other atomic process, regardless of the workflow engine under it. The workflow definition would be abstracted away.

That is what the "Deployable Workflow" requirement class of Part 3 is about, leveraging Part 2, if we make it agnostic of the workflow definition language (extended execution request, CWL, OpenEO, Argo...).

I think it would be more useful for users if custom workflow encodings were deployed using Part 2 (as currently), while Part 3 limited itself to chaining standardized OGC API components.

Whether things are defined in the Part 2 document or the Part 3 document should have zero impact on users. The functionality is exactly the same.

Otherwise, we somehow need to port OGC-native concepts such as collection I/O through CWL, OpenEO, etc. to use them with Part 3, and still remain stuck with platforms that cannot exchange those custom definitions.

Part 3 defines several things, which may be contributing to confusion.

"Collection Input" and "Collection Output" are really powerful concepts that bridges the data access OGC APIs as mechanisms, and is particularly relevant to the GeoDataCube API work. However, this "collection" functionality is fully orthogonal to the definition of process graphs in any particular workflow definition language, with the one exception that when using extended-Part 1 execution request, a "collection" property is used to specify a collection input.

What I mean here is that even if you used CWL or Argo for your workflow definition, there could be a specific mechanism for accepting an OGC API - Coverages collection as an input to the workflow definition (using Coverages as an example, but it could be Features, Tiles, DGGS, Maps, EDR...). And similarly, you could support creating a virtual collection as per Part 3 Collection Output, and trigger execution of the workflow for an area/time/resolution of interest as a result of an OGC API - Coverages request ("Collection Output").
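As a sketch of that one exception, a collection input in an extended execution request references a collection URI in place of inline data (URIs invented):

```python
# Part 3 "Collection Input": the server resolves the collection through
# whichever data-access API it supports (Coverages, Features, Tiles, ...).
collection_execution_request = {
    "process": "https://example.org/ogcapi/processes/ndvi",
    "inputs": {
        "data": {"collection": "https://example.org/ogcapi/collections/sentinel2-l2a"}
    },
}
```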

but its real power comes from bridging data/process sources into an execution pipeline that does not need deployment, at the cost of being provided inline each time in the execution request.

This cost is mitigated by either deploying the workflow using Part 2 ("Deployable Workflow"), or by setting up a virtual collection ("Collection Output", with the possibility to set up a persistent public-facing collection that can optionally expose its internal workflow).

I think that because they define a workflow (which can be queried as described after deployment), and then be reused with other inputs without changing the process graph, it makes more sense to have them in Part 2

Currently, I believe the SWG is working under the assumption that anything to do with "workflow" belongs to Part 3.

Of course, Part 2 can be used to deploy both new processes that can be used within those workflows, and the workflows themselves as new processes (Part 3 "Deployable Workflows"). The SWG could review whether more should be included in Part 2, but I believe there is a preference to refrain from making too many changes to Part 2 so as to avoid delaying its completion.

@fmigneault (Contributor)

That is what the "Deployable Workflow" requirement class of Part 3 is about, leveraging Part 2, if we make it agnostic of the workflow definition language (extended execution request, CWL, OpenEO, Argo...).

Exactly my point; therefore, there is no need for CWL, openEO, or Argo requirement classes in Part 3. It is redundant to have them there, as they should already be handled by Part 2.

Whether things are defined in the Part 2 document or the Part 3 document should have zero impact on users. The functionality is exactly the same.

Since they are not POSTed on the same endpoint, do not expect the same payload, and the result is not the same (whether the workflow is simply deployed or is executed immediately), it matters a lot.

I agree with all points regarding how powerful the Part 3 concepts could be, but at the same time, they lack an explicit specification of how OGC concepts can be bridged with CWL, Argo, openEO, etc. There are already many long issue discussions, not just by me, illustrating that those assumptions do not just magically work together, because each workflow technology has its own structure. Like you mentioned, Part 3 includes a lot of things. Adoption of these capabilities only becomes harder if we include Part 2 concepts in there as well. Since Part 3 already assumes that the processes it calls are Part 1 or Part 2 references, it makes more sense to reuse this abstraction.

Currently, I believe the SWG is working under the assumption that anything to do with "workflow" belongs to Part 3.

I think this is only a side effect of Part 3 being called "Workflow" when it defines far more than that. Workflow concepts have been present since at least the OGC Best Practice for Earth Observation Application Package, which participants in subsequent initiatives decided to ignore for whatever reason...

@bpross-52n (Contributor)

SWG meeting from 2024-12-23: The structure of the Part 2 documents has changed, so this PR would need to adapt to the new structure. @christophenoel, could you update the PR? Note that the first version of Part 2 is more or less final, so we would consider the Argo conformance class as future work.
