Merge pull request #742 from crim-ca/job-management-execute
fmigneault authored Nov 13, 2024
2 parents c121f8d + 0f61ab2 commit d16ca30
Showing 44 changed files with 2,916 additions and 622 deletions.
1 change: 1 addition & 0 deletions .pylintrc
@@ -98,6 +98,7 @@ disable=C0111,missing-docstring,
R0902,too-many-instance-attributes,
R0904,too-many-public-methods,
R0912,too-many-branches,
R0913,too-many-arguments,
R0914,too-many-locals,
R0915,too-many-statements,
R0917,too-many-positional-arguments,
19 changes: 19 additions & 0 deletions CHANGES.rst
@@ -12,6 +12,25 @@ Changes

Changes:
--------
- Add support of *OGC API - Processes - Part 4: Job Management* endpoints for `Job` creation and execution
(fixes `#716 <https://github.com/crim-ca/weaver/issues/716>`_).
- Add ``headers``, ``mode`` and ``response`` parameters along the ``inputs`` and ``outputs`` returned by
the ``GET /jobs/{jobID}/inputs`` endpoint to better describe the expected resolution strategy of the
multiple `Job` execution options according to submitted request parameters.
- Increase flexible auto-resolution of *synchronous* vs *asynchronous* `Job` execution when no explicit strategy
is specified by ``mode`` body parameter or ``Prefer`` header. Situations where such flexible resolution can occur
will be reflected by a ``mode: auto`` and the absence of ``wait``/``respond-async`` in the ``Prefer`` header
within the response of the ``GET /jobs/{jobID}/inputs`` endpoint.
- Add support for "on-trigger" `Job` submission using the ``status: create`` request body parameter.
Such a `Job` will be pending, and can be modified by ``PATCH /jobs/{jobID}`` requests, until execution is triggered
by a subsequent ``POST /jobs/{jobID}/results`` request.
- Align ``GET /jobs/{jobID}/outputs`` with requirements of *OGC API - Processes - Part 4: Job Management* endpoints
such that omitting the ``schema`` query parameter will automatically apply the `OGC` mapping representation by
default. Previous behavior was to return whichever representation was used by the internal `Process` interface.
- Align `Job` status and update operations with some of the `openEO` behaviors, such as supporting a `Job` ``title``
and allowing ``status`` to return `openEO` values when using ``profile=openeo`` in the ``Content-Type`` or using
the query parameter ``profile``/``schema``. The ``Content-Schema`` will also reflect the resolved representation
in the `Job` status response.
- Add support of ``response: raw`` execution request body parameter as alternative to ``response: document``,
which allows directly returning the result contents or ``Link`` headers rather than embedding them in a `JSON`
response (fixes `#376 <https://github.com/crim-ca/weaver/issues/376>`_).
41 changes: 41 additions & 0 deletions docs/examples/job_status_ogcapi.json
@@ -0,0 +1,41 @@
{
"$schema": "https://schemas.opengis.net/ogcapi/processes/part1/1.0/openapi/schemas/statusInfo.yaml",
"jobID": "a305ef3e-3220-4d43-b1be-301f5ef13c23",
"processID": "example-process",
"providerID": null,
"type": "process",
"status": "succeeded",
"message": "Job succeeded.",
"created": "2024-10-02T14:21:12.380000+00:00",
"started": "2024-10-02T14:21:12.990000+00:00",
"finished": "2024-10-02T14:21:23.629000+00:00",
"updated": "2024-10-02T14:21:23.630000+00:00",
"duration": "00:00:10",
"runningDuration": "PT11S",
"runningSeconds": 10.639,
"percentCompleted": 100,
"progress": 100,
"links": [
{
"title": "Job status.",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23",
"type": "application/json",
"rel": "status"
},
{
"title": "Job monitoring location.",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23",
"type": "application/json",
"rel": "monitor"
},
{
"title": "Job results of successful process execution (direct output values mapping).",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23/results",
"type": "application/json",
"rel": "http://www.opengis.net/def/rel/ogc/1.0/results"
}
]
}
29 changes: 29 additions & 0 deletions docs/examples/job_status_wps.xml
@@ -0,0 +1,29 @@
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse
xmlns:wps="http://www.opengis.net/wps/1.0.0"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd"
service="WPS"
version="1.0.0"
xml:lang="en-CA"
serviceInstance="https://example.com/wps?request=GetCapabilities&amp;service=WPS"
statusLocation="https://example.com/wpsoutputs/fd6e2171-39a7-4f99-b639-c43b51f15b56.xml">
<wps:Process wps:processVersion="1.1.0">
<ows:Identifier>example-process</ows:Identifier>
</wps:Process>
<wps:Status creationTime="2024-09-19T14:29:49Z">
<wps:ProcessSucceeded>Package operations complete.</wps:ProcessSucceeded>
</wps:Status>
<wps:ProcessOutputs>
<wps:Output>
<ows:Identifier>output</ows:Identifier>
<ows:Title>output</ows:Title>
<wps:Reference
href="https://example.com/wpsoutputs/fd6e2171-39a7-4f99-b639-c43b51f15b56/output/data.json"
mimeType="application/json" encoding="" schema=""
/>
</wps:Output>
</wps:ProcessOutputs>
</wps:ExecuteResponse>
4 changes: 2 additions & 2 deletions docs/source/package.rst
@@ -194,7 +194,7 @@ When the above code is saved in a |jupyter-notebook|_ and committed to a Git rep
utility can automatically clone the repository, parse the Python code, extract the :term:`CWL` annotations, and
generate the :term:`Application Package` with a :term:`Docker` container containing all of their respective definitions.
All of this is accomplished with a single call to obtain a deployable :term:`CWL` in `Weaver`, which can then take over
from the :ref:`Process Deployment <proc_op_deploy>` to obtain an :term:`OGC API - Process` definition.
from the :ref:`Process Deployment <proc_op_deploy>` to obtain an :term:`OGC API - Processes` definition.

Jupyter Notebook to CWL Example: NCML to STAC Application
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1408,7 +1408,7 @@ However, the :term:`Vault` approach as potential drawbacks.

.. note::
For more details about the :term:`Vault`, refer to sections :ref:`file_vault_inputs`, :ref:`vault_upload`,
and the corresponding capabilities in :term:`cli_example_upload`.
and the corresponding capabilities in :ref:`cli_example_upload`.

.. _app_pkg_secret_cwltool:

177 changes: 160 additions & 17 deletions docs/source/processes.rst
@@ -613,6 +613,11 @@ Execution of a Process (Execute)
For backward compatibility, the |exec-req-job|_ request is also supported as alias to the above
:term:`OGC API - Processes` compliant endpoint.

.. seealso::
Alternatively, the |job-exec-req|_ request can also be used to submit a :term:`Job` for later execution,
as well as enabling other advanced :ref:`proc_job_management` capabilities.
See :ref:`proc_op_job_create` for more details.

This section will first describe the basics of this request format (:ref:`proc_exec_body`), and after go into
further details for specific use cases and parametrization of various input/output combinations
(:ref:`proc_exec_mode`, :ref:`proc_exec_results`, etc.).
@@ -1694,7 +1699,7 @@ the ``POST /search`` or the ``POST /collections/dataset-features/search`` could

Alternatively, if an array of ``image/tiff; application=geotiff`` was expected by the :term:`Process` while targeting
the ``collection`` on a :term:`STAC` server, the |stac-assets|_ matching the requested :term:`Media-Types` could
potentially be retrieved as input for the :term:`Process Execution <proc_op_execute>`.
potentially be retrieved as input for the :ref:`Process Execution <proc_op_execute>`.

In summary, the |ogc-api-proc-part3-collection-input|_ offers a lot of flexibility with its resolution compared to
the typical :ref:`Input Types <cwl-io-types>` (i.e.: ``Literal``, ``BoundingBox``, ``Complex``) that must be explicitly
@@ -1997,17 +2002,93 @@ of the polling-based method on the :ref:`Job Status <proc_op_status>` endpoint o
.. seealso::
Refer to the |oas-rtd|_ of the |exec-req|_ request for all available ``subscribers`` properties.

.. _proc_job_management:

Job Management
==================================================

This section presents capabilities related to :term:`Job` management.
The endpoints and related operations are defined in a mixture of |ogc-api-proc|_ *Core* requirements,
some official extensions, and further `Weaver`-specific capabilities.

.. seealso::
- |ogc-api-proc-part1-spec-html|_
- |ogc-api-proc-part4|_

.. _proc_op_job_create:

Submitting a Job Creation
---------------------------------------------------------------------

.. important::
All concepts introduced in the :ref:`Execution of a Process <proc_op_execute>` also apply in this section.
Consider reading the subsections for more specific details.

This section will only cover *additional* concepts and parameters applicable only for this feature.

Rather than using the |exec-req|_ request, the |job-exec-req|_ request can be used to submit a :term:`Job`.
When doing so, all parameters typically required for :term:`Process` execution must also be provided, including
any relevant :ref:`proc_exec_body` contents (:term:`I/O`), the desired :ref:`proc_exec_mode`, and
the :ref:`proc_exec_results` options. However, an *additional* ``process`` :term:`URL` in the request body is required,
to indicate which :term:`Process` should be executed by the :term:`Job`.

The |job-exec-req|_ operation allows interoperability alignment with other execution strategies, such as defined
by the |openeo-api|_ and the |ogc-tb20-gdc|_ *GDC API Profile*. It also opens the door for advanced :term:`Workflow`
definitions from a common :term:`Job` endpoint interface, as described by the |ogc-api-proc-part4|_ extension.

Furthermore, an optional ``"status": "create"`` request body parameter can be supplied to indicate that the :term:`Job`
should remain in *pending* state until a later :ref:`Job Execution Trigger <proc_op_job_trigger>` is performed
to start its execution. This allows the user to apply any desired :ref:`Job Updates <proc_op_job_update>` or review
the resolved :ref:`proc_op_job_inputs` prior to submitting the :term:`Job`. This contrasts with
the *Core* |exec-req|_ operation, which *immediately* places the :term:`Job` in queue, locking it from any update.
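
As a minimal sketch of the request body described above, assuming a Python client: the helper name
``build_job_body``, the :term:`Process` URL and the input values are hypothetical, and only the ``process``,
``inputs`` and ``status`` fields are taken from this section.

```python
import json


def build_job_body(process_url, inputs, pending=False):
    """Assemble a Job creation body (hypothetical helper, not part of Weaver)."""
    body = {
        "process": process_url,  # additional 'process' URL required by this operation
        "inputs": inputs,        # same I/O contents as a regular execution body
    }
    if pending:
        # Ask for the Job to stay pending until an explicit execution trigger.
        body["status"] = "create"
    return body


body = build_job_body(
    "https://example.com/processes/example-process",  # hypothetical Process URL
    {"message": "hello"},                             # hypothetical input
    pending=True,
)
print(json.dumps(body, indent=2))
```

Omitting ``pending=True`` leaves out the ``status`` field, which corresponds to a :term:`Job` that is submitted
without waiting for a later trigger.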

.. _proc_op_job_update:

Updating a Job
---------------------------------------------------------------------

The |job-update-req|_ request allows updating the :term:`Job` and its underlying parameters prior to execution.
For this reason, the :term:`Job` must be in the ``created`` :ref:`Job Status <proc_op_job_status>`, meaning that
it is pending a :ref:`Job Execution Trigger <proc_op_job_trigger>` before being sent to the worker execution queue.
For any ``status`` other than ``created``, attempts to modify the :term:`Job` will return an *HTTP 423 Locked* error
response.

Potential parameters that can be updated are:

- Submitted :term:`Process` ``inputs``
- Desired ``outputs`` formats and representations, as per :ref:`proc_exec_results`
- Applicable ``headers``, ``response`` and ``mode`` options as per :ref:`proc_exec_mode`
- Additional metadata such as a custom :term:`Job` ``title``

After updating the :term:`Job`, the :ref:`Job Status <proc_op_job_status>` and :ref:`Job Inputs <proc_op_job_inputs>`
operations can further be performed to review the *pending* :term:`Job` state. Using all those operations allows the
user to iteratively adjust the :term:`Job` until it is ready for execution, for which
the :ref:`Job Execution Trigger <proc_op_job_trigger>` would then be employed.
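
The update request could be prepared as in the following sketch, which only builds the request without sending it.
It assumes the ``PATCH /jobs/{jobID}`` path named in the changelog of this commit; the helper name
``build_job_update``, the base URL, the :term:`Job` ID and the updated values are hypothetical.

```python
import json
import urllib.request


def build_job_update(base_url, job_id, **changes):
    """Prepare (without sending) a PATCH request for a pending Job."""
    return urllib.request.Request(
        f"{base_url}/jobs/{job_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_job_update(
    "https://example.com",                   # hypothetical Weaver URL
    "a305ef3e-3220-4d43-b1be-301f5ef13c23",  # hypothetical Job ID
    title="Adjusted job",                    # hypothetical metadata update
    inputs={"message": "updated"},           # hypothetical input update
)
print(request.get_method(), request.full_url)
# Sending it with urllib.request.urlopen(request) raises urllib.error.HTTPError
# on an error response, such as the HTTP 423 described above.
```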

.. _proc_op_job_trigger:

Triggering Job Execution
---------------------------------------------------------------------

The |job-trigger-req|_ request allows submitting a *pending* :term:`Job` to the worker execution queue. Once performed,
the typical :ref:`proc_op_monitor` operation can be employed, until eventual success or failure of the :term:`Job`.

If the :term:`Job` was already submitted, is already in queue, is actively running, or already finished execution,
this operation will return an *HTTP 423 Locked* error response.
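
A client-side sketch of this operation is shown below. It assumes the ``POST /jobs/{jobID}/results`` path named in
the changelog of this commit. The helper names are hypothetical, and treating any ``2xx`` code as a successful
trigger is an assumption rather than documented behavior.

```python
import urllib.request

HTTP_LOCKED = 423  # status code named in this section


def build_job_trigger(base_url, job_id):
    """Prepare (without sending) the POST request that triggers a pending Job."""
    return urllib.request.Request(f"{base_url}/jobs/{job_id}/results", method="POST")


def describe_trigger_outcome(http_code):
    """Interpret the response code of a trigger attempt (client-side convention)."""
    if http_code == HTTP_LOCKED:
        return "locked: job already submitted, queued, running or finished"
    if 200 <= http_code < 300:
        # Assumption: any 2xx code means the Job was sent to the execution queue.
        return "triggered: job sent to the execution queue"
    return f"unexpected response: HTTP {http_code}"


trigger = build_job_trigger("https://example.com", "my-job-id")  # hypothetical values
print(trigger.get_method(), trigger.full_url)
print(describe_trigger_outcome(HTTP_LOCKED))
```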

.. _proc_op_job_status:
.. _proc_op_status:
.. _proc_op_monitor:

Monitoring of a Process Execution (GetStatus)
Monitoring a Job Execution (GetStatus)
---------------------------------------------------------------------

Monitoring the execution of a :term:`Job` consists of polling the status ``Location`` provided from the
:ref:`Execute <proc_op_execute>` operation and verifying the indicated ``status`` for the expected result.
The ``status`` can correspond to any of the value defined by :data:`weaver.status.JOB_STATUS_VALUES`
accordingly to the internal state of the workers processing their execution.
:ref:`Execute <proc_op_execute>` or :ref:`Trigger <proc_op_job_trigger>` operation and verifying the
indicated ``status`` for the expected result.
The ``status`` can correspond to any of the values defined by :class:`weaver.status.Status`,
according to the internal state of the workers processing their execution and the
negotiated :ref:`proc_op_job_status_alt` representation.

When targeting a :term:`Job` submitted to a `Weaver` instance, monitoring is usually accomplished through
the :term:`OGC API - Processes` endpoint using |status-req|_, which will return a :term:`JSON` body.
@@ -2029,21 +2110,58 @@ format is employed according to the chosen location.
- Location
* - :term:`OGC API - Processes`
- :term:`JSON`
- ``{WEAVER_URL}/jobs/{JobUUID}``
- ``{WEAVER_URL}/jobs/{jobID}``
* - :term:`WPS`
- :term:`XML`
- ``{WEAVER_WPS_OUTPUTS}/{JobUUID}.xml``
- ``{WEAVER_WPS_OUTPUTS}/{jobID}.xml``

.. seealso::
For the :term:`WPS` endpoint, refer to :ref:`conf_settings`.

.. fixme: add example
.. fixme: describe minimum fields and extra fields
Following are examples for both representations. Note that results might vary according to other parameters such
as when using :ref:`proc_op_job_status_alt`, or when different :term:`Process` references or :term:`Workflow`
definitions are involved.

.. literalinclude:: ../examples/job_status_ogcapi.json
:language: json
:caption: :term:`Job` Status in :term:`JSON` using the :term:`OGC API - Processes` interface
:name: job-status-ogcapi

.. literalinclude:: ../examples/job_status_wps.xml
:language: xml
:caption: :term:`Job` Status in :term:`XML` using the :term:`WPS` interface
:name: job-status-wps
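
To illustrate how the timing fields of the :term:`JSON` example relate to each other, the following sketch
recomputes the elapsed time from the ``started`` and ``finished`` timestamps. That ``runningSeconds`` is derived
this way is an assumption based only on the sample values, not on a documented rule.

```python
from datetime import datetime

# Values copied from the JSON Job status example.
status = {
    "started": "2024-10-02T14:21:12.990000+00:00",
    "finished": "2024-10-02T14:21:23.629000+00:00",
    "runningSeconds": 10.639,
}

started = datetime.fromisoformat(status["started"])
finished = datetime.fromisoformat(status["finished"])
elapsed = (finished - started).total_seconds()

# Compare the recomputed value with the one reported in the status document.
print(elapsed, status["runningSeconds"])
```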

.. _proc_op_job_status_alt:

Alternate Job Status
~~~~~~~~~~~~~~~~~~~~

In order to support alternate :term:`Job` status representations, the following approaches can be used when performing
the |status-req|_ request.

- Specify either a ``profile`` or ``schema`` query parameter (e.g.: ``/jobs/{jobID}?profile=openeo``).
- Specify a ``profile`` parameter within the ``Accept`` header (e.g.: ``Accept: application/json; profile=openeo``).

Using the |openeo|_ profile, for example, returns ``status`` values that are appropriate
as per the |openeo-api|_ definition.

When performing :ref:`Job Status <proc_op_job_status>` requests, the received response should
contain a ``Content-Schema`` header indicating which ``profile`` is represented.
This header is employed because the same ``Content-Type: application/json`` header applies
across multiple :term:`API` implementations and status representations.
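
The two approaches listed above can be sketched as follows. The helper name ``build_status_request`` and the base
URL are hypothetical; the ``profile`` query parameter and the ``Accept`` header format follow the examples given in
this section.

```python
from urllib.parse import urlencode


def build_status_request(base_url, job_id, profile=None, via_header=False):
    """Return the (url, headers) pair for a Job status request."""
    url = f"{base_url}/jobs/{job_id}"
    headers = {"Accept": "application/json"}
    if profile and via_header:
        # Negotiate the representation through the Accept header.
        headers["Accept"] = f"application/json; profile={profile}"
    elif profile:
        # Negotiate the representation through the query parameter.
        query = urlencode({"profile": profile})
        url = f"{url}?{query}"
    return url, headers


by_query = build_status_request("https://example.com", "my-job-id", profile="openeo")
by_header = build_status_request("https://example.com", "my-job-id", profile="openeo", via_header=True)
print(by_query)
print(by_header)
```

After sending either request, a client could inspect the ``Content-Schema`` response header to confirm which
representation was returned.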

.. _proc_op_result:
.. _proc_op_job_detail:

Obtaining Job Outputs, Results, Logs or Errors
---------------------------------------------------------------------
Obtaining Job Details and Metadata
----------------------------------

All endpoints to retrieve any of the following information about a :term:`Job` can either be requested directly
(i.e.: ``/jobs/{jobID}/...``) or with equivalent :term:`Provider` and/or :term:`Process` prefixed endpoints,
if the requested :term:`Job` did refer to those :term:`Provider` and/or :term:`Process`.
A *local* :term:`Process` would have its :term:`Job` references as ``/processes/{processId}/jobs/{jobID}/...``
while a :ref:`proc_remote_provider` will use ``/provider/{providerName}/processes/{processId}/jobs/{jobID}/...``.

.. _proc_op_job_outputs:

@@ -2219,7 +2337,8 @@ Job Inputs
In order to better understand the parameters that were *originally* submitted during :term:`Job` creation,
the |inputs-req|_ can be employed. This will return both the data and reference ``inputs`` that were submitted,
as well as the *requested* ``outputs`` [#outN]_ to retrieve any relevant ``transmissionMode``, ``format``, etc.
parameters that where specified during submission of the :ref:`proc_exec_body`.
parameters that were specified during submission of the :ref:`proc_exec_body`, and any other relevant ``headers``
that can affect the :ref:`proc_exec_mode` and :ref:`proc_exec_results`.
For convenience, this endpoint also returns relevant ``links`` applicable for the requested :term:`Job`.

.. literalinclude:: ../examples/job_inputs.json
@@ -2230,6 +2349,9 @@ For convenience, this endpoint also returns relevant ``links`` applicable for th
.. note::
The ``links`` presented above are not an exhaustive list to keep the example relatively small.

If the :term:`Job` is still pending execution, the parameters returned by this endpoint can be modified
using the :ref:`proc_op_job_update` operation before submitting it.

.. _proc_op_job_error:
.. _proc_op_job_exceptions:

@@ -2278,12 +2400,33 @@ Note again that the more the :term:`Process` is verbose, the more tracking will
:caption: Example :term:`JSON` Representation of :term:`Job` Logs Response
:name: job-logs

.. _proc_op_job_prov:

Job Provenance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. fixme: CWL and Job Prov (https://github.com/crim-ca/weaver/issues/673)
.. todo::
implement ``GET /jobs/{jobID}/run`` and/or ``GET /jobs/{jobID}/prov``
(see https://github.com/crim-ca/weaver/issues/673)

.. _proc_op_job_stats:

Job Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::
All endpoints to retrieve any of the above information about a :term:`Job` can either be requested directly
(i.e.: ``/jobs/{jobID}/...``) or with equivalent :term:`Provider` and/or :term:`Process` prefixed endpoints,
if the requested :term:`Job` did refer to those :term:`Provider` and/or :term:`Process`.
A *local* :term:`Process` would have its :term:`Job` references as ``/processes/{processId}/jobs/{jobID}/...``
while a :ref:`proc_remote_provider` will use ``/provider/{providerName}/processes/{processId}/jobs/{jobID}/...``.
This feature is specific to `Weaver`.

The |job-stats-req|_ request can be performed to obtain runtime statistics from the :term:`Job`.
This content is only available when a :term:`Job` has successfully completed.
Below is a sample of a possible response. Some parts might be omitted according to the
internal :term:`Application Package` of the :term:`Process` represented by the :term:`Job` execution.

.. literalinclude:: ../../weaver/wps_restapi/examples/job_statistics.json
:language: json
:caption: Example :term:`JSON` of :term:`Job` Statistics Response
:name: job-statistics

.. _vault_upload:
