Merge pull request #778 from crim-ca/cwl-prov
fmigneault authored Dec 19, 2024
2 parents 72d15c4 + e41cd24 commit b4e82a0
Showing 34 changed files with 2,987 additions and 135 deletions.
30 changes: 26 additions & 4 deletions CHANGES.rst
@@ -14,10 +14,32 @@ Changes:
--------
- Add support of Python 3.13.
- Drop support of Python 3.8.

Fixes:
------
- No change.
- Add support of *OGC API - Processes - Part 4: Job Management* related to ``PROV`` requirement and conformance classes.
- Add support of `W3C PROV <https://www.w3.org/TR/prov-overview/>`_ to provide ``GET /jobs/{jobId}/prov`` endpoints
  and all underlying paths (``/info``, ``/who``, ``/run``, ``/inputs``, ``/outputs``, and ``../{runId}`` variants)
  to retrieve provenance metadata from a `Job` execution and its corresponding `Process` and `Workflow` definitions,
  as processed by ``cwltool``/``cwlprov`` and extended by `Weaver`-specific server metadata.
  Supported ``PROV`` representations are ``PROV-N``, ``PROV-NT``, ``PROV-JSON``, ``PROV-JSONLD``, ``PROV-XML``
  and ``PROV-TURTLE``, each of which can be obtained by providing the corresponding ``Accept`` header.
- Add ``weaver.cwl_prov`` configuration option to control the new ``PROV`` metadata collection feature.
- Add ``prov`` and ``provenance`` CLI and ``WeaverClient`` operations.
- Extend ``weaver.cli.WeaverArgumentParser`` "*rules*" to allow returning an error message providing better
  case-by-case details about the specific cause of failure handled by the *rule* callable.
- Update certain ``cornice`` service definitions that were using "``prov``" to reference `Providers`, to avoid
  confusion with the multiple ``PROV``/`Provenance` related terminology and services added for the new feature.
- Pin ``cwltool==3.1.20241217163858`` to employ the official release that includes the ``PROV`` configuration
  options needed to easily configure `Weaver`
  (relates to `common-workflow-language/cwltool#2082 <https://github.com/common-workflow-language/cwltool/pull/2082>`_)
  and that integrates previously provided fixes
  (relates to `common-workflow-language/cwltool#2036 <https://github.com/common-workflow-language/cwltool/pull/2036>`_)
  that were formerly applied through a forked backport ``https://github.com/fmigneault/cwltool`` repository.

Fixes:
------
- Fix missing documentation about certain ``WeaverClient`` operations.
- Fix ``weaver.cli.OperationResult`` not setting its ``text`` property when a valid non-`JSON` response is obtained.
- Fix the `API` frontpage `HTML` rendering to return enabled features and corresponding ``doc``/``url``/``api``
  endpoints for quickly referencing the capabilities activated for a `Weaver` instance.

.. _changes_6.0.0:

7 changes: 6 additions & 1 deletion config/weaver.ini.example
@@ -100,6 +100,11 @@ weaver.cwl_egid =
weaver.cwl_processes_dir =
weaver.cwl_processes_register_error = false

# provenance functionality
# if disabled, provenance details will not be collected when running Application Packages and Workflows
# if disabled, the '/jobs/{jobId}/prov' endpoint will always report missing information since unavailable
weaver.cwl_prov = true

# --- Weaver WPS settings ---
weaver.wps = true
weaver.wps_url =
@@ -128,7 +133,7 @@ weaver.wps_metadata_identification_keywords=Weaver,WPS,OGC
# access constraints can be comma-separated
weaver.wps_metadata_identification_accessconstraints=NONE
weaver.wps_metadata_identification_fees=NONE
weaver.wps_metadata_provider_name=CRIM
weaver.wps_metadata_provider_name=Computer Research Institute of Montréal (CRIM)
weaver.wps_metadata_provider_url=http://pavics-weaver.readthedocs.org/en/latest/
weaver.wps_metadata_contact_name=Francis Charette-Migneault
weaver.wps_metadata_contact_position=Research Software Developer
12 changes: 12 additions & 0 deletions docs/source/appendix.rst
@@ -250,6 +250,15 @@ Glossary
Entity that describes the required inputs, produced outputs, and any applicable metadata for the execution of
the defined script, calculation, or operation.

PROV
Provenance
Metadata using the :term:`W3C` |PROV|_ standard that is applied to a submitted :term:`Job` execution to allow
retrieving its origin, the related :term:`Application Package`, its :term:`I/O` sources and results, as well as
additional details about the server host and runtime user as applicable to replicate the experiment.

.. seealso::
:ref:`proc_op_job_prov`

Provider
Entity that offers an ensemble of :term:`Process` under it. It is typically a reference to a remote service,
where any :term:`Process` it provides is fetched dynamically on demand.
@@ -331,6 +340,9 @@ Glossary
Since |ogc-api-standards|_ are based on HTTP and web communications, this consortium establishes the
common foundation definitions used by the :term:`API` specifications.

.. seealso::
|w3c|_

WKT
Well-Known Text geometry representation.

70 changes: 69 additions & 1 deletion docs/source/cli.rst
@@ -33,14 +33,29 @@ Python Client Commands
For details about using the Python :py:class:`weaver.cli.WeaverClient`, please refer directly to its class
documentation and its underlying methods.

* :py:meth:`weaver.cli.WeaverClient.info`
* :py:meth:`weaver.cli.WeaverClient.version`
* :py:meth:`weaver.cli.WeaverClient.conformance`
* :py:meth:`weaver.cli.WeaverClient.register`
* :py:meth:`weaver.cli.WeaverClient.unregister`
* :py:meth:`weaver.cli.WeaverClient.deploy`
* :py:meth:`weaver.cli.WeaverClient.undeploy`
* :py:meth:`weaver.cli.WeaverClient.capabilities`
* :py:meth:`weaver.cli.WeaverClient.describe`
* :py:meth:`weaver.cli.WeaverClient.package`
* :py:meth:`weaver.cli.WeaverClient.jobs`
* :py:meth:`weaver.cli.WeaverClient.trigger_job`
* :py:meth:`weaver.cli.WeaverClient.update_job`
* :py:meth:`weaver.cli.WeaverClient.execute`
* :py:meth:`weaver.cli.WeaverClient.monitor`
* :py:meth:`weaver.cli.WeaverClient.dismiss`
* :py:meth:`weaver.cli.WeaverClient.status`
* :py:meth:`weaver.cli.WeaverClient.inputs`
* :py:meth:`weaver.cli.WeaverClient.outputs`
* :py:meth:`weaver.cli.WeaverClient.logs`
* :py:meth:`weaver.cli.WeaverClient.statistics`
* :py:meth:`weaver.cli.WeaverClient.exceptions`
* :py:meth:`weaver.cli.WeaverClient.provenance`
* :py:meth:`weaver.cli.WeaverClient.dismiss`
* :py:meth:`weaver.cli.WeaverClient.results`
* :py:meth:`weaver.cli.WeaverClient.upload`

@@ -479,6 +494,59 @@ Sample Output:
.. literalinclude:: ../../weaver/wps_restapi/examples/job_results.json
:language: json

.. _cli_example_job_prov:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Job Provenance Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Accomplishes the :term:`Job` |PROV|_ request to obtain :term:`Provenance` metadata.

The examples below employ the ``Echo`` :term:`Process` available in |weaver-func-test-apps|_
and assume the referenced :term:`Job` was completed successfully.

.. note::
    There are multiple alternative format representations offered by this operation.
    Not all of them are presented below. See the various ``prov_type`` and ``prov_format``
    parameters for the combinations.

.. seealso::
    - :ref:`proc_op_job_prov` provides more details about available endpoints, operations and metadata returned.

.. code-block:: shell
    :caption: Command Line

    weaver prov -u ${WEAVER_URL} -j "1c49f085-bbd7-410d-a801-81fd42469e8a" --pT run

.. code-block:: python
    :caption: Python

    from weaver.provenance import ProvenancePathType

    client.prov("1c49f085-bbd7-410d-a801-81fd42469e8a", prov_type=ProvenancePathType.PROV_RUN)

Sample Output:

.. literalinclude:: ../../weaver/wps_restapi/examples/job_prov_run.txt
    :language: text

.. code-block:: shell
    :caption: Command Line

    weaver prov -u ${WEAVER_URL} -j "1c49f085-bbd7-410d-a801-81fd42469e8a" -nL --pF "PROV-N"

.. code-block:: python
    :caption: Python

    from weaver.provenance import ProvenanceFormat

    client.prov("1c49f085-bbd7-410d-a801-81fd42469e8a", prov_format=ProvenanceFormat.PROV_N)

Sample Output:

.. literalinclude:: ../../weaver/wps_restapi/examples/job_prov.txt
    :language: text

.. _cli_example_upload:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17 changes: 17 additions & 0 deletions docs/source/configuration.rst
@@ -101,6 +101,23 @@ they are optional and which default value or operation is applied in each situat
.. versionadded:: 1.9

.. _weaver-cwl-prov:

- | ``weaver.cwl_prov = true|false`` [:class:`bool`-like]
  | (default: ``true``)
  |
  | Configure whether :term:`W3C` |PROV|_ functionality using the :ref:`proc_op_job_prov` endpoints should be enabled
    to collect :term:`Provenance` metadata when executing the underlying :term:`CWL` of a given :term:`Process`
    or :term:`Workflow`.

  .. note::

    Any pre-existing :term:`Job` that was created when this option did not yet exist or that was executed while
    it was disabled will not offer :term:`Provenance` metadata. This is intrinsic to the functionality that must obtain
    timely metadata *while* executing to properly represent operational steps and :term:`Job` updates as they occur.

  .. versionadded:: 6.1

.. _weaver-wps:

- | ``weaver.wps = true|false`` [:class:`bool`-like]
124 changes: 116 additions & 8 deletions docs/source/processes.rst
@@ -173,7 +173,7 @@ through some parsing (e.g.: :ref:`proc_wps_12`) or with some requirement indicat
special handling. The represented :term:`Process` is aligned with |ogc-api-proc|_ specifications.

When deploying one such :term:`Process` directly, it is expected to have a definition specified
with a :term:`CWL` `Application Package`_, which provides resources about one of the described :ref:`app_pkg_types`.
with a :term:`CWL` :ref:`application-package`, which provides resources about one of the described :ref:`app_pkg_types`.

This is most of the time employed to wrap operations packaged in a reference :term:`Docker` image, but it can also
wrap :ref:`app_pkg_remote` to be executed on another server (i.e.: :term:`ADES`). When the :term:`Process` should be
@@ -490,6 +490,8 @@ the |getcap-req|_ request.
Modify an Existing Process (Update, Replace, Undeploy)
-----------------------------------------------------------------------------

.. versionadded:: 4.20

Since `Weaver` supports |ogc-api-proc-part2|_, it is able to remove a previously registered :term:`Process` using
the :ref:`Deployment <proc_op_deploy>` request. The undeploy operation consist of a ``DELETE`` request targeting the
specific ``{WEAVER_URL}/processes/{processID}`` to be removed.
@@ -498,8 +500,6 @@ specific ``{WEAVER_URL}/processes/{processID}`` to be removed.
The :term:`Process` must be accessible by the user considering any visibility configuration to perform this step.
See :ref:`proc_op_deploy` section for details.

.. versionadded:: 4.20

Starting from version `4.20 <https://github.com/crim-ca/weaver/tree/4.20.0>`_, a :term:`Process` can be replaced or
updated using respectively the ``PUT`` and ``PATCH`` requests onto the specific ``{WEAVER_URL}/processes/{processID}``
location of the reference to modify.
@@ -1989,7 +1989,7 @@ the configured :term:`WPS` output directory.
Header ``X-WPS-Output-Context`` is ignored when using `S3` buckets for output location since they are stored
individually per :term:`Job` UUID, and hold no relevant *context* location. See also :ref:`conf_s3_buckets`.

.. versionadded:: 4.3
.. versionchanged:: 4.3
Addition of the ``X-WPS-Output-Context`` header.

.. _proc_op_execute_subscribers:
@@ -2419,10 +2419,118 @@ Note again that the more the :term:`Process` is verbose, the more tracking will
Job Provenance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. fixme: CWL and Job Prov (https://github.com/crim-ca/weaver/issues/673)
.. todo::
implement ``GET /jobs/{jobID}/run`` and/or ``GET /jobs/{jobID}/prov``
(see https://github.com/crim-ca/weaver/issues/673)
.. versionadded:: 6.1

The provenance endpoints make it possible to obtain :term:`W3C` |PROV|_ metadata from a successfully completed
:term:`Job` using various representations. This provenance information helps establish traceability, such as
identifying the input data sources, validating output checksums, and understanding all internal :term:`Process`
data transformations that were involved within an executed :term:`Workflow`.

The |PROV|_ metadata consists of information records about entities, activities, and people involved in producing a
piece of data or thing |PROV-dfn|_, which can be used to form assessments about its quality, reliability or
trustworthiness.

.. |PROV-dfn| replace:: :sup:`[^]`
.. _PROV-dfn: https://www.w3.org/TR/2013/REC-prov-dm-20130430/#dfn-provenance
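
To make these records more concrete, the following minimal sketch assembles a small ``PROV-JSON``-style document
by hand. It is only an illustration of how entities, activities and agents relate to each other. All ``ex:``
identifiers and the namespace URL are hypothetical assumptions, and the result is not the actual output generated
by `Weaver` or ``cwlprov``.

.. code-block:: python

    import json

    # All "ex:" identifiers are hypothetical placeholders chosen for this illustration.
    prov_document = {
        "prefix": {"ex": "https://example.com/ns#"},
        "entity": {"ex:output-file": {}},    # a piece of data (e.g.: a Job output)
        "activity": {"ex:job-run": {}},      # something that occurred (e.g.: the Job execution)
        "agent": {"ex:job-user": {}},        # who is responsible for the activity
        # the output entity was produced by the run activity
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": "ex:output-file", "prov:activity": "ex:job-run"},
        },
        # the run activity was performed under the responsibility of the user agent
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": "ex:job-run", "prov:agent": "ex:job-user"},
        },
    }

    print(json.dumps(prov_document, indent=2))

Following the relations from an output back to its generating activity and responsible agent is what allows the
quality and reliability assessments mentioned above.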

.. seealso::
    - |PROV-overview|_
    - |cwltool-cwlprov|_

.. figure:: https://www.w3.org/TR/2013/REC-prov-o-20130430/diagrams/starting-points.svg
    :alt: PROV-O Resources
    :target: `PROV-O`_
    :align: center
    :width: 500px

    Provenance Resource Relationships [|PROV-O|_]


The provenance endpoints are provided in alignment with the |ogc-api-proc-part4|_ provenance class requirement.
However, `Weaver` also provides additional functionalities in comparison to the minimal requirements from the
:term:`OGC` specification.

The following table lists the available formats and corresponding endpoints offered by `Weaver`.

.. list-table:: Job Provenance Endpoints
    :name: table-job-prov
    :align: center
    :header-rows: 1
    :widths: 25,10,20,45

    * - Endpoint
      - |PROV|_ Format
      - :term:`Media-Type`
      - Description
    * - ``/jobs/{jobID}/prov``
      - |PROV-JSON|_
      - ``application/json``
      - :term:`Provenance` metadata using :term:`JSON` representation.
    * - ``/jobs/{jobID}/prov``
      - |PROV-JSONLD|_
      - ``application/ld+json``
      - :term:`Provenance` metadata using |JSON-LD|_ representation.
    * - ``/jobs/{jobID}/prov``
      - |PROV-XML|_
      - ``text/xml`` or ``application/xml``
      - :term:`Provenance` metadata using :term:`XML` representation.
    * - ``/jobs/{jobID}/prov``
      - |PROV-N|_
      - ``text/provenance-notation``
      - :term:`Provenance` metadata using the main |PROV|_ notation representation.
    * - ``/jobs/{jobID}/prov``
      - PROV-NT
      - ``application/n-triples``
      - :term:`Provenance` metadata using |rdf-n-triples|_ (NT) representation.
    * - ``/jobs/{jobID}/prov``
      - PROV-TURTLE
      - ``text/turtle``
      - :term:`Provenance` metadata using |rdf-turtle|_ (TTL) representation.
    * - ``/jobs/{jobID}/prov/info``
      - |na|
      - ``text/plain``
      - Metadata about the *Research Object* packaging information.
    * - ``/jobs/{jobID}/prov/who``
      - |na|
      - ``text/plain``
      - Metadata of who ran the :term:`Job`.
    * - ``/jobs/{jobID}/prov/runs``
      - |na|
      - ``text/plain``
      - Obtain the list of ``runID`` steps of the :term:`Workflow` within the :term:`Job`.
    * - ``/jobs/{jobID}/prov/run``
      - |na|
      - ``text/plain``
      - Metadata of the main :term:`Job` and any nested step runs in the case of a :term:`Workflow`.
    * - ``/jobs/{jobID}/prov/inputs``
      - |na|
      - ``text/plain``
      - Metadata about the :term:`Job` input IDs.
    * - ``/jobs/{jobID}/prov/outputs``
      - |na|
      - ``text/plain``
      - Metadata about the :term:`Job` output IDs.
    * - ``/jobs/{jobID}/prov/[run|inputs|outputs]/{runID}``
      - |na|
      - ``text/plain``
      - Same as their respective definitions above, but for a specific step of a :term:`Workflow`.

.. seealso::
    This feature is enabled by default. Its functionality and the corresponding :term:`API` endpoints
    can be controlled using :ref:`Configuration Option <weaver-cwl-prov>` ``weaver.cwl_prov``.
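
As a minimal sketch of the content negotiation summarized in the table above, the following snippet prepares
(without sending) a request for a given representation by selecting the matching ``Accept`` header.
Only the media types come from the table. The ``PROV_MEDIA_TYPES`` mapping, the ``build_prov_request`` helper and
the ``https://example.com/weaver`` URL are hypothetical names introduced for this illustration, and they are
not part of the `Weaver` code base.

.. code-block:: python

    from urllib.request import Request

    # Media types copied from the table above; the mapping itself is an illustrative assumption.
    PROV_MEDIA_TYPES = {
        "PROV-JSON": "application/json",
        "PROV-JSONLD": "application/ld+json",
        "PROV-XML": "application/xml",
        "PROV-N": "text/provenance-notation",
        "PROV-NT": "application/n-triples",
        "PROV-TURTLE": "text/turtle",
    }


    def build_prov_request(weaver_url, job_id, prov_format="PROV-JSON"):
        """Prepare a GET request for the Job provenance without sending it."""
        url = f"{weaver_url.rstrip('/')}/jobs/{job_id}/prov"
        headers = {"Accept": PROV_MEDIA_TYPES[prov_format]}
        return Request(url, headers=headers, method="GET")


    # hypothetical server URL, with the Job UUID reused from the CLI examples
    request = build_prov_request(
        "https://example.com/weaver",
        "1c49f085-bbd7-410d-a801-81fd42469e8a",
        prov_format="PROV-N",
    )
    print(request.full_url, request.get_header("Accept"))

Passing the prepared request to ``urllib.request.urlopen`` would perform the actual call against a running instance.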

Resulting metadata that is collected from :term:`Job` :term:`Provenance` will be stored under a location similar
to the :ref:`exec_output_location`, except with an additional ``-prov`` suffix applied after the :term:`Job` UUID,
as shown below.
This location is selected to conveniently offer the ``PROV`` metadata under a different parent directory than
the :term:`Job` outputs, therefore allowing different access control schemes between the ``PROV`` metadata
and the actual output data, while also reusing the configured :ref:`exec_output_location`, which can quickly
serve :term:`Provenance` contents without any additional configuration.
.. code-block::

    {WPS_OUTPUT_URL}[/{WPS_OUTPUT_CONTEXT}]/{JOB_UUID}-prov

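
The pattern above can be sketched as a small helper that composes the location from its parts.
This is only an illustration under stated assumptions: the ``prov_location`` function, its parameter names and the
``https://example.com/wpsoutputs`` URL are hypothetical, and the snippet does not reproduce how `Weaver` resolves
the location internally.

.. code-block:: python

    def prov_location(wps_output_url, job_uuid, wps_output_context=None):
        """Compose '{WPS_OUTPUT_URL}[/{WPS_OUTPUT_CONTEXT}]/{JOB_UUID}-prov' (illustrative helper)."""
        parts = [wps_output_url.rstrip("/")]
        if wps_output_context:
            # the context is optional and nested between the base URL and the Job UUID
            parts.append(wps_output_context.strip("/"))
        parts.append(f"{job_uuid}-prov")
        return "/".join(parts)


    # hypothetical output URL and context, with the Job UUID reused from the CLI examples
    location = prov_location(
        "https://example.com/wpsoutputs",
        "1c49f085-bbd7-410d-a801-81fd42469e8a",
        wps_output_context="public/demo",
    )
    print(location)
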
.. _proc_op_job_stats:
