add CWL PROV support #778

Merged
merged 28 commits into from
Dec 19, 2024
Commits
2a5a3ee
[wip] add CWL PROV support - basic only (no extra weaver-specific tra…
fmigneault Dec 6, 2024
28af8a5
adjust PROV for potential metadata updates - works for workflow, but …
fmigneault Dec 7, 2024
f15af39
fix job path resolution
fmigneault Dec 9, 2024
34fe955
add all path variations of CWLProv operations (relates to https://git…
fmigneault Dec 11, 2024
8979618
return job links within prov responses
fmigneault Dec 11, 2024
851c1f1
add conformance and requirements classes for PROV of part 4 job manag…
fmigneault Dec 11, 2024
514e7e3
pin versions for cwlprov
fmigneault Dec 11, 2024
320d485
functional PROV with extra Weaver metadata
fmigneault Dec 12, 2024
fbdc237
refactor PROV definitions into separate module + add WeaverClient/CLI…
fmigneault Dec 13, 2024
450da18
fix linting
fmigneault Dec 13, 2024
c995ea2
disable security check for non-security related use of SHA1
fmigneault Dec 14, 2024
1c6c7a8
use proper fix for non-security use of SHA1
fmigneault Dec 14, 2024
aab2a85
Merge branch 'master' into cwl-prov
fmigneault Dec 14, 2024
a09d7af
update PROV docs
fmigneault Dec 14, 2024
d3c1c52
allow PROV endpoints to be disabled + fixes to parsing PROV format + …
fmigneault Dec 14, 2024
a3a8314
make PROV image source more obvious
fmigneault Dec 14, 2024
ab2c917
fix test
fmigneault Dec 14, 2024
2546123
fix backward compat Python 3.9 for simultaneous classmethod and classp…
fmigneault Dec 14, 2024
0aea60b
Merge remote-tracking branch 'origin/master' into cwl-prov
fmigneault Dec 18, 2024
31e0969
update changelogs and docs about prov features
fmigneault Dec 18, 2024
63aee8f
pin cwltool==3.1.20241217163858 (relates to https://github.com/common…
fmigneault Dec 18, 2024
3ee07a8
drop mixed use of classproperty/classmethod that has removed support …
fmigneault Dec 18, 2024
4934343
patch false-positive pylint
fmigneault Dec 18, 2024
f104248
fix mock for WPS package test needing a job to resolve PROV configura…
fmigneault Dec 18, 2024
f5352d1
add missing coverage for prov
fmigneault Dec 19, 2024
b91cfc6
fix imports linting
fmigneault Dec 19, 2024
83169b5
fix tests - handle cwltool logs sometimes adding a suffix idx to dist…
fmigneault Dec 19, 2024
e41cd24
consider potential alternate log name for cwltool.job module
fmigneault Dec 19, 2024
30 changes: 26 additions & 4 deletions CHANGES.rst
@@ -14,10 +14,32 @@ Changes:
--------
- Add support of Python 3.13.
- Drop support of Python 3.8.

Fixes:
------
- No change.
- Add support of *OGC API - Processes - Part 4: Job Management* related to ``PROV`` requirement and conformance classes.
- Add support of `W3C PROV <https://www.w3.org/TR/prov-overview/>`_ to provide ``GET /jobs/{jobId}/prov`` endpoints
and all underlying paths (``/info``, ``/who``, ``/run``, ``/inputs``, ``/outputs``, and ``../{runId}`` variants)
to retrieve provenance metadata from a `Job` execution and its corresponding `Process` and `Workflow` definitions,
as processed by ``cwltool``/``cwlprov`` and extended by `Weaver`-specific server metadata.
Supported ``PROV`` representations are ``PROV-N``, ``PROV-NT``, ``PROV-JSON``, ``PROV-JSONLD``, ``PROV-XML``
and ``PROV-TURTLE``, each of which can be obtained by providing the corresponding ``Accept`` headers.
- Add ``weaver.cwl_prov`` configuration option to control the new ``PROV`` metadata collection feature.
- Add ``prov`` and ``provenance`` CLI and ``WeaverClient`` operations.
- Extend ``weaver.cli.WeaverArgumentParser`` "*rules*" to allow returning an error message providing better
case-by-case details about the specific cause of failure handled by the *rule* callable.
- Update certain ``cornice`` service definitions that were using "``prov``" to refer to `Providers` to avoid
confusion with the multiple ``PROV``/`Provenance` related terminology and services added for the new feature.
- Pin ``cwltool==3.1.20241217163858`` to employ the official release including the
``PROV`` configuration needed to easily configure `Weaver`
(relates to `common-workflow-language/cwltool#2082 <https://github.com/common-workflow-language/cwltool/pull/2082>`_)
and to integrate previously provided fixes
(relates to `common-workflow-language/cwltool#2036 <https://github.com/common-workflow-language/cwltool/pull/2036>`_)
that were applied by a forked backport ``https://github.com/fmigneault/cwltool`` repository.

Fixes:
------
- Fix missing documentation about certain ``WeaverClient`` operations.
- Fix ``weaver.cli.OperationResult`` not setting its ``text`` property when a valid non-`JSON` response is obtained.
- Fix the `API` frontpage `HTML` rendering to return enabled features and corresponding ``doc``/``url``/``api``
endpoints for quickly referencing the capabilities activated for a `Weaver` instance.

.. _changes_6.0.0:

7 changes: 6 additions & 1 deletion config/weaver.ini.example
@@ -100,6 +100,11 @@ weaver.cwl_egid =
weaver.cwl_processes_dir =
weaver.cwl_processes_register_error = false

# provenance functionality
# if disabled, provenance details will not be collected when running Application Packages and Workflows
# if disabled, the '/jobs/{jobId}/prov' endpoint will always report missing information since unavailable
weaver.cwl_prov = true

# --- Weaver WPS settings ---
weaver.wps = true
weaver.wps_url =
@@ -128,7 +133,7 @@ weaver.wps_metadata_identification_keywords=Weaver,WPS,OGC
# access constraints can be comma-separated
weaver.wps_metadata_identification_accessconstraints=NONE
weaver.wps_metadata_identification_fees=NONE
weaver.wps_metadata_provider_name=CRIM
weaver.wps_metadata_provider_name=Computer Research Institute of Montréal (CRIM)
weaver.wps_metadata_provider_url=http://pavics-weaver.readthedocs.org/en/latest/
weaver.wps_metadata_contact_name=Francis Charette-Migneault
weaver.wps_metadata_contact_position=Research Software Developer
12 changes: 12 additions & 0 deletions docs/source/appendix.rst
@@ -250,6 +250,15 @@ Glossary
Entity that describes the required inputs, produced outputs, and any applicable metadata for the execution of
the defined script, calculation, or operation.

PROV
Provenance
Metadata using the :term:`W3C` |PROV|_ standard that is applied to a submitted :term:`Job` execution to allow
retrieving its origin, the related :term:`Application Package`, its :term:`I/O` sources and results, as well as
additional details about the server host and runtime user as applicable to replicate the experiment.

.. seealso::
:ref:`proc_op_job_prov`

Provider
Entity that offers an ensemble of :term:`Process` under it. It is typically a reference to a remote service,
where any :term:`Process` it provides is fetched dynamically on demand.
@@ -331,6 +340,9 @@ Glossary
Since |ogc-api-standards|_ are based on HTTP and web communications, this consortium establishes the
common foundation definitions used by the :term:`API` specifications.

.. seealso::
|w3c|_

WKT
Well-Known Text geometry representation.

70 changes: 69 additions & 1 deletion docs/source/cli.rst
@@ -33,14 +33,29 @@ Python Client Commands
For details about using the Python :py:class:`weaver.cli.WeaverClient`, please refer directly to its class
documentation and its underlying methods.

* :py:meth:`weaver.cli.WeaverClient.info`
* :py:meth:`weaver.cli.WeaverClient.version`
* :py:meth:`weaver.cli.WeaverClient.conformance`
* :py:meth:`weaver.cli.WeaverClient.register`
* :py:meth:`weaver.cli.WeaverClient.unregister`
* :py:meth:`weaver.cli.WeaverClient.deploy`
* :py:meth:`weaver.cli.WeaverClient.undeploy`
* :py:meth:`weaver.cli.WeaverClient.capabilities`
* :py:meth:`weaver.cli.WeaverClient.describe`
* :py:meth:`weaver.cli.WeaverClient.package`
* :py:meth:`weaver.cli.WeaverClient.jobs`
* :py:meth:`weaver.cli.WeaverClient.trigger_job`
* :py:meth:`weaver.cli.WeaverClient.update_job`
* :py:meth:`weaver.cli.WeaverClient.execute`
* :py:meth:`weaver.cli.WeaverClient.monitor`
* :py:meth:`weaver.cli.WeaverClient.dismiss`
* :py:meth:`weaver.cli.WeaverClient.status`
* :py:meth:`weaver.cli.WeaverClient.inputs`
* :py:meth:`weaver.cli.WeaverClient.outputs`
* :py:meth:`weaver.cli.WeaverClient.logs`
* :py:meth:`weaver.cli.WeaverClient.statistics`
* :py:meth:`weaver.cli.WeaverClient.exceptions`
* :py:meth:`weaver.cli.WeaverClient.provenance`
* :py:meth:`weaver.cli.WeaverClient.dismiss`
* :py:meth:`weaver.cli.WeaverClient.results`
* :py:meth:`weaver.cli.WeaverClient.upload`

@@ -479,6 +494,59 @@ Sample Output:
.. literalinclude:: ../../weaver/wps_restapi/examples/job_results.json
:language: json

.. _cli_example_job_prov:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Job Provenance Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Accomplishes the :term:`Job` |PROV|_ request to obtain :term:`Provenance` metadata.

The examples below employ the ``Echo`` :term:`Process` available in |weaver-func-test-apps|_
and assume the referenced :term:`Job` was completed successfully.

.. note::
There are multiple alternative format representations offered by this operation.
Not all of them are presented below. See the various ``prov_type`` and ``prov_format``
parameters for the combinations.

.. seealso::
- :ref:`proc_op_job_prov` provides more details about available endpoints, operations and metadata returned.

.. code-block:: shell
:caption: Command Line

weaver prov -u ${WEAVER_URL} -j "1c49f085-bbd7-410d-a801-81fd42469e8a" --pT run

.. code-block:: python
:caption: Python

from weaver.provenance import ProvenancePathType

client.prov("1c49f085-bbd7-410d-a801-81fd42469e8a", prov_type=ProvenancePathType.PROV_RUN)

Sample Output:

.. literalinclude:: ../../weaver/wps_restapi/examples/job_prov_run.txt
:language: text

.. code-block:: shell
:caption: Command Line

weaver prov -u ${WEAVER_URL} -j "1c49f085-bbd7-410d-a801-81fd42469e8a" -nL --pF "PROV-N"

.. code-block:: python
:caption: Python

from weaver.provenance import ProvenanceFormat

client.prov("1c49f085-bbd7-410d-a801-81fd42469e8a", prov_format=ProvenanceFormat.PROV_N)

Sample Output:

.. literalinclude:: ../../weaver/wps_restapi/examples/job_prov.txt
:language: text

.. _cli_example_upload:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17 changes: 17 additions & 0 deletions docs/source/configuration.rst
@@ -101,6 +101,23 @@ they are optional and which default value or operation is applied in each situat

.. versionadded:: 1.9

.. _weaver-cwl-prov:

- | ``weaver.cwl_prov = true|false`` [:class:`bool`-like]
| (default: ``true``)
|
| Configure whether :term:`W3C` |PROV|_ functionality using the :ref:`proc_op_job_prov` endpoints should be enabled
to collect :term:`Provenance` metadata when executing the underlying :term:`CWL` of a given :term:`Process`
or :term:`Workflow`.

.. note::

Any pre-existing :term:`Job` that was created when this option did not yet exist or that was executed while
it was disabled will not offer :term:`Provenance` metadata. This is intrinsic to the functionality that must obtain
timely metadata *while* executing to properly represent operational steps and :term:`Job` updates as they occur.

.. versionadded:: 6.1

.. _weaver-wps:

- | ``weaver.wps = true|false`` [:class:`bool`-like]
124 changes: 116 additions & 8 deletions docs/source/processes.rst
@@ -173,7 +173,7 @@ through some parsing (e.g.: :ref:`proc_wps_12`) or with some requirement indicat
special handling. The represented :term:`Process` is aligned with |ogc-api-proc|_ specifications.

When deploying one such :term:`Process` directly, it is expected to have a definition specified
with a :term:`CWL` `Application Package`_, which provides resources about one of the described :ref:`app_pkg_types`.
with a :term:`CWL` :ref:`application-package`, which provides resources about one of the described :ref:`app_pkg_types`.

This is most of the time employed to wrap operations packaged in a reference :term:`Docker` image, but it can also
wrap :ref:`app_pkg_remote` to be executed on another server (i.e.: :term:`ADES`). When the :term:`Process` should be
@@ -490,6 +490,8 @@ the |getcap-req|_ request.
Modify an Existing Process (Update, Replace, Undeploy)
-----------------------------------------------------------------------------

.. versionadded:: 4.20

Since `Weaver` supports |ogc-api-proc-part2|_, it is able to remove a previously registered :term:`Process` using
the :ref:`Deployment <proc_op_deploy>` request. The undeploy operation consists of a ``DELETE`` request targeting the
specific ``{WEAVER_URL}/processes/{processID}`` to be removed.
@@ -498,8 +500,6 @@ specific ``{WEAVER_URL}/processes/{processID}`` to be removed.
The :term:`Process` must be accessible by the user considering any visibility configuration to perform this step.
See :ref:`proc_op_deploy` section for details.

.. versionadded:: 4.20

Starting from version `4.20 <https://github.com/crim-ca/weaver/tree/4.20.0>`_, a :term:`Process` can be replaced or
updated using respectively the ``PUT`` and ``PATCH`` requests onto the specific ``{WEAVER_URL}/processes/{processID}``
location of the reference to modify.
@@ -1989,7 +1989,7 @@ the configured :term:`WPS` output directory.
Header ``X-WPS-Output-Context`` is ignored when using `S3` buckets for output location since they are stored
individually per :term:`Job` UUID, and hold no relevant *context* location. See also :ref:`conf_s3_buckets`.

.. versionadded:: 4.3
.. versionchanged:: 4.3
Addition of the ``X-WPS-Output-Context`` header.

.. _proc_op_execute_subscribers:
@@ -2419,10 +2419,118 @@ Note again that the more the :term:`Process` is verbose, the more tracking will
Job Provenance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. fixme: CWL and Job Prov (https://github.com/crim-ca/weaver/issues/673)
.. todo::
implement ``GET /jobs/{jobID}/run`` and/or ``GET /jobs/{jobID}/prov``
(see https://github.com/crim-ca/weaver/issues/673)
.. versionadded:: 6.1

The provenance endpoints allow retrieval of :term:`W3C` |PROV|_ metadata from a successfully completed :term:`Job`
using various representations. This provenance information can help identify traceability information such as the input
data sources, validate output checksums, and understand all internal :term:`Process` data transformations that were
involved within an executed :term:`Workflow`.

The |PROV|_ metadata consists of information records about entities, activities, and people involved in producing a
piece of data or thing |PROV-dfn|_, which can be used to form assessments about its quality, reliability or
trustworthiness.

.. |PROV-dfn| replace:: :sup:`[^]`
.. _PROV-dfn: https://www.w3.org/TR/2013/REC-prov-dm-20130430/#dfn-provenance

.. seealso::
- |PROV-overview|_
- |cwltool-cwlprov|_

.. figure:: https://www.w3.org/TR/2013/REC-prov-o-20130430/diagrams/starting-points.svg
:alt: PROV-O Resources
:target: `PROV-O`_
:align: center
:width: 500px

Provenance Resource Relationships [|PROV-O|_]


The provenance endpoints are provided in alignment with the |ogc-api-proc-part4|_ provenance class requirement.
However, `Weaver` also provides additional functionalities in comparison to the minimal requirements from the
:term:`OGC` specification.

The following table lists the available formats and corresponding endpoints offered by `Weaver`.

.. list-table:: Job Provenance Endpoints
:name: table-job-prov
:align: center
:header-rows: 1
:widths: 25,10,20,45

* - Endpoint
- |PROV|_ Format
- :term:`Media-Type`
- Description
* - ``/jobs/{jobID}/prov``
- |PROV-JSON|_
- ``application/json``
- :term:`Provenance` metadata using :term:`JSON` representation.
* - ``/jobs/{jobID}/prov``
- |PROV-JSONLD|_
- ``application/ld+json``
- :term:`Provenance` metadata using |JSON-LD|_ representation.
* - ``/jobs/{jobID}/prov``
- |PROV-XML|_
- ``text/xml`` or ``application/xml``
- :term:`Provenance` metadata using :term:`XML` representation.
* - ``/jobs/{jobID}/prov``
- |PROV-N|_
- ``text/provenance-notation``
- :term:`Provenance` metadata using the main |PROV|_ notation representation.
* - ``/jobs/{jobID}/prov``
- PROV-NT
- ``application/n-triples``
- :term:`Provenance` metadata using |rdf-n-triples|_ (NT) representation.
* - ``/jobs/{jobID}/prov``
- PROV-TURTLE
- ``text/turtle``
- :term:`Provenance` metadata using |rdf-turtle|_ (TTL) representation.
* - ``/jobs/{jobID}/prov/info``
- |na|
- ``text/plain``
- Metadata about the *Research Object* packaging information.
* - ``/jobs/{jobID}/prov/who``
- |na|
- ``text/plain``
- Metadata of who ran the :term:`Job`.
* - ``/jobs/{jobID}/prov/runs``
- |na|
- ``text/plain``
- Obtain the list of ``runID`` steps of the :term:`Workflow` within the :term:`Job`.
* - ``/jobs/{jobID}/prov/run``
- |na|
- ``text/plain``
- Metadata of the main :term:`Job` and any nested step runs in the case of a :term:`Workflow`.
* - ``/jobs/{jobID}/prov/inputs``
- |na|
- ``text/plain``
- Metadata about the :term:`Job` input IDs.
* - ``/jobs/{jobID}/prov/outputs``
- |na|
- ``text/plain``
- Metadata about the :term:`Job` output IDs.
* - ``/jobs/{jobID}/prov/[run|inputs|outputs]/{runID}``
- |na|
- ``text/plain``
- Same as their respective definitions above, but for a specific step of a :term:`Workflow`.
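
The formats listed in the table are selected through content negotiation with the ``Accept`` header.
The following is a minimal sketch, not `Weaver`'s own client code, that only builds (without sending) the
corresponding ``GET`` request using the Python standard library. The ``build_prov_request`` helper, its parameters
and the constant names are hypothetical and chosen for illustration only, and ``application/xml`` is arbitrarily
picked among the two :term:`XML` media-types listed above.

.. code-block:: python

    from typing import Optional
    from urllib.request import Request

    # Media-types copied from the table above, keyed by PROV format name.
    PROV_MEDIA_TYPES = {
        "PROV-JSON": "application/json",
        "PROV-JSONLD": "application/ld+json",
        "PROV-XML": "application/xml",
        "PROV-N": "text/provenance-notation",
        "PROV-NT": "application/n-triples",
        "PROV-TURTLE": "text/turtle",
    }
    # Sub-paths of '/prov' that are listed with a 'text/plain' representation.
    PROV_TEXT_PATHS = {"info", "who", "runs", "run", "inputs", "outputs"}
    # Sub-paths that accept a trailing run ID to target a specific Workflow step.
    PROV_RUN_ID_PATHS = {"run", "inputs", "outputs"}


    def build_prov_request(
        weaver_url: str,
        job_id: str,
        prov_format: str = "PROV-JSON",
        prov_path: Optional[str] = None,
        run_id: Optional[str] = None,
    ) -> Request:
        """Build a GET request for a Job provenance endpoint (hypothetical helper)."""
        url = f"{weaver_url.rstrip('/')}/jobs/{job_id}/prov"
        if run_id is not None and prov_path not in PROV_RUN_ID_PATHS:
            raise ValueError("a run ID requires one of the run, inputs or outputs paths")
        if prov_path is None:
            # Main endpoint: the PROV format decides the requested media-type.
            accept = PROV_MEDIA_TYPES[prov_format]
        else:
            if prov_path not in PROV_TEXT_PATHS:
                raise ValueError(f"unknown provenance path: {prov_path}")
            url = f"{url}/{prov_path}"
            if run_id is not None:
                url = f"{url}/{run_id}"
            accept = "text/plain"
        return Request(url, headers={"Accept": accept}, method="GET")

The returned object could then be passed to ``urllib.request.urlopen`` to perform the call against an actual
instance that has the feature enabled.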

.. seealso::
This feature is enabled by default. Its functionality and the corresponding :term:`API` endpoints
can be controlled using :ref:`Configuration Option <weaver-cwl-prov>` ``weaver.cwl_prov``.

The metadata collected from :term:`Job` :term:`Provenance` is stored under a location similar to
the :ref:`exec_output_location`, except that a ``-prov`` suffix is appended to the :term:`Job` UUID,
as shown below.
This location places the ``PROV`` metadata under a different parent directory than the :term:`Job` outputs,
which allows distinct access control schemes for the ``PROV`` metadata and the actual output data.
It also reuses the configured :ref:`exec_output_location`, so :term:`Provenance` contents can be served
without any additional configuration.

.. code-block::

{WPS_OUTPUT_URL}[/{WPS_OUTPUT_CONTEXT}]/{JOB_UUID}-prov
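
As an illustration of the pattern above, the location could be assembled as in the following sketch.
The ``build_prov_location`` function and its parameter names are hypothetical and only mirror the placeholders
of the pattern. This is an assumption about how the parts combine, not the code used by `Weaver`.

.. code-block:: python

    from typing import Optional


    def build_prov_location(wps_output_url: str, job_uuid: str, wps_output_context: Optional[str] = None) -> str:
        """Combine the output URL, the optional context and the Job UUID with the '-prov' suffix."""
        parts = [wps_output_url.rstrip("/")]
        if wps_output_context:
            # The context is optional, as indicated by the brackets in the pattern.
            parts.append(wps_output_context.strip("/"))
        parts.append(f"{job_uuid}-prov")
        return "/".join(parts)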


.. _proc_op_job_stats:
