Support CWL Prov with cwltool for OGC API - Processes IPT #673

Closed
19 of 21 tasks
fmigneault opened this issue Jun 21, 2024 · 0 comments · Fixed by #778
Labels

  • feature/CWL: Issue related to CWL support
  • feature/job/provenance: Issue related to W3C PROV metadata applied to a Job.
  • process/OAP-Part4: Jobs: OGC API - Processes - Part 4: Job Management
  • process/workflow: Related to a Workflow process.
  • project/OGC-GDC: Developments related to OGC GeoDataCube
  • project/OGC-IPT: Developments related to OGC Integrity, Provenance, and Trust
  • triage/feature: New requested feature.

fmigneault commented Jun 21, 2024

Description

Given the rising need for IPT (Integrity, Provenance, and Trust) across OGC APIs and their workflow processing, the provenance capabilities of CWL should be leveraged to meet that need. Doing so would add metadata references within the CWL Application Packages themselves, allowing better open-science and IPT workflow tracking.

To Do

  • GET /jobs/{jobId}/run to return the PROV-JSON produced by cwltool --provenance
    (edit: won't do)

  • GET /jobs/{jobId}/prov as alternate endpoint

  • consider additional PROV endpoints vs what cwlprov offers
    https://gitlab.ogc.org/ogc/T20-GDC/-/wikis/GDC-Provenance-demonstration-GeoLabs#usage

    • GET /jobs/{jobId}/prov (contents of various metadata/provenance/primary.cwlprov.{ext})
    • GET /jobs/{jobId}/prov/info (as cwlprov info or metadata/manifest.json)
    • GET /jobs/{jobId}/prov/who (as cwlprov who)
    • GET /jobs/{jobId}/prov/inputs (as cwlprov inputs)
    • GET /jobs/{jobId}/prov/inputs/{id} (as cwlprov inputs [<run-id>])
    • GET /jobs/{jobId}/prov/outputs (as cwlprov outputs)
    • GET /jobs/{jobId}/prov/outputs/{id} (as cwlprov outputs [<run-id>])
    • GET /jobs/{jobId}/prov/run (as cwlprov run --inputs --outputs --labels --duration --steps)
      (use all flags to get all available metadata)
    • GET /jobs/{jobId}/prov/run/{id} (as cwlprov run [<run-id>])
  • Return alternate representations (PROV-XML, RDF, etc.) when the Accept header requests them
    (all variants should already be generated by cwltool as various manifest representations)

  • When generating cwltool --provenance results, avoid duplicating results already found in WPS-outputs to save space (use their URI for cross-reference).

    • if File is saved only as the {"class":"File", "path": "..."} definition, this could be allowed to avoid extra code managing the references
    • assume that strings are sufficiently small (as per cwltool's own content limit)
  • Any additional metadata/links pointing at the specific job and process executed that should be embedded in the PROV contents

  • Cross-walk with the requirements of "Support POST /jobs for various workflow implementations" (#716)

  • Include ORCID and other relevant PROV metadata #783
    (see cwltool --orcid --enable-user-provenance --enable-host-provenance)

  • update CLI to provide a provenance operation

  • ensure provenance-related requirements are added to conformance

  • ensure provenance links are returned in job status response

  • add more W3C PROV details about process I/O #780
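
To make the endpoint list above more concrete, here is a minimal sketch of how the /jobs/{jobId}/prov sub-paths could be translated into cwlprov invocations, and how an Accept header could select one of the metadata/provenance/primary.cwlprov.{ext} files. It is not the Weaver implementation. The function names, the media-type table, the set of file extensions, and the use of the cwlprov --directory option are all assumptions made for illustration and should be checked against cwltool and cwlprov before use.

```python
from typing import List, Optional

# Sub-path under /jobs/{jobId}/prov -> cwlprov sub-command, as listed in the To Do.
# The 'run' entry uses all flags to get all available metadata.
PROV_COMMANDS = {
    "info": ["info"],
    "who": ["who"],
    "inputs": ["inputs"],
    "outputs": ["outputs"],
    "run": ["run", "--inputs", "--outputs", "--labels", "--duration", "--steps"],
}

# ASSUMPTION: media types and extensions chosen for illustration only.
PROV_FORMATS = {
    "application/json": "json",
    "application/ld+json": "jsonld",
    "application/n-triples": "nt",
    "text/provenance-notation": "provn",
    "text/turtle": "ttl",
    "application/xml": "xml",
}


def prov_file(accept: str = "application/json") -> str:
    """Pick the provenance file for the first supported media type in 'accept'.

    Simplification: q-values are ignored and media types are tried in listed order.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in ("", "*/*"):
            media_type = "application/json"  # default to PROV-JSON
        ext = PROV_FORMATS.get(media_type)
        if ext:
            return f"metadata/provenance/primary.cwlprov.{ext}"
    raise ValueError(f"No supported PROV representation for: {accept!r}")


def cwlprov_args(sub_path: str, ro_dir: str, run_id: Optional[str] = None) -> List[str]:
    """Build the argument list for a hypothetical subprocess call to cwlprov."""
    # ASSUMPTION: '--directory' points cwlprov at the research object folder.
    args = ["cwlprov", "--directory", ro_dir] + PROV_COMMANDS[sub_path]
    if run_id is not None:
        if sub_path not in ("inputs", "outputs", "run"):
            raise ValueError(f"'{sub_path}' does not accept a run identifier")
        args.append(run_id)
    return args
```

For example, cwlprov_args("inputs", ro_dir, run_id) would correspond to GET /jobs/{jobId}/prov/inputs/{id}, while prov_file("text/turtle") would select the Turtle variant for GET /jobs/{jobId}/prov. Keeping both lookups as plain tables makes it easy to compare them against what cwlprov actually offers.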

References

Implementation

@fmigneault fmigneault added triage/feature New requested feature. feature/CWL Issue related to CWL support labels Jun 21, 2024
@fmigneault fmigneault self-assigned this Jun 21, 2024
@github-actions github-actions bot added the process/workflow Related to a Workflow process. label Jun 21, 2024
@fmigneault fmigneault added project/OGC-GDC Developments related to OGC GeoDataCube project/OGC-IPT Developments related to OGC Integrity, Provenance, and Trust labels Jun 21, 2024
@fmigneault fmigneault added the process/OAP-Part4: Jobs OGC API - Processes - Part 4: Job Management label Nov 14, 2024
fmigneault added a commit that referenced this issue Dec 6, 2024
@fmigneault fmigneault linked a pull request Dec 7, 2024 that will close this issue
@fmigneault fmigneault added the feature/job/provenance Issue related to W3C PROV metadata applied to a Job. label Dec 12, 2024