Proposal: Improved ProvenanceProfile definition #2082
…not single CommandLineTool (relates to common-workflow-language/cwltool#2082)
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2082      +/-   ##
==========================================
+ Coverage   84.23%   84.73%   +0.49%
==========================================
  Files          46       46
  Lines        8334     8337       +3
  Branches     1961     1960       -1
==========================================
+ Hits         7020     7064      +44
+ Misses        838      805      -33
+ Partials      476      468       -8
@fmigneault I am very happy to see that you want to implement CWLProv on your engine, and help with refactoring this code is very welcome!
@fmigneault Can you add some more unit tests to increase the test code coverage?
@mr-c
Digging deeper into how the user provenance details were set with/without the I took the opportunity to slightly refactor the user-provenance strategy, since the code was partially duplicated between the
I love it when additional testing leads to fixes! Thank you @fmigneault .
Shall I merge as 1 big commit, or do you want to rebase this as several clean commits?
A rebase is better IMO. I prefer to keep the edit history of the attempts performed.
Oh, by rebase I meant that you would locally create a series of cleaner commits using The edit history will be retained in this PR, but we don't need separate commits about "fix linting" in the main history 😊 Otherwise I can squash and merge as a single commit, which will still preserve the original commit history in this PR.
The squash is good then :) You can proceed whenever desired.
Thanks :)
Please let me know once a tag is available so I can pin it on my end.
You are welcome @fmigneault; the release is up at https://pypi.org/project/cwltool/3.1.20241217163858/ a.k.a. https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20241217163858
@mr-c
I would like to propose the following changes to the PROV utilities.
The reasoning behind the requested changes is that I am in the process of implementing PROV in crim-ca/weaver#778 so that OGC API - Processes can be extended to demonstrate PROV capabilities (i.e.: https://docs.ogc.org/DRAFTS/24-051.html#_requirements_class_provenance), building upon the great work from the CWL community. This would allow the Geospatial community to improve traceability and understanding of their processing pipelines.
However, while I'm able to enable the PROV features and get the resulting metadata, the generated tool and execution PROV files do not reflect the reality of what happened with the CWL workflow run. None of the details about the remote server where they are running, the actual users employed by worker instances crunching the data, `weaver` dispatching the workflow sequencing/resolution to `cwltool`, or any intermediate transformations from "geo data sources" to CWL-compatible inputs are reported anywhere.
In the current code state, the `ProvenanceProfile` class is the one that modifies the PROV document (and the one I would need to extend with entities/agents/relationships). This class generates the resulting metadata entirely within the job execution, and is not easily accessible from "outside" `cwltool` steps. The only interfaces through which I can access part of the references are `LoadingContext.research_obj` and `RuntimeContext.research_obj` (along with some other arguments like `orcid`).
Therefore, this PR delegates the creation of the `ProvenanceProfile` instance to `LoadingContext`, such that I can create a derived `LoadingContext` that extends the profile with definitions that are more aligned with `weaver` and `cwltool` working together. From the point of view of `cwltool`, the operations resolve exactly the same way as before.
Let me know if you have any questions or if anything should be adjusted.
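As a rough illustration of the delegation described above, here is a minimal, self-contained sketch. It does not import `cwltool` and does not reproduce its actual API: the classes are simplified stand-ins that only share names with the real ones, and `make_provenance_profile`, `ServerProvenanceProfile`, `ServerLoadingContext`, `server_url`, and `run_job` are hypothetical names chosen for the example. The point is only to show how moving profile creation onto the context lets a derived context supply an extended profile while the default path stays unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional


class ProvenanceProfile:
    """Stand-in profile: records PROV-like statements as plain strings."""

    def __init__(self, research_object: str, orcid: str = "") -> None:
        self.research_object = research_object
        self.orcid = orcid
        self.statements: List[str] = []
        self.generate_prov_doc()

    def generate_prov_doc(self) -> None:
        # Baseline statements the engine records by default.
        self.statements.append(f"agent(engine) for {self.research_object}")
        if self.orcid:
            self.statements.append(f"agent(user:{self.orcid})")


class ServerProvenanceProfile(ProvenanceProfile):
    """Hypothetical extension adding the remote server as an agent."""

    def __init__(self, research_object: str, orcid: str = "",
                 server_url: str = "") -> None:
        # Set before super().__init__, which calls generate_prov_doc().
        self.server_url = server_url
        super().__init__(research_object, orcid)

    def generate_prov_doc(self) -> None:
        super().generate_prov_doc()
        self.statements.append(f"agent(server:{self.server_url})")
        self.statements.append("actedOnBehalfOf(engine, server)")


@dataclass
class LoadingContext:
    """Stand-in context that owns the creation of the profile."""

    research_obj: Optional[str] = None
    orcid: str = ""

    def make_provenance_profile(self) -> Optional[ProvenanceProfile]:
        # Hypothetical hook: default behaviour, no profile without a research object.
        if self.research_obj is None:
            return None
        return ProvenanceProfile(self.research_obj, self.orcid)


@dataclass
class ServerLoadingContext(LoadingContext):
    """Derived context that swaps in the extended profile."""

    server_url: str = "https://example.org/weaver"  # placeholder URL

    def make_provenance_profile(self) -> Optional[ProvenanceProfile]:
        if self.research_obj is None:
            return None
        return ServerProvenanceProfile(
            self.research_obj, self.orcid, self.server_url
        )


def run_job(context: LoadingContext) -> List[str]:
    # The engine only asks the context for a profile; it never needs to
    # know which concrete profile class it received.
    profile = context.make_provenance_profile()
    return profile.statements if profile else []
```

Calling `run_job(LoadingContext(research_obj="ro-1"))` follows the default path, while `run_job(ServerLoadingContext(research_obj="ro-1"))` yields the same baseline statements plus the server-related ones, without any change to `run_job` itself.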