Improve reference metadata handling for EventSource #2648
The recent addition of attaching the reference metadata of input files to the provenance information only implemented reading that metadata directly from the input file in ctapipe. As a result, ctapipe either had to support all possible input file types, or plugin event sources had no way to provide this metadata on their own. It also assumes 1 EventSource = 1 input file, which is not true for some event sources.

This issue is solved by:
* Adding the possibility to provide the reference metadata directly to `add_input_file`
* Moving the responsibility of calling `add_input_file` from the EventSource base class to the implementation
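To illustrate the two changes, here is a minimal sketch using toy stand-ins. `ToyProvenance`, `ToyPluginSource`, the `reference_meta` keyword, the role string and the metadata values are all assumptions made for illustration; they are not the actual ctapipe classes or signatures.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvenance:
    """Toy stand-in for a provenance recorder (not the ctapipe class)."""

    inputs: list = field(default_factory=list)

    def add_input_file(self, path, role=None, reference_meta=None):
        # A real implementation might try to read the metadata from the file
        # when none is given; this toy version just records what it receives.
        self.inputs.append(
            {"path": str(path), "role": role, "reference_meta": reference_meta}
        )


class ToyPluginSource:
    """Hypothetical plugin event source for a custom file format."""

    def __init__(self, path, provenance):
        self.path = path
        # The implementation, not the base class, registers its input and
        # passes along metadata it extracted from its own format.
        provenance.add_input_file(
            path,
            role="events",  # hypothetical role name
            reference_meta={"product_id": "abc-123"},  # hypothetical value
        )


prov = ToyProvenance()
ToyPluginSource("run042.custom", prov)
```

With this split, the provenance recorder does not need to know how to parse the plugin's file format.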
Looks like a good idea. It probably needs some tests, particularly to ensure that anyone implementing an EventSource plugin doesn't forget to add the input file to the provenance.
MMh. The only way we could enforce this I think is to add an abstract method to the However, I want to support event sources that have multiple input files, as I know those exist (and we'll have e.g. parallel zfits streams also for the ACADA data). So the API should be something like this:
But the issue here is that I know that some event source implementations only open files one-by-one. So I think the solution above is not right. It's really the event sources that have to internally call I don't think we can have a unit test here in ctapipe to enforce this.
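The one-by-one case can be sketched as follows, assuming a toy source that registers each file only at the moment it is opened. `ToyProvenance`, `ToyMultiFileSource`, `_open` and the file names are hypothetical and only illustrate why the registration has to live inside the implementation rather than in a base-class hook.

```python
class ToyProvenance:
    """Toy stand-in that only remembers the registered input paths."""

    def __init__(self):
        self.inputs = []

    def add_input_file(self, path, role=None, reference_meta=None):
        self.inputs.append(path)


class ToyMultiFileSource:
    """Hypothetical event source that reads several files sequentially."""

    def __init__(self, paths, provenance):
        self.paths = list(paths)
        self.provenance = provenance

    def _open(self, path):
        # Register the file when it is actually opened, not up front.
        self.provenance.add_input_file(path, role="events")
        # Fake "events" so the sketch runs without real data files.
        return [f"{path}:event{i}" for i in range(2)]

    def __iter__(self):
        for path in self.paths:
            yield from self._open(path)


prov = ToyProvenance()
source = ToyMultiFileSource(["stream1.fits", "stream2.fits"], prov)

it = iter(source)
first = next(it)
registered_after_first = list(prov.inputs)  # only the first file so far
events = [first, *it]  # exhausting the source opens the second file
```

A plugin's own test suite could assert on the registered inputs in the same way, which is the kind of check that cannot be enforced generically from the base class.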
Added explicit tests now for the two
The reference metadata was designed specifically to be an output, not an input, so it may not be necessary to go that deep here. All we need to get is the info that we will eventually need to write out, so e.g. the full list of input product_ids is not necessary. The back-links to that info are what would be done in a provenance database derived from the provenance logs. So I think the issue with multiple inputs in the first element in the processing chain is: how to assign a
I am surprised to hear this... in the original issue #2571 (comment) and the implementation #2598 you supported the idea of getting the product ids for inputs
I think I didn't explain well: I do support this ( The other issue is the difference between "Local Provenance" (inputs and outputs of an Activity) and" TLDR: we do need to read the ReferenceMetadata of the inputs, as it is necessary for provenance. It is not propagated to the output ReferenceMetadata, however. To solve places where we have no product_id, we could just suggest that the EventSources put in the obs_id or something like that (I don't really like mixing them though, so maybe we need to think a bit on the model). In any case, this PR is good.
That's how this works here. It is attaching the reference metadata of input files to the provenance, not to the output reference meta.
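The distinction can be sketched with plain dictionaries. The record layout, the helper names `register_input` and `make_output_reference_meta`, and the product ids are assumptions for illustration only, not ctapipe's data model.

```python
# Toy provenance record for one activity (hypothetical layout).
provenance_record = {"activity": "toy-process", "inputs": [], "outputs": []}


def register_input(path, reference_meta):
    """Attach an input file and its reference metadata to the provenance."""
    provenance_record["inputs"].append(
        {"path": path, "reference_meta": reference_meta}
    )


def make_output_reference_meta(product_id):
    """Build the output metadata from information about the output only."""
    return {"product_id": product_id}


register_input("input.h5", {"product_id": "in-001"})

out_meta = make_output_reference_meta("out-001")
provenance_record["outputs"].append(
    {"path": "output.h5", "reference_meta": out_meta}
)
```

The input's product id ends up in the provenance record, where a provenance database could later build the back-links, while the output's reference metadata stays free of it.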
This is an alternative implementation to #2644