Create 0019-proteomics_workflow_versioning #945
base: main
Thanks for writing this up, @aclum. Here's an associated issue for reference: microbiomedata/nmdc-schema#2256
@picowatt probably has the best grasp of what would be different, but ultimately the two workflows should be virtually identical with respect to their slots. We might need a slot or two in the V2/Kaiko/metagenome-independent version that records the version and source of the protein sequences against which the de novo processing is compared, but everything else will be the same slot-wise, if I understand things correctly.
@turbomam did that address your question?
If the slots associated with the two workflows will be virtually identical, then I am not in favor of making new classes. I recommend keeping one class and adding a slot to clarify which workflow is instantiated. The range of that slot should be an enumeration. Metadata about the two workflows can be captured as annotations on the two permissible values.
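A minimal Python sketch of the one-class-plus-enumeration pattern described above. All names are assumptions for illustration: the class `MetaproteomicsAnalysis`, the enum `MetaproteomicsCategory`, the slot `analysis_category`, and the permissible-value names. A plain dict stands in for annotations on the permissible values; this is not LinkML-generated code.

```python
from dataclasses import dataclass
from enum import Enum


class MetaproteomicsCategory(Enum):
    """Hypothetical enum: one permissible value per workflow category."""

    METAGENOME_MATCHED = "metagenome_matched"          # V1 (assumed name)
    METAGENOME_INDEPENDENT = "metagenome_independent"  # V2 / Kaiko


# Metadata about each workflow, keyed by permissible value. This stands in
# for annotations on the permissible values; the contents are hypothetical.
CATEGORY_ANNOTATIONS = {
    MetaproteomicsCategory.METAGENOME_MATCHED: {"workflow_version": "V1"},
    MetaproteomicsCategory.METAGENOME_INDEPENDENT: {
        "workflow_version": "V2",
        "de_novo_tool": "Kaiko",
    },
}


@dataclass
class MetaproteomicsAnalysis:
    """One class for both workflows; a single slot says which one ran."""

    id: str
    analysis_category: MetaproteomicsCategory  # hypothetical slot name


def records_of(category, records):
    """Select records of one category by slot value, not by class."""
    return [r for r in records if r.analysis_category is category]


v1 = MetaproteomicsAnalysis("example:1", MetaproteomicsCategory.METAGENOME_MATCHED)
v2 = MetaproteomicsAnalysis("example:2", MetaproteomicsCategory.METAGENOME_INDEPENDENT)
print(CATEGORY_ANNOTATIONS[v2.analysis_category]["workflow_version"])
print([r.id for r in records_of(MetaproteomicsCategory.METAGENOME_MATCHED, [v1, v2])])
```

Both records share one class, so they are told apart by the slot value. Per-workflow metadata lives once on the enumeration rather than being repeated on every record.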
Among other things, this is in keeping with patterns we use for data generation. We don't have a MaldiMassSpectrometry class or a SequencingBySynthesis class. We have MassSpectrometry and NucleotideSequencing classes. Those kinds of distinctions would be captured in the related instrument modeling.
I think a slot + enumeration could work. Pros: Cons: I don't think these cons are insurmountable, so I'd vote to amend the ADR to incorporate @turbomam's suggestions.
Under @turbomam's suggestion, would V1 and V2 of the workflow share an ID blade or not?
Thanks for the flexibility and the helpful advantages/disadvantages analysis, @kheal. We should be doing those for all of our schema decisions! @aclum, do you have any thoughts on the advantages and disadvantages of using a common typecode vs. version-specific typecodes? I'm having a hard time thinking of any imminent technical reason why we couldn't use two typecodes, but it doesn't seem like good practice to me. I won't be looking at work stuff much more this week.
Actually, we should check with @donny about allowing multiple typecodes for a single class.
There are a couple of conversations going on here; I'm going to try to make them more explicit in this comment.
1) Should we have different classes for the two categories of
From @turbomam's comments earlier, I am leaning towards no. Instead, we could add a slot and enumeration that would let users tell from the Mongo record which category of MetaP analysis was used.
2) Should we have different typecodes for records of the two categories of
My thought is that we should keep the current convention of a 1:1 typecode:class mapping unless there is a very good reason not to, especially since the already-processed data have this typecode.
3) Should we have different id_blades for records of the two categories of
The overwhelming consensus here is to have different id_blades for records that originate from the same
cc @aclum, @SamuelPurvine
The minter currently takes the class as its input, so it would need to be updated to accept additional arguments that give it enough information to return the correct class. This, plus wanting different blades, is to me an argument for subclassing.
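To make the minter concern concrete, here is a hypothetical sketch of a minter that takes the class plus an extra category argument. Everything in it is assumed for illustration and is not the actual minter API or ID format: the typecode `mpa`, the blade strings `v1`/`v2`, the `example:` prefix, the ID layout, and the idea that a blade is a fixed per-category segment.

```python
import secrets
from typing import Optional

# Hypothetical lookup tables for illustration only; real typecodes and
# blades would come from the schema and the minter's configuration.
TYPECODE_BY_CLASS = {"MetaproteomicsAnalysis": "mpa"}
BLADE_BY_CATEGORY = {
    "metagenome_matched": "v1",
    "metagenome_independent": "v2",
}


def mint_id(class_name: str, category: Optional[str] = None) -> str:
    """Mint an ID from the class plus an extra category argument.

    The class alone fixes the typecode (keeping typecode:class 1:1), but it
    cannot tell the two workflow categories apart, so the caller must also
    pass the category for the minter to choose a per-category blade.
    """
    typecode = TYPECODE_BY_CLASS[class_name]
    if category is None:
        raise ValueError(f"{class_name} IDs need a category to pick a blade")
    blade = BLADE_BY_CATEGORY[category]
    suffix = secrets.token_hex(4)  # 8 random hex characters
    return f"example:{typecode}-{blade}-{suffix}"


print(mint_id("MetaproteomicsAnalysis", "metagenome_independent"))
```

With separate subclasses instead, the class name alone could select both the typecode and the blade, and the minter's current class-only signature would be enough.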