Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create 0019-proteomics_workflow_versioning #945

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

aclum
Copy link
Contributor

@aclum aclum commented Nov 19, 2024

No description provided.

Copy link
Contributor

@kheal kheal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @aclum . Here's an associated issue for reference: microbiomedata/nmdc-schema#2256

Copy link
Member

@turbomam turbomam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aclum and @kheal will the two distinct proetomics workflow classes have significantly different constellations of slots?

@SamuelPurvine
Copy link

@picowatt probably has the best grasp of what would be different, but ultimately the two workflows should be virtually identical with respect to their slots. We 'might' need a slot or two in the V2/Kaiko/metagenome-independent version that calls out the version and source for the protein sequences against which the de-novo processing is compared, but everything else will be the same slot-wise if I understand things aright.

@aclum
Copy link
Contributor Author

aclum commented Nov 22, 2024

@turbomam did that address your question?

@turbomam
Copy link
Member

turbomam commented Nov 22, 2024

If the slots associated with the two workflows will be virtually identical, then I am not in favor of making new classes. I recommend keeping one class and adding a slot to clarify which workflow is instantiated. The range of that slot should be an enumeration. Metadata about the two workflows can be captured as annotations on the two permissible values.

@turbomam
Copy link
Member

turbomam commented Nov 22, 2024

Among other things, this is in keeping with patterns we use for data generation. We don't have a MaldiMassSpectrometry class or a SequencingBySynthesis class. We have MassSpectrometry and NucleotideSequencing classes. Those kinds of distinctions would be captured in the related instrument modeling.

@kheal
Copy link
Contributor

kheal commented Nov 22, 2024

I think an slot + enumeration could work.

Pros:
It would still be clear from the mongo record which MetaPAnalysis category it is
No legacy id patterns would exist
Keeps with existing model ethos

Cons:
More challenging in metadata generation
More challenging for UI display (I assume)
we'd have to check if there was a previous or subsequent run of the same variety with the same sample input when generating or displaying appropriate id

I don't think these cons are insurmountable and if so, I'd vote to amend the ADR to incorporate @turbomam's suggestions.

@aclum
Copy link
Contributor Author

aclum commented Nov 23, 2024

under @turbomam's suggestion V1 and V2 of the workflow would share an ID blade or no?

@turbomam
Copy link
Member

Thanks for the flexibility and the helpful advantages/disadvantages analysis @kheal . We should be doing those for all of our schema decisions!

@aclum do you have any advantages/disadvantages thoughts about using a common typecode vs version-specific typecodes? I having a hard time thinking of any imminent technical reasons why we couldn't use two typecodes, but it doesn't seem like a good practice to me.

I won't be looking at work stuff much more this week.

@turbomam
Copy link
Member

Actually, we should check with @donny about allowing multiple type codes for a single class

@kheal
Copy link
Contributor

kheal commented Nov 25, 2024

There are a couple conversations going on here, I'm going to try to make them more explicit in this comment.

1) Should we have different classes for the two categories of MetaProteomicsAnalysis in the schema?

From @turbomam 's comments earlier, I am leaning towards no, but instead add a slot and enumeration which would enable users to know which category of MetaP analysis was used from the Mongo record.

2) Should we have different typecodes for records of the two categories of MetaProteomicsAnalysis in the database?

My thought is that we should keep with the current convention of 1:1 typecode:class unless there is a very good reason not to, especially since the already-processed data have this typecode.

3) Should we have different id_blades for records of the two categories of MetaProteomicsAnalysis in the database if they originate from the same DataObject?

The overwhelming consensus here is to have different id_blades for records that originate from the same DataObject if records represent different categories of the workflow. For example, if a DataObject is processed once via categoryA and once via categoryB, it would have two records, both with the ".version" of ".1", but with different id_blades. Reruns of categoryA and categoryB would increment the version tag separately.

cc @aclum, @SamuelPurvine

@aclum
Copy link
Contributor Author

aclum commented Nov 26, 2024

The minter currently takes as input the class so would need to be updated to take additional arguments so the minter has enough information to return the correct class. This plus wanting different blades to me is an argument for subclassing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants