Create 0019-proteomics_workflow_versioning #945

aclum · 2024-11-19T19:58:02Z

No description provided.

kheal

Thanks for writing this up, @aclum. Here's an associated issue for reference: microbiomedata/nmdc-schema#2256

turbomam

@aclum and @kheal, will the two distinct proteomics workflow classes have significantly different constellations of slots?

SamuelPurvine · 2024-11-21T18:55:01Z

@picowatt probably has the best grasp of what would be different, but ultimately the two workflows should be virtually identical with respect to their slots. We 'might' need a slot or two in the V2/Kaiko/metagenome-independent version that calls out the version and source for the protein sequences against which the de-novo processing is compared, but everything else will be the same slot-wise if I understand things aright.

aclum · 2024-11-22T22:45:11Z

@turbomam did that address your question?

turbomam · 2024-11-22T23:07:23Z

If the slots associated with the two workflows will be virtually identical, then I am not in favor of making new classes. I recommend keeping one class and adding a slot to clarify which workflow is instantiated. The range of that slot should be an enumeration. Metadata about the two workflows can be captured as annotations on the two permissible values.
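A minimal Python sketch of the single-class pattern, for illustration only. The names `MetaproteomicsWorkflowCategory`, `workflow_category`, the two member names, and the annotation keys are hypothetical and not taken from nmdc-schema; the real schema would express this in LinkML, where the dictionary below would instead be annotations on the permissible values.

```python
from dataclasses import dataclass
from enum import Enum


class MetaproteomicsWorkflowCategory(Enum):
    """Hypothetical enum: one permissible value per workflow variant."""
    METAGENOME_DEPENDENT = "metagenome_dependent"      # V1
    METAGENOME_INDEPENDENT = "metagenome_independent"  # V2 / Kaiko


# Stand-in for annotations on the permissible values (keys are illustrative).
CATEGORY_ANNOTATIONS = {
    MetaproteomicsWorkflowCategory.METAGENOME_DEPENDENT: {
        "label": "V1",
    },
    MetaproteomicsWorkflowCategory.METAGENOME_INDEPENDENT: {
        "label": "V2",
        "approach": "de novo (Kaiko)",
    },
}


@dataclass
class MetaproteomicsAnalysis:
    """Simplified stand-in: one class for both variants.

    The enum-valued slot records which variant was run.
    """
    id: str
    workflow_category: MetaproteomicsWorkflowCategory

    def category_annotations(self) -> dict:
        # Look up the metadata attached to this record's permissible value.
        return CATEGORY_ANNOTATIONS[self.workflow_category]


record = MetaproteomicsAnalysis(
    id="nmdc:example-id",
    workflow_category=MetaproteomicsWorkflowCategory.METAGENOME_INDEPENDENT,
)
print(record.category_annotations()["label"])  # V2
```

Under this pattern, a third variant would mean one new permissible value rather than a new class.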

turbomam · 2024-11-22T23:13:11Z

Among other things, this is in keeping with patterns we use for data generation. We don't have a MaldiMassSpectrometry class or a SequencingBySynthesis class. We have MassSpectrometry and NucleotideSequencing classes. Those kinds of distinctions would be captured in the related instrument modeling.

kheal · 2024-11-22T23:42:05Z

I think a slot + enumeration could work.

Pros:
It would still be clear from the Mongo record which MetaP analysis category it is
No legacy ID patterns would exist
Keeps with the existing model ethos

Cons:
More challenging in metadata generation
More challenging for UI display (I assume)
We'd have to check whether there was a previous or subsequent run of the same variety with the same sample input when generating or displaying the appropriate ID

I don't think these cons are insurmountable, and if that holds, I'd vote to amend the ADR to incorporate @turbomam's suggestions.

aclum · 2024-11-23T01:21:08Z

Under @turbomam's suggestion, would V1 and V2 of the workflow share an ID blade or not?

turbomam · 2024-11-25T12:45:19Z

Thanks for the flexibility and the helpful advantages/disadvantages analysis, @kheal. We should be doing those for all of our schema decisions!

@aclum, do you have any advantages/disadvantages thoughts about using a common typecode vs. version-specific typecodes? I'm having a hard time thinking of any imminent technical reasons why we couldn't use two typecodes, but it doesn't seem like good practice to me.

I won't be looking at work stuff much more this week.

turbomam · 2024-11-25T17:08:22Z

Actually, we should check with @donny about allowing multiple type codes for a single class

kheal · 2024-11-25T17:46:53Z

There are a couple conversations going on here, I'm going to try to make them more explicit in this comment.

1) Should we have different classes for the two categories of MetaProteomicsAnalysis in the schema?

From @turbomam's comments earlier, I am leaning towards no. Instead, we could add a slot and enumeration that would let users know from the Mongo record which category of MetaP analysis was used.

2) Should we have different typecodes for records of the two categories of MetaProteomicsAnalysis in the database?

My thought is that we should keep with the current convention of 1:1 typecode:class unless there is a very good reason not to, especially since the already-processed data have this typecode.

3) Should we have different id_blades for records of the two categories of MetaProteomicsAnalysis in the database if they originate from the same DataObject?

The overwhelming consensus here is to have different id_blades for records that originate from the same DataObject if records represent different categories of the workflow. For example, if a DataObject is processed once via categoryA and once via categoryB, it would have two records, both with the ".version" of ".1", but with different id_blades. Reruns of categoryA and categoryB would increment the version tag separately.

cc @aclum, @SamuelPurvine
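A toy Python sketch of the blade/version scheme in point 3, for illustration only. `ToyWorkflowIdMinter` is hypothetical and is not the real NMDC minter. The `wfmp` typecode, the `11` shoulder, the zero-padded counter used for blades, and the `nmdc:<typecode>-<shoulder>-<blade>.<version>` layout are all placeholder assumptions. The sketch also shows why the minter would need the category as an extra argument: blades and version counters are keyed on (input DataObject, category), not on the class alone.

```python
from collections import defaultdict
from itertools import count


class ToyWorkflowIdMinter:
    """Illustrative stand-in for an ID minter; NOT the real NMDC minter."""

    def __init__(self, typecode: str = "wfmp", shoulder: str = "11") -> None:
        self.typecode = typecode
        self.shoulder = shoulder
        # Toy blade source: 000001, 000002, ... (real blades are minted differently).
        self._next_blade = count(1)
        # (data_object_id, category) -> blade assigned on the first run
        self._blades = {}
        # (data_object_id, category) -> latest version number issued
        self._versions = defaultdict(int)

    def mint(self, data_object_id: str, category: str) -> str:
        key = (data_object_id, category)
        # The first run of this category on this DataObject gets a fresh blade.
        if key not in self._blades:
            self._blades[key] = f"{next(self._next_blade):06d}"
        # Each (DataObject, category) pair keeps its own version counter.
        self._versions[key] += 1
        blade = self._blades[key]
        return f"nmdc:{self.typecode}-{self.shoulder}-{blade}.{self._versions[key]}"


minter = ToyWorkflowIdMinter()
first_a = minter.mint("nmdc:dobj-example", "categoryA")  # nmdc:wfmp-11-000001.1
first_b = minter.mint("nmdc:dobj-example", "categoryB")  # nmdc:wfmp-11-000002.1
rerun_a = minter.mint("nmdc:dobj-example", "categoryA")  # nmdc:wfmp-11-000001.2
```

A rerun of categoryA reuses its blade and bumps only its own version. categoryB's counter is untouched.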

aclum · 2024-11-26T01:15:45Z

The minter currently takes the class as input, so it would need to be updated to take additional arguments that give it enough information to return the correct class. This, plus wanting different blades, is to me an argument for subclassing.

aclum added 2 commits November 14, 2024 10:51

Create 0019-proteomics_workflow_versioning

24461dc

Update 0019-proteomics_workflow_versioning

3320a5d

aclum marked this pull request as ready for review November 19, 2024 20:10

aclum requested review from cmungall, sierra-moxon, lamccue, kheal, SamuelPurvine, picowatt, turbomam, mslarae13, pdpiehowski, CamiloPosso and shreddd November 19, 2024 20:11

kheal approved these changes Nov 20, 2024

turbomam reviewed Nov 21, 2024

Update 0019-proteomics_workflow_versioning

a4dfe18