Update gptmd mods #2446
Open
trishorts wants to merge 17 commits into smith-chem-wisc:master from trishorts:updateGptmdMods
trishorts requested review from Alexander-Sol, pcruzparri, nbollis, zhuoxinshi and RayMSMS on December 5, 2024, 14:24
nbollis reviewed Dec 5, 2024
Lgtm.
nbollis approved these changes Dec 5, 2024
RayMSMS approved these changes Dec 6, 2024
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #2446 +/- ##
=======================================
Coverage 93.81% 93.81%
=======================================
Files 141 141
Lines 21993 21993
Branches 3014 3014
=======================================
Hits 20633 20633
Misses 910 910
Partials 450 450
I recently reanalyzed the modification counts in the human UniProt XML; the results are in the table below. My aim was to update the GPTMD mod collection so that it contains the mods accounting for ~99% of the annotations listed below. After some back and forth with the previous list, I arrived at the mods in this PR. Any mods removed from the previous common biological set were placed into the less common list.
Note: I moved allysine from the UniProt ptmlist to the common biological list.
Note also the big increase in the modification list used for top-down. I will post below the effect of this change on the Jenkins top-down dataset.
Modification Count in UniProt Running Fraction
Phosphoserine 33146 0.598865361
Phosphothreonine 6013 0.70750524
N6-acetyllysine 4321 0.785574908
Phosphotyrosine 2136 0.824167088
N6-succinyllysine 1265 0.847022476
Omega-N-methylarginine 1027 0.865577799
N-acetylalanine 949 0.882723856
N-acetylmethionine 847 0.898027029
Asymmetric dimethylarginine 548 0.907928019
4-hydroxyproline 450 0.916058394
N6-(2-hydroxyisobutyryl)lysine 420 0.923646744
N-acetylserine 410 0.931054419
N6-(beta-hydroxybutyryl)lysine 277 0.936059117
N6-lactoyllysine 272 0.940973477
N6-glutaryllysine 264 0.945743297
N6-methyllysine 256 0.950368577
N6-crotonyllysine 222 0.954379562
Sulfotyrosine 151 0.957107755
Cysteine methyl ester 129 0.959438462
N6,N6-dimethyllysine 122 0.961642697
N6,N6,N6-trimethyllysine 120 0.963810797
Citrulline 117 0.965924695
4-carboxyglutamate 104 0.967803715
Hydroxyproline 104 0.969682735
Pyrrolidone carboxylic acid 96 0.971417215
N-acetylthreonine 81 0.972880682
S-nitrosocysteine 80 0.974326082
5-hydroxylysine 78 0.975735347
Dimethylated arginine 71 0.97701814
N6-butyryllysine 62 0.978138325
3'-nitrotyrosine 61 0.979240442
ADP-ribosylserine 60 0.980324492
Symmetric dimethylarginine 54 0.981300137
N6-(pyridoxal phosphate)lysine 47 0.98214931
N6-malonyllysine 45 0.982962347
Allysine 41 0.983703115
PolyADP-ribosyl glutamic acid 36 0.984353545
Deamidated asparagine 33 0.984949772
N-acetylglycine 30 0.985491797
ADP-ribosyl glutamic acid 28 0.985997687
N6-methylated lysine 28 0.986503577
(3S)-3-hydroxyasparagine 27 0.9869914
N5-methylglutamine 27 0.987479222
3-hydroxyproline 24 0.987912842
S-(2-succinyl)cysteine 23 0.988328395
N-acetylproline 22 0.98872588
5-glutamyl polyglutamate 19 0.989069162
ADP-ribosylarginine 19 0.989412445
ADP-ribosylcysteine 19 0.989755727
Iodotyrosine 17 0.990062875
3-oxoalanine (Cys) 17 0.990370022
Methionine (R)-sulfoxide 16 0.990659102
N-acetylvaline 14 0.990912047
Deamidated glutamine 13 0.991146925
(Microbial infection) O-acetylthreonine 13 0.991381802
Phenylalanine amide 13 0.99161668
Cysteine sulfenic acid (-SOH) 12 0.99183349
Pros-methylhistidine 12 0.9920503
S-glutathionyl cysteine 11 0.992249042
N-acetylcysteine 11 0.992447785
Methionine sulfoxide 11 0.992646527
5-glutamyl serotonin 11 0.99284527
N6-propionyllysine 11 0.993044012
(3R)-3-hydroxyasparagine 10 0.993224687
(Microbial infection) O-acetylserine 10 0.993405362
N6-(ADP-ribosyl)lysine 10 0.993586037
Cysteine sulfonic acid (-SO3H) 9 0.993748645
Methionine amide 9 0.993911252
Diiodotyrosine 9 0.99407386
Proline amide 8 0.9942184
2',4',5'-topaquinone 8 0.99436294
Cysteine persulfide 8 0.99450748
Tele-methylhistidine 8 0.99465202
(3R)-3-hydroxyaspartate 8 0.99479656
N,N,N-trimethylalanine 8 0.9949411
5-glutamyl dopamine 8 0.99508564
ADP-ribosyl aspartic acid 7 0.995212112
N6-lipoyllysine 6 0.995320517
Leucine amide 6 0.995428922
Tyrosine amide 6 0.995537327
N6-(retinylidene)lysine"> 6 0.995645732
S-methylcysteine 6 0.995754137
N6-(pyridoxal phosphate)lysine"> 6 0.995862542
5-glutamyl glycerylphosphorylethanolamine 6 0.995970947
N6-biotinyllysine 5 0.996061285
O-(pantetheine 4'-phosphoryl)serine 5 0.996151622
Glycine amide 5 0.99624196
Thyroxine 5 0.996332297
N6-(retinylidene)lysine 5 0.996422635
N,N,N-trimethylglycine 5 0.996512972
5-glutamyl glycine 5 0.99660331
Omega-N-methylated arginine 5 0.996693647
S-(2,3-dicarboxypropyl)cysteine 5 0.996783985
(Microbial infection) ADP-riboxanated arginine 5 0.996874322
Phosphohistidine 4 0.996946592
(Microbial infection) Phosphothreonine 4 0.997018862
(Microbial infection) Deamidated asparagine 4 0.997091132
Valine amide 4 0.997163402
Triiodothyronine 4 0.997235672
Arginine amide 4 0.997307942
ADP-ribosylasparagine 4 0.997380212
(Microbial infection) ADP-ribosylthreonine 4 0.997452482
ADP-ribosylglycine 4 0.997524752
O-AMP-tyrosine 4 0.997597022
(3S)-3-hydroxyhistidine 4 0.997669292
(Microbial infection) O-AMP-tyrosine 4 0.997741562
ADP-ribosylhistidine 3 0.997795765
(Microbial infection) Phosphoserine 3 0.997849967
Asparagine amide 3 0.99790417
Isoleucine amide 3 0.997958372
PolyADP-ribosyl aspartic acid 3 0.998012575
O-AMP-threonine 3 0.998066777
N6-carboxylysine 3 0.99812098
Aspartate 1-(chondroitin 4-sulfate)-ester"> 3 0.998175182
S-8alpha-FAD cysteine 3 0.998229385
Tele-8alpha-FAD histidine 3 0.998283587
Leucine methyl ester 3 0.99833779
(Microbial infection) O-AMP-threonine 3 0.998391992
N-acetylglutamate 3 0.998446195
(Microbial infection) O-(2-cholinephosphoryl)serine 3 0.998500397
Hypusine 3 0.9985546
Diphosphoserine 3 0.998608802
(Microbial infection) Deamidated glutamine 3 0.998663005
(Microbial infection) S-methylcysteine 3 0.998717207
3-hydroxyasparagine 3 0.99877141
4-hydroxynonenal-conjugated cysteine 3 0.998825612
(3R)-3-hydroxyarginine 2 0.998861747
Alanine amide 2 0.998897882
1-thioglycine 2 0.998934017
5-glutamyl histamine 2 0.998970152
Pyruvic acid (Ser) 2 0.999006287
Cysteine sulfinic acid (-SO2H) 2 0.999042422
Lysine amide 2 0.999078557
(3S)-3-hydroxylysine 2 0.999114692
ADP-ribosyltyrosine 2 0.999150827
N6-acetyl-N6-methyllysine 2 0.999186962
Sulfoserine 2 0.999223097
S-cysteinyl cysteine 2 0.999259232
4-hydroxynonenal-conjugated histidine 2 0.999295367
Hydroxyarginine 1 0.999313435
Glycyl adenylate 1 0.999331502
S-cysteinyl cysteine"> 1 0.99934957
Beta-decarboxylated aspartate 1 0.999367637
Glutamic acid 1-amide 1 0.999385705
Sulfocysteine 1 0.999403772
Blocked amino end (Ser) 1 0.99942184
N,N-dimethylproline 1 0.999439907
Blocked amino end (Thr)"> 1 0.999457975
N,N-dimethylglycine 1 0.999476042
N-methylglycine 1 0.99949411
S-(dipyrrolylmethanemethyl)cysteine 1 0.999512177
3'-bromotyrosine 1 0.999530245
N4,N4-dimethylasparagine 1 0.999548312
(Microbial infection) ADP-ribosyldiphthamide 1 0.99956638
Diphthamide 1 0.999584447
N6-1-carboxyethyl lysine 1 0.999602515
(3S)-3-hydroxyaspartate 1 0.999620582
N,N,N-trimethylserine 1 0.99963865
N,N-dimethylserine 1 0.999656717
N-methylserine 1 0.999674785
Blocked amino end (Ser)"> 1 0.999692852
O-acetylserine 1 0.99971092
N-acetylaspartate 1 0.999728987
(Microbial infection) ADP-ribosylasparagine 1 0.999747055
4-hydroxylysine 1 0.999765122
N-pyruvate 2-iminyl-valine 1 0.99978319
S-cGMP-cysteine 1 0.999801257
O-(2-cholinephosphoryl)serine 1 0.999819325
5-glutamyl glutamate 1 0.999837392
Thiazolidine linkage to a ring-opened DNA abasic site 1 0.99985546
(4R)-5-hydroxyleucine 1 0.999873527
(4R)-5-oxoleucine 1 0.999891595
Glutamine amide 1 0.999909662
O-AMP-serine 1 0.99992773
Pyrrolidone carboxylic acid (Glu) 1 0.999945797
(Microbial infection) N6-acetyllysine 1 0.999963865
4-hydroxynonenal-conjugated lysine 1 0.999981932
2,3-didehydroalanine (Ser) 1 1
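
The "Running Fraction" column is the cumulative share of all annotations covered once every modification down to that row is included. In the table above it first reaches 0.99 at Iodotyrosine. The following is a minimal Python sketch of that calculation, not the script actually used for this PR. The function names and the four-row `SAMPLE` dictionary are hypothetical, and because `SAMPLE` holds only the first four rows, the fractions it produces are relative to that subset and will not match the table.

```python
from itertools import accumulate


def running_fractions(counts):
    """Rank modifications by descending count and attach the cumulative
    fraction of all annotations covered up to and including each row."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ordered)
    cumulative = accumulate(count for _, count in ordered)
    return [
        (name, count, cum / total)
        for (name, count), cum in zip(ordered, cumulative)
    ]


def mods_to_reach(counts, target=0.99):
    """Return the shortest top-ranked list of modifications whose
    running fraction reaches the target coverage."""
    selected = []
    for name, _, fraction in running_fractions(counts):
        selected.append(name)
        if fraction >= target:
            break
    return selected


# Hypothetical sample: only the first four rows of the table above.
SAMPLE = {
    "Phosphoserine": 33146,
    "Phosphothreonine": 6013,
    "N6-acetyllysine": 4321,
    "Phosphotyrosine": 2136,
}

if __name__ == "__main__":
    for name, count, fraction in running_fractions(SAMPLE):
        print(f"{name}\t{count}\t{fraction:.6f}")
    print(mods_to_reach(SAMPLE, target=0.85))
```

Given the full set of counts, `mods_to_reach(counts, 0.99)` would list the modifications needed to reach the ~99% target described in this PR. The final mod list in the PR was also adjusted by hand against the previous list, so it need not match a pure cutoff.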