Update gptmd mods #2446

Open. Wants to merge 17 commits into base: master.

trishorts
Contributor

I recently reanalyzed the modification counts in the human UniProt XML (table below). My aim was to update the GPTMD mod collection so that it contains the mods needed to cover ~99% of those listed below. After some back and forth with the previous list, I arrived at the mods in this PR. Any mods removed from the previous common biological set were placed into the less common list.

Note: I moved allysine from the UniProt ptmlist to the common biological set.

Note also the big increase in the modification list used for top-down. I will post below the effect of this change on the Jenkins top-down dataset.

| Modification | Count in UniProt | Running Fraction |
| --- | --- | --- |
| Phosphoserine | 33146 | 0.598865361 |
| Phosphothreonine | 6013 | 0.70750524 |
| N6-acetyllysine | 4321 | 0.785574908 |
| Phosphotyrosine | 2136 | 0.824167088 |
| N6-succinyllysine | 1265 | 0.847022476 |
| Omega-N-methylarginine | 1027 | 0.865577799 |
| N-acetylalanine | 949 | 0.882723856 |
| N-acetylmethionine | 847 | 0.898027029 |
| Asymmetric dimethylarginine | 548 | 0.907928019 |
| 4-hydroxyproline | 450 | 0.916058394 |
| N6-(2-hydroxyisobutyryl)lysine | 420 | 0.923646744 |
| N-acetylserine | 410 | 0.931054419 |
| N6-(beta-hydroxybutyryl)lysine | 277 | 0.936059117 |
| N6-lactoyllysine | 272 | 0.940973477 |
| N6-glutaryllysine | 264 | 0.945743297 |
| N6-methyllysine | 256 | 0.950368577 |
| N6-crotonyllysine | 222 | 0.954379562 |
| Sulfotyrosine | 151 | 0.957107755 |
| Cysteine methyl ester | 129 | 0.959438462 |
| N6,N6-dimethyllysine | 122 | 0.961642697 |
| N6,N6,N6-trimethyllysine | 120 | 0.963810797 |
| Citrulline | 117 | 0.965924695 |
| 4-carboxyglutamate | 104 | 0.967803715 |
| Hydroxyproline | 104 | 0.969682735 |
| Pyrrolidone carboxylic acid | 96 | 0.971417215 |
| N-acetylthreonine | 81 | 0.972880682 |
| S-nitrosocysteine | 80 | 0.974326082 |
| 5-hydroxylysine | 78 | 0.975735347 |
| Dimethylated arginine | 71 | 0.97701814 |
| N6-butyryllysine | 62 | 0.978138325 |
| 3'-nitrotyrosine | 61 | 0.979240442 |
| ADP-ribosylserine | 60 | 0.980324492 |
| Symmetric dimethylarginine | 54 | 0.981300137 |
| N6-(pyridoxal phosphate)lysine | 47 | 0.98214931 |
| N6-malonyllysine | 45 | 0.982962347 |
| Allysine | 41 | 0.983703115 |
| PolyADP-ribosyl glutamic acid | 36 | 0.984353545 |
| Deamidated asparagine | 33 | 0.984949772 |
| N-acetylglycine | 30 | 0.985491797 |
| ADP-ribosyl glutamic acid | 28 | 0.985997687 |
| N6-methylated lysine | 28 | 0.986503577 |
| (3S)-3-hydroxyasparagine | 27 | 0.9869914 |
| N5-methylglutamine | 27 | 0.987479222 |
| 3-hydroxyproline | 24 | 0.987912842 |
| S-(2-succinyl)cysteine | 23 | 0.988328395 |
| N-acetylproline | 22 | 0.98872588 |
| 5-glutamyl polyglutamate | 19 | 0.989069162 |
| ADP-ribosylarginine | 19 | 0.989412445 |
| ADP-ribosylcysteine | 19 | 0.989755727 |
| Iodotyrosine | 17 | 0.990062875 |
| 3-oxoalanine (Cys) | 17 | 0.990370022 |
| Methionine (R)-sulfoxide | 16 | 0.990659102 |
| N-acetylvaline | 14 | 0.990912047 |
| Deamidated glutamine | 13 | 0.991146925 |
| (Microbial infection) O-acetylthreonine | 13 | 0.991381802 |
| Phenylalanine amide | 13 | 0.99161668 |
| Cysteine sulfenic acid (-SOH) | 12 | 0.99183349 |
| Pros-methylhistidine | 12 | 0.9920503 |
| S-glutathionyl cysteine | 11 | 0.992249042 |
| N-acetylcysteine | 11 | 0.992447785 |
| Methionine sulfoxide | 11 | 0.992646527 |
| 5-glutamyl serotonin | 11 | 0.99284527 |
| N6-propionyllysine | 11 | 0.993044012 |
| (3R)-3-hydroxyasparagine | 10 | 0.993224687 |
| (Microbial infection) O-acetylserine | 10 | 0.993405362 |
| N6-(ADP-ribosyl)lysine | 10 | 0.993586037 |
| Cysteine sulfonic acid (-SO3H) | 9 | 0.993748645 |
| Methionine amide | 9 | 0.993911252 |
| Diiodotyrosine | 9 | 0.99407386 |
| Proline amide | 8 | 0.9942184 |
| 2',4',5'-topaquinone | 8 | 0.99436294 |
| Cysteine persulfide | 8 | 0.99450748 |
| Tele-methylhistidine | 8 | 0.99465202 |
| (3R)-3-hydroxyaspartate | 8 | 0.99479656 |
| N,N,N-trimethylalanine | 8 | 0.9949411 |
| 5-glutamyl dopamine | 8 | 0.99508564 |
| ADP-ribosyl aspartic acid | 7 | 0.995212112 |
| N6-lipoyllysine | 6 | 0.995320517 |
| Leucine amide | 6 | 0.995428922 |
| Tyrosine amide | 6 | 0.995537327 |
| N6-(retinylidene)lysine | 6 | 0.995645732 |
| S-methylcysteine | 6 | 0.995754137 |
| N6-(pyridoxal phosphate)lysine | 6 | 0.995862542 |
| 5-glutamyl glycerylphosphorylethanolamine | 6 | 0.995970947 |
| N6-biotinyllysine | 5 | 0.996061285 |
| O-(pantetheine 4'-phosphoryl)serine | 5 | 0.996151622 |
| Glycine amide | 5 | 0.99624196 |
| Thyroxine | 5 | 0.996332297 |
| N6-(retinylidene)lysine | 5 | 0.996422635 |
| N,N,N-trimethylglycine | 5 | 0.996512972 |
| 5-glutamyl glycine | 5 | 0.99660331 |
| Omega-N-methylated arginine | 5 | 0.996693647 |
| S-(2,3-dicarboxypropyl)cysteine | 5 | 0.996783985 |
| (Microbial infection) ADP-riboxanated arginine | 5 | 0.996874322 |
| Phosphohistidine | 4 | 0.996946592 |
| (Microbial infection) Phosphothreonine | 4 | 0.997018862 |
| (Microbial infection) Deamidated asparagine | 4 | 0.997091132 |
| Valine amide | 4 | 0.997163402 |
| Triiodothyronine | 4 | 0.997235672 |
| Arginine amide | 4 | 0.997307942 |
| ADP-ribosylasparagine | 4 | 0.997380212 |
| (Microbial infection) ADP-ribosylthreonine | 4 | 0.997452482 |
| ADP-ribosylglycine | 4 | 0.997524752 |
| O-AMP-tyrosine | 4 | 0.997597022 |
| (3S)-3-hydroxyhistidine | 4 | 0.997669292 |
| (Microbial infection) O-AMP-tyrosine | 4 | 0.997741562 |
| ADP-ribosylhistidine | 3 | 0.997795765 |
| (Microbial infection) Phosphoserine | 3 | 0.997849967 |
| Asparagine amide | 3 | 0.99790417 |
| Isoleucine amide | 3 | 0.997958372 |
| PolyADP-ribosyl aspartic acid | 3 | 0.998012575 |
| O-AMP-threonine | 3 | 0.998066777 |
| N6-carboxylysine | 3 | 0.99812098 |
| Aspartate 1-(chondroitin 4-sulfate)-ester | 3 | 0.998175182 |
| S-8alpha-FAD cysteine | 3 | 0.998229385 |
| Tele-8alpha-FAD histidine | 3 | 0.998283587 |
| Leucine methyl ester | 3 | 0.99833779 |
| (Microbial infection) O-AMP-threonine | 3 | 0.998391992 |
| N-acetylglutamate | 3 | 0.998446195 |
| (Microbial infection) O-(2-cholinephosphoryl)serine | 3 | 0.998500397 |
| Hypusine | 3 | 0.9985546 |
| Diphosphoserine | 3 | 0.998608802 |
| (Microbial infection) Deamidated glutamine | 3 | 0.998663005 |
| (Microbial infection) S-methylcysteine | 3 | 0.998717207 |
| 3-hydroxyasparagine | 3 | 0.99877141 |
| 4-hydroxynonenal-conjugated cysteine | 3 | 0.998825612 |
| (3R)-3-hydroxyarginine | 2 | 0.998861747 |
| Alanine amide | 2 | 0.998897882 |
| 1-thioglycine | 2 | 0.998934017 |
| 5-glutamyl histamine | 2 | 0.998970152 |
| Pyruvic acid (Ser) | 2 | 0.999006287 |
| Cysteine sulfinic acid (-SO2H) | 2 | 0.999042422 |
| Lysine amide | 2 | 0.999078557 |
| (3S)-3-hydroxylysine | 2 | 0.999114692 |
| ADP-ribosyltyrosine | 2 | 0.999150827 |
| N6-acetyl-N6-methyllysine | 2 | 0.999186962 |
| Sulfoserine | 2 | 0.999223097 |
| S-cysteinyl cysteine | 2 | 0.999259232 |
| 4-hydroxynonenal-conjugated histidine | 2 | 0.999295367 |
| Hydroxyarginine | 1 | 0.999313435 |
| Glycyl adenylate | 1 | 0.999331502 |
| S-cysteinyl cysteine | 1 | 0.99934957 |
| Beta-decarboxylated aspartate | 1 | 0.999367637 |
| Glutamic acid 1-amide | 1 | 0.999385705 |
| Sulfocysteine | 1 | 0.999403772 |
| Blocked amino end (Ser) | 1 | 0.99942184 |
| N,N-dimethylproline | 1 | 0.999439907 |
| Blocked amino end (Thr) | 1 | 0.999457975 |
| N,N-dimethylglycine | 1 | 0.999476042 |
| N-methylglycine | 1 | 0.99949411 |
| S-(dipyrrolylmethanemethyl)cysteine | 1 | 0.999512177 |
| 3'-bromotyrosine | 1 | 0.999530245 |
| N4,N4-dimethylasparagine | 1 | 0.999548312 |
| (Microbial infection) ADP-ribosyldiphthamide | 1 | 0.99956638 |
| Diphthamide | 1 | 0.999584447 |
| N6-1-carboxyethyl lysine | 1 | 0.999602515 |
| (3S)-3-hydroxyaspartate | 1 | 0.999620582 |
| N,N,N-trimethylserine | 1 | 0.99963865 |
| N,N-dimethylserine | 1 | 0.999656717 |
| N-methylserine | 1 | 0.999674785 |
| Blocked amino end (Ser) | 1 | 0.999692852 |
| O-acetylserine | 1 | 0.99971092 |
| N-acetylaspartate | 1 | 0.999728987 |
| (Microbial infection) ADP-ribosylasparagine | 1 | 0.999747055 |
| 4-hydroxylysine | 1 | 0.999765122 |
| N-pyruvate 2-iminyl-valine | 1 | 0.99978319 |
| S-cGMP-cysteine | 1 | 0.999801257 |
| O-(2-cholinephosphoryl)serine | 1 | 0.999819325 |
| 5-glutamyl glutamate | 1 | 0.999837392 |
| Thiazolidine linkage to a ring-opened DNA abasic site | 1 | 0.99985546 |
| (4R)-5-hydroxyleucine | 1 | 0.999873527 |
| (4R)-5-oxoleucine | 1 | 0.999891595 |
| Glutamine amide | 1 | 0.999909662 |
| O-AMP-serine | 1 | 0.99992773 |
| Pyrrolidone carboxylic acid (Glu) | 1 | 0.999945797 |
| (Microbial infection) N6-acetyllysine | 1 | 0.999963865 |
| 4-hydroxynonenal-conjugated lysine | 1 | 0.999981932 |
| 2,3-didehydroalanine (Ser) | 1 | 1 |
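
For reference, a minimal Python sketch of how a running-fraction column like the one above could be reproduced, and how a coverage cutoff could be read off it. This is not the script used for the analysis in this PR. The helper names `running_fraction` and `mods_needed` are hypothetical, and the total of 55348 is an assumption inferred from the first row (33146 / 0.598865361), not a number stated in the PR.

```python
from itertools import accumulate


def running_fraction(counts, total=None):
    """Return (name, count, running fraction) rows, largest count first.

    `counts` is a list of (modification, count) pairs. If `total` is None,
    the fractions are relative to the sum of the supplied counts only.
    """
    ordered = sorted(counts, key=lambda pair: pair[1], reverse=True)
    if total is None:
        total = sum(count for _, count in ordered)
    cumulative = accumulate(count for _, count in ordered)
    return [(name, count, running / total)
            for (name, count), running in zip(ordered, cumulative)]


def mods_needed(rows, target):
    """Return the shortest leading run of mods whose running fraction reaches target.

    If the target is never reached, every mod in `rows` is returned.
    """
    selected = []
    for name, _count, fraction in rows:
        selected.append(name)
        if fraction >= target:
            break
    return selected


# First four rows of the table above.
top_rows = [
    ("Phosphoserine", 33146),
    ("Phosphothreonine", 6013),
    ("N6-acetyllysine", 4321),
    ("Phosphotyrosine", 2136),
]

# Assumed grand total over the whole table, inferred from the first row.
TOTAL = 55348

rows = running_fraction(top_rows, total=TOTAL)
for name, count, fraction in rows:
    print(f"{name}\t{count}\t{fraction:.9f}")

# Mods needed to reach 70% of all counted occurrences.
print(mods_needed(rows, target=0.70))
```

Run over the full table with `target=0.99`, the same idea would give one candidate cutoff for the ~99% goal, though the final list in this PR was also adjusted by hand against the previous list.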

@trishorts
Contributor Author

Original Jenkins top-down results:
[image]

@trishorts
Contributor Author

This PR:
[image]

Member

@nbollis left a comment

LGTM.

@nbollis self-requested a review December 5, 2024 17:54

codecov bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.81%. Comparing base (b82cfaf) to head (31d75bd).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2446   +/-   ##
=======================================
  Coverage   93.81%   93.81%           
=======================================
  Files         141      141           
  Lines       21993    21993           
  Branches     3014     3014           
=======================================
  Hits        20633    20633           
  Misses        910      910           
  Partials      450      450           
