#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# This file is part of the `pypath` python module
#
# Copyright 2014-2023
# EMBL, EMBL-EBI, Uniklinik RWTH Aachen, Heidelberg University
#
# Authors: see the file `README.rst`
# Contact: Dénes Türei ([email protected])
#
# Distributed under the GPLv3 License.
# See accompanying file LICENSE.txt or copy at
# https://www.gnu.org/licenses/gpl-3.0.html
#
# Website: https://pypath.omnipathdb.org/
#
import sys
from future.utils import iteritems
import codecs
import bs4
import textwrap
import pypath.omnipath.server._html as _html
import pypath.resources.data_formats as data_formats
import pypath.resources.urls as urls
import pypath.share.session as session_mod
__all__ = ['descriptions', 'gen_html', 'write_html']
if 'long' not in __builtins__:
    long = int
if 'unicode' not in __builtins__:
    unicode = str
_logger = session_mod.Logger(name = 'descriptions')
_log = _logger._log
descriptions = {
'HuRI': {
'year': 2016,
'releases': [2012, 2014, 2016],
'recommend':
'very large, quality controlled, unbiased yeast-2-hybrid screening',
'label': 'HuRI HI-III',
'full_name': 'Human Reference Interactome',
'urls': {
'articles':
['http://www.cell.com/cell/abstract/S0092-8674(14)01422-6'],
'webpages': [
'http://interactome.dfci.harvard.edu/H_sapiens/',
'http://www.interactome-atlas.org/',
],
},
'pubmeds': [25416956],
'emails':
[('[email protected]', 'Michael Calderwood')],
'type': 'high-throughput',
'subtype': 'yeast 2 hybrid',
'omnipath': False,
'license': {
'name':
'No license. "This dataset is freely available to the research community through the search engine or via download."',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'HuRI Lit-BM': {
'year': 2017,
'releases': [2013, 2017],
'label': 'HuRI Lit-BM-17',
'full_name': 'Human Reference Interactome Literature Benchmark',
'urls': {
'articles':
['http://www.cell.com/cell/abstract/S0092-8674(14)01422-6'],
'webpages': [
'http://interactome.dfci.harvard.edu/H_sapiens/',
'http://www.interactome-atlas.org/',
],
},
'authors': ['CCSB'],
'pubmeds': [25416956],
'descriptions': [
u'''
High-quality non-systematic Literature dataset. In 2013, we extracted interaction data from BIND, BioGRID, DIP, HPRD, MINT, IntAct, and PDB to generate a high-quality binary literature dataset comprising ~11,000 protein-protein interactions that are binary and supported by at least two traceable pieces of evidence (publications and/or methods) (Rolland et al Cell 2014). Although this dataset does not result from a systematic investigation of the interactome search space and should thus be used with caution for any network topology analyses, it represents valuable interactions for targeted studies and is freely available to the research community through the search engine or via download.
'''
],
'emails':
[('[email protected]', 'Michael Calderwood')],
'type': 'high-throughput',
'subtype': 'yeast 2 hybrid',
'omnipath': False,
'pypath': {
'get': ['pypath.dataio.get_lit_bm_13()'],
'data': ['pypath.urls.urls[\'hid\'][\'lit-bm-13\']'],
'input': ['pypath.data_formats.interaction_misc[\'lit13\']']
},
'license': {
'name':
'No license. "This dataset is freely available to the research community through the search engine or via download."',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'ELM': {
'year': 2014,
'releases': [2003, 2008, 2009, 2012, 2013, 2014, 2016],
'size': {
'nodes': None,
'edges': None
},
'authors': ['ELM Consortium'],
'label': 'ELM',
'color': '',
'urls': {
'webpages': ['http://elm.eu.org/'],
'articles': [
'http://nar.oxfordjournals.org/content/40/D1/D242.long',
'http://nar.oxfordjournals.org/content/42/D1/D259.long',
'http://nar.oxfordjournals.org/content/44/D1/D294.long'
],
'omictools':
['http://omictools.com/eukaryotic-linear-motif-resource-tool']
},
'annot': ['domain', 'residue'],
'recommend':
'structural details: domain-motif relationships; very high confidence',
'pubmeds': [22110040, 24214962, 26615199],
'emails': [('[email protected]', 'ELM Team'), ('[email protected]',
'Toby Gibson')],
'type': 'literature curated',
'subtype': 'post-translational modifications',
'data_integration': 'dynamic',
'descriptions': [
u'''
Ideally, each motif class has multiple example instances of this motif annotated, whereby an instance is described as a match to the regular expression pattern of the ELM motif class in a protein sequence. For each instance entry, ideally, multiple sources of experimental evidence are recorded (identifying participant, detecting motif presence and detecting interaction), and, following annotation best practices, a reliability score is given by the annotator.
'''
],
'omnipath': True,
'license': {
'name': 'ELM Software License Agreement, non-free',
'url': 'http://elm.eu.org/media/Elm_academic_license.pdf',
'commercial_use': False
},
'pypath': {
'data': [
'pypath.urls.urls[\'ielm_domains\'][\'url\']',
'pypath.urls.urls[\'elm_class\'][\'url\']',
'pypath.urls.urls[\'elm_inst\'][\'url\']',
'pypath.urls.urls[\'elm_int\'][\'url\']'
],
'format': [
'pypath.data_formats.ptm[\'elm\']',
'pypath.data_formats.omnipath[\'elm\']'
],
'input': [
'pypath.dataio.get_elm_domains()',
'pypath.dataio.get_elm_classes()',
'pypath.dataio.get_elm_instances()'
],
'intr': ['pypath.dataio.get_elm_interactions()'],
'dmi': ['pypath.pypath.PyPath.load_elm()']
}
},
'LMPID': {
'year': 2015,
'releases': [2015],
'size': {
'nodes': None,
'edges': None
},
'authors': ['Bose Institute'],
'label': 'LMPID',
'color': '',
'urls': {
'webpages':
['http://bicresources.jcbose.ac.in/ssaha4/lmpid/index.php'],
'articles':
['http://database.oxfordjournals.org/content/2015/bav014.long'],
'omictools': [
'http://omictools.com/linear-motif-mediated-protein-interaction-database-tool'
]
},
'pubmeds': [25776024],
'emails': [('[email protected]', 'Sudipto Saha')],
'type': 'literature curated',
'subtype': 'post-translational modifications',
'recommend':
'structural details: domain-motif relationships; similar to ELM, but more recent and larger',
'annot': ['domain', 'mechanism'],
'descriptions': [
u'''
LMPID (Linear Motif mediated Protein Interaction Database) is a manually curated database which provides comprehensive experimentally validated information about the LMs mediating PPIs from all organisms on a single platform. About 2200 entries have been compiled by detailed manual curation of PubMed abstracts, of which about 1000 LM entries were being annotated for the first time, as compared with the Eukaryotic LM resource.
'''
],
'omnipath': True,
'pypath': {
'input': ['pypath.dataio.load_lmpid()'],
'intr': ['pypath.dataio.lmpid_interactions()'],
'data': ['pypath.data/LMPID_DATA_pubmed_ref.xml'],
'format': [
'pypath.data_formats.ptm[\'lmpid\']',
'pypath.data_formats.omnipath[\'lmpid\']'
],
'dmi': [
'pypath.dataio.lmpid_dmi()',
'pypath.pypath.PyPath().process_dmi(source = \'LMPID\')'
]
},
'license': {
'name':
'No license. If you are using this database please cite Sarkar 2015.',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'PDZBase': {
'year': 2004,
'releases': [2004],
'authors': ['Weinstein Group'],
'urls': {
'webpages': ['http://abc.med.cornell.edu/pdzbase'],
'articles': [
'http://bioinformatics.oxfordjournals.org/content/21/6/827.long'
],
'omictools': ['http://omictools.com/pdzbase-tool']
},
'pubmeds': [15513994],
'taxons': ['human'],
'color': None,
'label': 'PDZBase',
'annot': ['domain'],
'recommend':
'a handful of specific interactions for proteins with PDZ domains',
'descriptions': [
u'''
PDZBase is a database that aims to contain all known PDZ-domain-mediated protein-protein interactions. Currently, PDZBase contains approximately 300 such interactions, which have been manually extracted from >200 articles.
PDZBase currently contains ∼300 interactions, all of which have been manually extracted from the literature, and have been independently verified by two curators. The extracted information comes from in vivo (co-immunoprecipitation) or in vitro experiments (GST-fusion or related pull-down experiments). Interactions identified solely from high throughput methods (e.g. yeast two-hybrid or mass spectrometry) were not included in PDZBase. Other prerequisites for inclusion in the database are: that knowledge of the binding sites on both interacting proteins must be available (for instance through a truncation or mutagenesis experiment); that interactions must be mediated directly by the PDZ-domain, and not by any other possible domain within the protein.
'''
],
'emails': [('[email protected]', 'Harel Weinstein'),
('[email protected]', 'PDZBase Team')],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'pypath': {
'intr': ['pypath.dataio.get_pdzbase()'],
'data': [
'pypath.urls.urls[\'pdzbase\']',
'pypath.urls.urls[\'pdz_details\']'
],
'format': [
'pypath.data_formats.pathway[\'pdz\']',
'pypath.data_formats.omnipath[\'pdz\']'
]
},
'license': {
'name': 'No license.',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
},
'pathguide': 160
},
'Guide2Pharma': {
'year': 2015,
'releases': [2007, 2008, 2009, 2011, 2013, 2014, 2015, 2016],
'size': None,
'authors': None,
'label': 'Guide to Pharmacology',
'full_name': 'Guide to Pharmacology',
'color': None,
'pubmeds': [24234439],
'urls': {
'webpages': ['http://www.guidetopharmacology.org/'],
'articles': [
'http://nar.oxfordjournals.org/content/42/D1/D1098.long',
'http://onlinelibrary.wiley.com/doi/10.1111/j.1476-5381.2011.01649_1.x/full'
],
'omictools': [
'http://omictools.com/international-union-of-basic-and-clinical-pharmacology-british-pharmacological-society-guide-to-pharmacology-tool'
]
},
'recommend':
'one of the strongest in ligand-receptor interactions; still not exhaustive, therefore worth combining with larger activity flow resources like SIGNOR',
'descriptions': [
u'''
Presently, the resource describes the interactions between target proteins and 6064 distinct ligand entities (Table 1). Ligands are listed against targets by their action (e.g. activator, inhibitor), and also classified according to substance types and their status as approved drugs. Classes include metabolites (a general category for all biogenic, non-peptide, organic molecules including lipids, hormones and neurotransmitters), synthetic organic chemicals (e.g. small molecule drugs), natural products, mammalian endogenous peptides, synthetic and other peptides including toxins from non-mammalian organisms, antibodies, inorganic substances and other, not readily classifiable compounds.
The new database was constructed by integrating data from IUPHAR-DB and the published GRAC compendium. An overview of the curation process is depicted as an organizational flow chart in Figure 2. New information was added to the existing relational database behind IUPHAR-DB and new webpages were created to display the integrated information. For each new target, information on human, mouse and rat genes and proteins, including gene symbol, full name, location, gene ID, UniProt and Ensembl IDs was manually curated from HGNC, the Mouse Genome Database (MGD) at Mouse Genome Informatics (MGI), the Rat Genome Database (RGD), UniProt and Ensembl, respectively. In addition, ‘Other names’, target-specific fields such as ‘Principal transduction’, text from the ‘Overview’ and ‘Comments’ sections and reference citations (downloaded from PubMed; http://www.ncbi.nlm.nih.gov/pubmed) were captured from GRAC and uploaded into the database against a unique Object ID.
'''
],
'emails':
[('[email protected]', 'Guide to Pharmacology Team'),
('[email protected]', 'Christopher Southan')],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'license': {
'name': 'CC-Attribution-ShareAlike-3.0',
'url': 'http://creativecommons.org/licenses/by-sa/3.0/',
'commercial_use': True
},
'pypath': {
'data': ['pypath.data_formats'],
'format': [
'pypath.data_formats.pathway[\'guide2pharma\']',
'pypath.data_formats.omnipath[\'guide2pharma\']'
]
},
'pathguide': 345
},
'phosphoELM': {
'year': 2010,
'releases': [2004, 2007, 2010],
'urls': {
'webpages': ['http://phospho.elm.eu.org/'],
'articles': [
'http://www.biomedcentral.com/1471-2105/5/79',
'http://nar.oxfordjournals.org/content/36/suppl_1/D240.full',
'http://nar.oxfordjournals.org/content/39/suppl_1/D261'
],
'omictools': ['http://omictools.com/phospho-elm-tool']
},
'pubmeds': [15212693, 17962309, 21062810],
'annot': ['mechanism', 'residue'],
'recommend':
'one of the largest kinase-substrate databases; substantial number of specific proteins and interactions, with more receptors than PhosphoSite',
'descriptions': [
u'''
Phospho.ELM http://phospho.elm.eu.org is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1,703 phosphorylation site instances for 556 phosphorylated proteins. (Diella 2004)
''', u'''
Phospho.ELM is a manually curated database of eukaryotic phosphorylation sites. The resource includes data collected from published literature as well as high-throughput data sets. The current release of the Phospho.ELM data set (version 7.0, July 2007) contains 4078 phospho-protein sequences covering 12,025 phospho-serine, 2,362 phospho-threonine and 2,083 phospho-tyrosine sites with a total of 16,470 sites.
For each phospho-site we report if the phosphorylation evidence has been identified by small-scale analysis (low throughput; LTP) that typically focus on one or a few proteins at a time or by large-scale experiments (high throughput; HTP), which mainly apply MS techniques. It is noteworthy that in our data set there is a small overlap between instances identified by LTP and HTP experiments. (Diella 2007)
''', u'''
The current release of the Phospho.ELM data set (version 9.0) contains more than 42,500 non-redundant instances of phosphorylated residues in more than 11,000 different protein sequences (3,370 tyrosine, 31,754 serine and 7,449 threonine residues). For each phosphosite we report whether the phosphorylation evidence has been identified by small-scale analyses (low-throughput, LTP) and/or by large-scale experiments (high-throughput, HTP), which mainly apply MS techniques. The majority of the protein instances from Phospho.ELM are vertebrate (mostly Homo sapiens (62%) and Mus musculus (16%)) though 22% are from other species, mainly Drosophila melanogaster (13%) and Caenorhabditis elegans (7%). In total, more than 300 different kinases have been annotated and a document providing additional information about all kinases annotated in Phospho.ELM can be found at http://phospho.elm.eu.org/kinases.html. (Dinkel 2010)
'''
],
'emails': [('[email protected]', 'Toby Gibson')],
'type': 'Literature curated',
'data_integration': 'dynamic',
'subtype': 'PTM',
'label': 'phospho.ELM',
'omnipath': True,
'license': {
'name': 'phospho.ELM Academic License, non-free',
'url':
'http://phospho.elm.eu.org/dumps/Phospho.Elm_AcademicLicense.pdf',
'commercial_use': False
},
'pypath': {
'format': [
'pypath.data_formats.ptm[\'phelm\']',
'pypath.data_formats.omnipath[\'phelm\']'
],
'data': [
'pypath.urls.urls[\'p_elm\'][\'psites\']',
'pypath.urls.urls[\'p_elm_kin\'][\'url\']'
],
'intr': ['pypath.dataio.phelm_interactions()'],
'input': [
'pypath.dataio.get_phosphoelm()',
'pypath.dataio.get_phelm_kinases()',
'pypath.dataio.phelm_psites()'
],
'ptm': ['pypath.pypath.PyPath().load_phosphoelm()']
},
'pathguide': 138
},
'DOMINO': {
'year': 2006,
'releases': [2006],
'authors': ['Cesareni Group'],
'urls': {
'webpages':
['http://mint.bio.uniroma2.it/domino/search/searchWelcome.do'],
'articles':
['http://nar.oxfordjournals.org/content/35/suppl_1/D557.long'],
'omictools': ['http://omictools.com/domino-tool']
},
'pubmeds': [17135199],
'taxons': [
'human', 'yeast', 'C. elegans', 'mouse', 'rat', 'HIV',
'D. melanogaster', 'A. thaliana', 'X. laevis', 'B. taurus',
'G. gallus', 'O. cuniculus', 'Plasmodium falciparum'
],
'annot': ['experiment'],
'recommend':
'rich in detail, with much specific information; discontinued; SIGNOR, from the same lab, is larger and newer and contains most of its data',
'descriptions': [
u'''
DOMINO aims at annotating all the available information about domain-peptide and domain–domain interactions. The core of DOMINO, of July 24, 2006 consists of more than 3900 interactions extracted from peer-reviewed articles and annotated by expert biologists. A total of 717 manuscripts have been processed, thus covering a large fraction of the published information about domain–peptide interactions. The curation effort has focused on the following domains: SH3, SH2, 14-3-3, PDZ, PTB, WW, EVH, VHS, FHA, EH, FF, BRCT, Bromo, Chromo and GYF. However, interactions mediated by as many as 150 different domain families are stored in DOMINO.
''', u'''
DOMINO is an open-access database comprising more than 3900 annotated experiments describing interactions mediated by protein-interaction domains. The curation effort aims at covering the interactions mediated by the following domains (SH3, SH2, 14-3-3, PDZ, PTB, WW, EVH, VHS, FHA, EH, FF, BRCT, Bromo, Chromo, GYF). However, interactions mediated by as many as 150 different domain families are stored in DOMINO.
''', u'''
The curation process follows the PSI-MI 2.5 standard but with special emphasis on the mapping of the interaction to specific protein domains of both participating proteins. This is achieved by paying special attention to the shortest protein fragment that was experimentally verified as sufficient for the interaction. Whenever the authors report only the name of the domain mediating the interaction (i.e. SH3, SH2 ...), without stating the coordinates of the experimental binding range, the curator may choose to enter the coordinates of the Pfam domain match in the protein sequence. Finally whenever the information is available, any mutation or posttranslational modification affecting the interaction affinity is noted in the database.
'''
],
'emails': [('[email protected]', 'Gianni Cesareni')],
'type': 'Literature curated',
'subtype': 'PTM',
'omnipath': True,
'license': {
'name': 'CC-Attribution-2.5',
'url': 'http://creativecommons.org/licenses/by/2.5',
'commercial_use': True
},
'pypath': {
'data': ['pypath.urls.urls[\'domino\'][\'url\']'],
'input':
['pypath.dataio.get_domino()', 'pypath.dataio.get_domino_ddi()'],
'format': [
'pypath.data_formats.ptm[\'domino\']',
'pypath.data_formats.omnipath[\'domino\']'
],
'intr': ['pypath.dataio.domino_interactions()'],
'ptm': ['pypath.dataio.get_domino_ptms()'],
'dmi': [
'pypath.dataio.get_domino_dmi()',
'pypath.pypath.PyPath().load_domino_dmi()'
],
'ddi': ['pypath.dataio.get_domino_ddi()']
},
'pathguide': 239
},
'dbPTM': {
'year': 2015,
'releases': [2005, 2009, 2012, 2015],
'authors': ['ISBLab'],
'urls': {
'webpages': ['http://dbptm.mbc.nctu.edu.tw/'],
'articles': [
'http://nar.oxfordjournals.org/content/41/D1/D295.long',
'http://www.biomedcentral.com/1756-0500/2/111',
'http://nar.oxfordjournals.org/content/34/suppl_1/D622.long',
'http://nar.oxfordjournals.org/content/44/D1/D435.long'
],
'omictools': ['http://omictools.com/dbptm-tool']
},
'pubmeds': [16381945, 19549291, 23193290, 26578568],
'taxons': ['human', 'Metazoa', 'Bacteria', 'plants', 'yeast'],
'annot': ['mechanism', 'residue'],
'recommend':
'integrates many small efforts; besides phosphorylation, provides all types of PTMs and enzyme-substrate relationships',
'descriptions': [
u'''
Due to the inaccessibility of database contents in several online PTM resources, a total of 11 biological databases related to PTMs are integrated in dbPTM, including UniProtKB/SwissProt, version 9.0 of Phospho.ELM, PhosphoSitePlus, PHOSIDA, version 6.0 of O-GLYCBASE, dbOGAP, dbSNO, version 1.0 of UbiProt, PupDB, version 1.1 of SysPTM and release 9.0 of HPRD.
With the high throughput of MS-based methods in post-translational proteomics, this update also includes manually curated MS/MS-identified peptides associated with PTMs from research articles through a literature survey. First, a table list of PTM-related keywords is constructed by referring to the UniProtKB/SwissProt PTM list (http://www.uniprot.org/docs/ptmlist.txt) and the annotations of RESID (28). Then, all fields in the PubMed database are searched based on the keywords of the constructed table list. This is then followed by downloading the full text of the research articles. For the various experiments of proteomic identification, a text-mining system is developed to survey full-text literature that potentially describes the site-specific identification of modified sites. Approximately 800 original and review articles associated with MS/MS proteomics and protein modifications are retrieved from PubMed (July 2012). Next, the full-length articles are manually reviewed for precisely extracting the MS/MS peptides along with the modified sites. Furthermore, in order to determine the locations of PTMs on a full-length protein sequence, the experimentally verified MS/MS peptides are then mapped to UniProtKB protein entries based on its database identifier (ID) and sequence identity. In the process of data mapping, MS/MS peptides that cannot align exactly to a protein sequence are discarded. Finally, each mapped PTM site is attributed with a corresponding literature (PubMed ID).
''', u'''
dbPTM was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this tenth anniversary of dbPTM, the updated resource includes not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12,000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining.
'''
],
'emails': [('[email protected]', 'Hsien-Da Huang'),
('[email protected]', 'Hsien-Da Huang')],
'type': 'Literature curated',
'subtype': 'PTM',
'omnipath': True,
'data_integration': 'dynamic',
'license': {
'name': 'No license',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
},
'pypath': {
'format': [
'pypath.data_formats.ptm[\'dbptm\']',
'pypath.data_formats.omnipath[\'dbptm\']'
],
'data': [
'pypath.urls.urls[\'dbptm_benchmark\'][\'urls\']',
'pypath.urls.urls[\'dbptm\'][\'urls\']'
],
'input': ['pypath.dataio.get_dbptm()'],
'intr': ['pypath.dataio.dbptm_interactions()'],
'ptm': ['pypath.pypath.PyPath().load_dbptm()']
}
},
'SIGNOR': {
'year': 2015,
'releases': [2015,2019],
'urls': {
'webpages': ['http://signor.uniroma2.it/'],
'articles': [
'http://nar.oxfordjournals.org/content/44/D1/D548',
'http://f1000research.com/posters/1098359'
],
'omictools':
['http://omictools.com/signaling-network-open-resource-tool']
},
'full_name': 'Signaling Network Open Resource',
'pubmeds': [26467481],
'annot': ['mechanism', 'pathway'],
'recommend':
'provides effect sign for an unprecedented number of interactions; large and recent curation effort; many specific entities; PTMs with enzymes',
'descriptions': [
u'''
SIGNOR, the SIGnaling Network Open Resource, organizes and stores in a structured format signaling information published in the scientific literature. The captured information is stored as binary causative relationships between biological entities and can be represented graphically as activity flow. The entire network can be freely downloaded and used to support logic modeling or to interpret high content datasets. The core of this project is a collection of more than 11000 manually-annotated causal relationships between proteins that participate in signal transduction. Each relationship is linked to the literature reporting the experimental evidence. In addition each node is annotated with the chemical inhibitors that modulate its activity. The signaling information is mapped to the human proteome even if the experimental evidence is based on experiments on mammalian model organisms.
'''
],
'authors': ['Cesareni Group'],
'label': 'SIGNOR',
'color': '',
'data_import': ['SignaLink3', 'PhosphoSite'],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'emails': [('[email protected]', 'Livia Perfetto')],
'license': {
'name': 'CC-Attribution-ShareAlike 4.0',
'url': 'https://creativecommons.org/licenses/by-sa/4.0/',
'commercial_use': True
},
'pypath': {
'data': ['pypath.urls.urls[\'signor\'][\'all_url\']'],
'format': ['pypath.data_formats.pathway[\'signor\']'],
'intr': ['pypath.dataio.signor_interactions()'],
'ptm': [
'pypath.dataio.load_signor_ptms()',
'pypath.pypath.PyPath().load_signor_ptms()'
]
}
},
'HuPho': {
'year': 2015,
'releases': [2012, 2015],
'urls': {
'webpages': ['http://hupho.uniroma2.it/'],
'articles': [
'http://onlinelibrary.wiley.com/doi/10.1111/j.1742-4658.2012.08712.x/full'
],
'omictools':
['http://omictools.com/human-phosphatase-portal-tool']
},
'pubmeds': [22804825],
'descriptions': [
u'''
In order to offer a proteome-wide perspective of the phosphatase interactome, we have embarked on an extensive text-mining-assisted literature curation effort to extend phosphatase interaction information that was not yet covered by protein–protein interaction (PPI) databases. Interaction evidence captured by expert curators was annotated in the protein interaction database MINT according to the rapid curation standard. This data set was next integrated with protein interaction information from three additional major PPI databases, IntAct, BioGRID and DIP. These databases are part of the PSIMEx consortium and adopt a common data model and common controlled vocabularies, thus facilitating data integration. Duplicated entries were merged and redundant interactions have been removed.
As a result, from the HuPho website it is possible to explore experimental evidence from 718 scientific articles reporting 4600 experiments supporting protein interactions where at least one of the partners is a phosphatase. Since some interactions are supported by more than one piece of evidence, the actual number of non-redundant interactions is smaller, 2500 at the time of writing this paper. Moreover, 199 phosphatases have at least one reported ligand, while 53 have none. Interaction evidence is fairly evenly distributed in the four PSIMEx resources suggesting a substantial lack of overlap among the data curated by each database.
'''
],
'notes': [
u'''
The database is dynamically updated, so it is up to date at any given time. That is why it is marked as up to date in 2015, although it has had no new release since 2012.
'''
],
'authors': ['Cesareni Group'],
'label': 'HuPho',
'full_name': 'Human Phosphatase Portal',
'color': '',
'type': 'high throughput and literature curated',
'subtype': 'post-translational modification',
'omnipath': False,
'emails': [('[email protected]', 'Livia Perfetto')],
'license': {
'name': 'No license',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'SignaLink3': {
'year': 2015,
'releases': [2010, 2012, 2016],
'size': 0,
'authors': ['NetBiol Group'],
'label': 'SignaLink',
'color': '',
'pubmeds': [20542890, 23331499],
'urls': {
'webpages': ['http://signalink.org/'],
'articles': [
'http://bioinformatics.oxfordjournals.org/content/26/16/2042.long',
'http://www.biomedcentral.com/1752-0509/7/7'
],
'omictools': ['http://omictools.com/signalink-tool']
},
'taxons': ['human', 'D. melanogaster', 'C. elegans'],
'annot': ['pathway'],
'recommend':
'one of the largest resources with effect sign; its specific, biochemically defined pathways make it suitable for cross-talk analysis',
'descriptions': [
u'''
In each of the three organisms, we first listed signaling proteins and interactions from reviews (and from WormBook in C.elegans) and then added further signaling interactions of the listed proteins. To identify additional interactions in C.elegans, we examined all interactions (except for transcription regulation) of the signaling proteins listed in WormBase and added only those to SignaLink that we could manually identify in the literature as an experimentally verified signaling interaction. For D.melanogaster, we added to SignaLink those genetic interactions from FlyBase that were also reported in at least one yeast-2-hybrid experiment. For humans, we manually checked the reliability and directions for the PPIs found with the search engines iHop and Chilibot.
SignaLink assigns proteins to signaling pathways using the full texts of pathway reviews (written by pathway experts). While most signaling resources consider 5–15 reviews per pathway, SignaLink uses a total of 170 review papers, i.e. more than 20 per pathway on average. Interactions were curated from a total of 941 articles (PubMed IDs are available at the website). We added a small number of proteins based on InParanoid ortholog clusters. For curation, we used a self-developed graphical tool and Perl/Python scripts. The current version of SignaLink was completed in May 2008 based on WormBase (version 191), FlyBase (2008.6), Ensembl, UniProt and the publications listed on the website.
The curation protocol of SignaLink (Fig. 1A) contains several steps aimed specifically at reducing data and curation errors. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.
'''
],
'notes': [
u'''
For OmniPath we used the literature curated part of SignaLink version 3, which has not yet been published. Version 2 is publicly available, and format definitions exist in pypath to load version 2 as an alternative.
'''
],
'emails': [('[email protected]', 'Tamas Korcsmaros'),
('[email protected]', 'Tamas Korcsmaros')],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'license': {
'name': 'CC-Attribution-NonCommercial-ShareAlike-3.0',
'url': 'http://creativecommons.org/licenses/by-nc-sa/3.0/',
'commercial_use': False
},
'pathguide': 320
},
'NRF2ome': {
'year': 2013,
'releases': [2013],
'size': {
'nodes': None,
'edges': None,
},
'authors': ['NetBiol Group'],
'label': 'NRF2ome',
'color': '',
'urls': {
'webpages': ['http://nrf2.elte.hu/'],
'articles': [
'http://www.hindawi.com/journals/omcl/2013/737591/',
'http://www.sciencedirect.com/science/article/pii/S0014579312003912'
]
},
'pubmeds': [22641035, 23710289],
'taxons': ['human'],
'recommend':
'specific details about NRF2 related oxidative stress signaling; connections to transcription factors',
'descriptions': [
u'''
From Korcsmaros 2010: ... we first listed signaling proteins and interactions from reviews and then added further signaling interactions of the listed proteins. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.
'''
],
'emails': [('[email protected]', 'Tamas Korcsmaros'),
('[email protected]', 'Tamas Korcsmaros')],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'license': {
'name': 'CC-Attribution-NonCommercial-ShareAlike-3.0',
'url': 'http://creativecommons.org/licenses/by-nc-sa/3.0/',
'commercial_use': False
}
},
'ARN': {
'year': 2014,
'releases': [2014],
'size': 0,
'authors': ['NetBiol Group'],
'label': 'ARN',
'color': '',
'pubmeds': [25635527],
'urls': {
'webpages': ['http://autophagyregulation.org/'],
'articles': [
'http://www.tandfonline.com/doi/full/10.4161/15548627.2014.994346'
]
},
'taxons': ['human'],
'annot': ['pathway'],
'recommend':
'well curated essential interactions in autophagy regulation; connections to transcription factors',
'descriptions': [
u'''
From Korcsmaros 2010: ... we first listed signaling proteins and interactions from reviews and then added further signaling interactions of the listed proteins. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.
'''
],
'emails': [('[email protected]', 'Tamas Korcsmaros'),
('[email protected]', 'Tamas Korcsmaros')],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'license': {
'name': 'CC-Attribution-NonCommercial-ShareAlike-3.0',
'url': 'http://creativecommons.org/licenses/by-nc-sa/3.0/',
'commercial_use': False
}
},
'HPRD': {
'year': 2010,
'releases': [2002, 2005, 2009, 2010],
'urls': {
'webpages': ['http://www.hprd.org/'],
'articles': [
'http://genome.cshlp.org/content/13/10/2363.long',
'http://nar.oxfordjournals.org/content/34/suppl_1/D411.long',
'http://nar.oxfordjournals.org/content/37/suppl_1/D767.long'
],
'omictools':
['http://omictools.com/human-protein-reference-database-tool']
},
'annot': ['mechanism'],
'recommend':
'one of the largest kinase-substrate resources; provides a large amount of specific information; discontinued',
'pubmeds': [14525934, 16381900, 18988627],
'descriptions': [
u'''
The information about protein-protein interactions was cataloged after a critical reading of the published literature. Exhaustive searches were done based on keywords and medical subject headings (MeSH) by using Entrez. The type of experiments that served as the basis for establishing protein-protein interactions was also annotated. Experiments such as coimmunoprecipitation were designated in vivo, GST fusion and similar “pull-down” type of experiments were designated in vitro, and those identified by yeast two-hybrid were annotated as yeast two-hybrid.
Posttranslational modifications were annotated based on the type of modification, site of modification, and the modified residue. In addition, the upstream enzymes that are responsible for modifications of these proteins were reported if described in the articles. The most commonly known and the alternative subcellular localization of the protein were based on the literature. The sites of expression of protein and/or mRNA were annotated based on published studies.
'''
],
'full_name': 'Human Protein Reference Database',
'emails': [('[email protected]', 'Akhilesh Pandey')],
'type': 'literature curated',
'subtype': 'post-translational modification',
'omnipath': True,
'license': {
'name':
'No license. Everything in HPRD is free as long as it is not used for commercial purposes. Commercial entities will have to pay a fee under a licensing arrangement which will be used to make this database even better. Commercial users should send an e-mail for details. This model of HPRD is similar to the SWISS-PROT licensing arrangement. We do not have any intentions to profit from HPRD. Our goal is to promote science by creating the infrastructure of HPRD. We hope to keep it updated with the assistance of the entire biomedical community. Any licensing fee, if generated, will be used to annotate HPRD better and to add more entries and features.',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
},
'pypath': {
'data': [
'urls.urls[\'hprd_all\'][\'url\']',
'urls.urls[\'hprd_all\'][\'ptm_file\']'
],
'input': ['pypath.dataio.get_hprd()'],
'intr': ['pypath.dataio.hprd_interactions()'],
'format': [
'pypath.data_formats.ptm[\'hprd\']',
'pypath.data_formats.omnipath[\'hprd\']'
],
'ptm': ['pypath.dataio.get_hprd_ptms()']
},
'pathguide': 14
},
'ACSN': {
'year': 2015,
'releases': [2008, 2014, 2015, 2016],
'authors': ['Curie'],
'urls': {
'webpages': ['https://acsn.curie.fr'],
'articles': [
'http://www.nature.com/oncsis/journal/v4/n7/full/oncsis201519a.html',
'http://msb.embopress.org/content/4/1/0174.long'
]
},
'pubmeds': [26192618, 18319725],
'taxons': ['human'],
'recommend':
'the third largest process description resource; focused on signaling pathways; relationships mapped into a topological space',
'descriptions': [
u'''
The map curator studies the body of literature dedicated to the biological process or molecular mechanism of interest. The initial sources of information are the major review articles from high-impact journals that represent the consensus view on the studied topic and also provide a list of original references. The map curator extracts information from review papers and represents it in the form of biochemical reactions in CellDesigner. This level of details reflects the ‘canonical’ mechanisms. Afterwards, the curator extends the search and analyses original papers from the list provided in the review articles and beyond. This information is used to enrich the map with details from the recent discoveries in the field. The rule for confident acceptance and inclusion of a biochemical reaction or a process is the presence of sufficient evidences from more than two studies, preferably from different scientific groups. The content of ACSN is also verified and compared with publicly available databases such as REACTOME, KEGG, WikiPathways, BioCarta, Cell Signalling and others to ensure comprehensive representation of consensus pathways and links on PMIDs of original articles confirmed annotated molecular interactions.
''', u'''
CellDesigner 3.5 version was used to enter biological facts from a carefully studied selection of papers (see the whole bibliography on the web site with Supplementary information). Whenever the details of a biological fact could not be naturally expressed with CellDesigner standard notations, it was fixed and some solution was proposed. For example, we added a notation (co‐factor) to describe all the components intervening in the transcription of genes mediated by the E2F family proteins.
'''
],
'emails': [('[email protected]', 'Andrei Zinovyev')],
'type': 'literature curated',
'subtype': 'reaction',
'full_name': 'Atlas of Cancer Signalling Networks',
'omnipath': False,
'license': {
'name': 'CC-Attribution 4.0',
'url': 'https://creativecommons.org/licenses/by/4.0/',
'commercial_use': True
},
'pypath': {
'data': [
'pypath.urls.urls[\'acsn\'][\'url\']',
'pypath.data_formats.files[\'acsn\'][\'ppi\']',
'pypath.data_formats.files[\'acsn\'][\'names\']',
'pypath.urls.urls[\'acsn\'][\'biopax_l3\']'
],
'input': [
'pypath.dataio.get_acsn()', 'pypath.dataio.get_acsn_effects()',
'pypath.dataio.acsn_biopax()',
'pypath.pypath.PyPath().acsn_effects()'
],
'intr': ['pypath.dataio.acsn_ppi()'],
'format': [
'pypath.data_formats.reaction[\'acsn\']',
'pypath.data_formats.reaction_misc[\'acsn\']'
]
}
},
'DeathDomain': {
'year': 2012,
'releases': [2011, 2012],
'authors': ['Myoungji University'],
'label': 'DeathDomain',
'color': '',
'taxons': ['human'],
'pubmeds': [22135292],
'urls': {
'articles': ['http://nar.oxfordjournals.org/content/40/D1/D331'],
'webpages': ['http://deathdomain.org/']
},
'license': {
'name':
'No license. Please cite the following paper when you use Death Domain database in your publications, which is very important to sustain our service: Kwon et al. 2012',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
},
'emails': [('[email protected]', 'Hyun Ho Park')],
'files': {
'articles': ['DeathDomain_Kwon2011.pdf'],
'data': {
'raw': ['deathdomain.tsv'],
'processed': ['deathdomain.sif']
}
},
'size': {
'nodes': 99,
'edges': 175
},
'identifiers': ['GeneSymbol'],
'annot': ['experiment'],
'recommend':
'focused deep curation effort on death domain superfamily proteins; many specific relationships',
'descriptions': [
u'''
The PubMed database was used as the primary source for collecting information and constructing the DD database. After finding synonyms for each of the 99 DD superfamily proteins using UniProtKB and Entrez Gene, we obtained a list of articles using each name of the proteins and its synonyms on a PubMed search, and we selected the articles that contained evidence for physical binding among the proteins denoted. We also manually screened information that was in other databases, such as DIP, IntAct, MINT, STRING and Entrez Gene. All of the 295 articles used for database construction are listed on our database website.
'''
],
'notes': [
u'''
A detailed dataset with many references. Unfortunately, the data can be extracted only by parsing HTML. This is no more difficult than parsing XML formats, but the HTML pages are not intended for this purpose.
'''
],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'data_integration': 'static',
'pypath': {
'data': ['pypath.data/dd_refs.csv'],
'format': [
'pypath.data_formats.pathway[\'death\']',
'pypath.data_formats.omnipath[\'death\']'
]
},
'pathguide': 442
},
'TRIP': {
'year': 2014,
'releases': [2010, 2012],
'urls': {
'articles': [
'http://www.plosone.org/article/info:doi/10.1371/journal.pone.0047165',
'http://nar.oxfordjournals.org/content/39/suppl_1/D356.full',
r'http://link.springer.com/article/10.1007%2Fs00424-013-1292-2'
],
'webpages': ['http://www.trpchannel.org'],
'omictools': [
'http://omictools.com/transient-receptor-potential-channel-interacting-protein-database-tool'
]
},
'label': 'TRIP',
'full_name':
'Mammalian Transient Receptor Potential Channel-Interacting Protein Database',
'emails': [('[email protected]', 'Ju-Hong Jeon')],
'size': {
'nodes': 468,
'edges': 744
},
'pubmeds': [20851834, 23071747, 23677537],
'files': {
'articles': ['TRIP_Shin2012.pdf'],
'data': {
'raw': [],
'processed': ['trip.sif']
}
},
'taxons': ['human', 'mouse', 'rat'],
'identifiers': ['GeneSymbol'],
'recommend':
'high number of specific interactions; focused on TRP channels',
'descriptions': [
u'''
The literature on TRP channel PPIs found in the PubMed database serve as the primary information source for constructing the TRIP Database. First, a list of synonyms for the term ‘TRP channels’ was constructed from UniprotKB, Entrez Gene, membrane protein databases (Supplementary Table S2) and published review papers for nomenclature. Second, using these synonyms, a list of articles was obtained through a PubMed search. Third, salient articles were collected through a survey of PubMed abstracts and subsequently by search of full-text papers. Finally, we selected articles that contain evidence for physical binding among the proteins denoted. To prevent omission of relevant papers, we manually screened information in other databases, such as DIP, IntAct, MINT, STRING, BioGRID, Entrez Gene, IUPHAR-DB and ISI Web of Knowledge (from Thomson Reuters). All 277 articles used for database construction are listed in our database website.
'''
],
'notes': [
u'''
A good manually curated dataset focusing on TRP channel proteins, with ~800 binary interactions. The provided formats are not well suited for bioinformatics use because of the non-standard protein names, which contain Greek letters and formulas that only humans can interpret. With HTML processing of 5-6 different tables and a couple of hundred lines of code, one has a chance to compile a usable table.
'''
],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': True,
'pypath': {
'input': [
'pypath.dataio.trip_process()', 'pypath.dataio.take_a_trip()',
'pypath.dataio.trip_process_table()',
'pypath.dataio.trip_get_uniprot()',
'pypath.dataio.trip_find_uniprot()'
],
'intr': ['pypath.dataio.trip_interactions()'],
'data': [
'pypath.urls.urls[\'trip\'][\'intr\']',
'pypath.urls.urls[\'trip\'][\'show\']',
'pypath.urls.urls[\'trip\'][\'json\']',
],
'format': [
'pypath.data_formats.pathway[\'trip\']',
'pypath.data_formats.omnipath[\'trip\']'
]
},
'license': {
'name': 'CC-Attribution-ShareAlike-3.0',
'url': 'http://creativecommons.org/licenses/by-nc-sa/3.0/',
'commercial_use': True
},
'pathguide': 409
},
'Awan2007': {
'year': 2007,
'size': 0,
'authors': ['Wang Group'],
'label': 'Awan 2007',
'color': '',
'data_import': ['BioCarta', 'CA1'],
'contains': ['BioCarta', 'CA1'],
'urls': {
'articles':
['http://www.cancer-systemsbiology.org/Papers/iet-sb2007.pdf']
},
'emails': [('[email protected]', 'Edwin Wang')],
'pubmeds': [17907678],
'descriptions': [
u'''
To construct the human cellular signalling network, we manually curated signalling pathways from literature. The signalling data source for our pathways is the BioCarta database (http://www.biocarta.com/genes/allpathways.asp), which, so far, is the most comprehensive database for human cellular signalling pathways. Our curated pathway database recorded gene names and functions, cellular locations of each gene and relationships between genes such as activation, inhibition, translocation, enzyme digestion, gene transcription and translation, signal stimulation and so on. To ensure the accuracy and the consistency of the database, each referenced pathway was cross-checked by different researchers and finally all the documented pathways were checked by one researcher. In total, 164 signalling pathways were documented (supplementary Table 2). Furthermore, we merged the curated data with another literature-mined human cellular signalling network. As a result, the merged network contains nearly 1100 proteins (SupplementaryNetworkFile). To construct a signalling network, we considered relationships of proteins as links (activation or inactivation as directed links and physical interactions in protein complexes as neutral links) and proteins as nodes.
'''
],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': False,
'license': {
'name': 'No license',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'Cui2007': {
'year': 2007,
'authors': ['Wang Group'],
'label': 'Cui 2007',
'color': '',
'data_import': ['Awan2007', 'CancerCellMap'],
'pubmeds': [18091723],
'contains': ['Awan2007', 'CancerCellMap', 'CA1', 'BioCarta'],
'urls': {
'articles': ['http://msb.embopress.org/content/3/1/152'],
'webpages': []
},
'emails': [('[email protected]', 'Edwin Wang')],
'files': {
'articles': ['Cui2007.pdf'],
'data': {
'raw': ['cui-network.xls', ],
'processed': ['cui.sif']
}
},
'identifiers': ['EntrezGene'],
'size': {
'edges': 4249,
'nodes': 1528
},
'taxons': ['human'],
'descriptions': [
u'''
To build up the human signaling network, we manually curated the signaling molecules (most of them are proteins) and the interactions between these molecules from the most comprehensive signaling pathway database, BioCarta (http://www.biocarta.com/). The pathways in the database are illustrated as diagrams. We manually recorded the names, functions, cellular locations, biochemical classifications and the regulatory (including activating and inhibitory) and interaction relations of the signaling molecules for each signaling pathway. To ensure the accuracy of the curation, all the data have been crosschecked four times by different researchers. After combining the curated information with another literature‐mined signaling network that contains ∼500 signaling molecules (Ma'ayan et al, 2005)[this is the CA1], we obtained a signaling network containing ∼1100 proteins (Awan et al, 2007). We further extended this network by extracting and adding the signaling molecules and their relations from the Cancer Cell Map (http://cancer.cellmap.org/cellmap/), a database that contains 10 manually curated signaling pathways for cancer. As a result, the network contains 1634 nodes and 5089 links that include 2403 activation links (positive links), 741 inhibitory links (negative links), 1915 physical links (neutral links) and 30 links whose types are unknown (Supplementary Table 9). To our knowledge, this network is the biggest cellular signaling network at present.
''', u'''
From Awan 2007: To construct the human cellular signalling network, we manually curated signalling pathways from literature. The signalling data source for our pathways is the BioCarta database (http://www.biocarta.com/genes/allpathways.asp), which, so far, is the most comprehensive database for human cellular signalling pathways. Our curated pathway database recorded gene names and functions, cellular locations of each gene and relationships between genes such as activation, inhibition, translocation, enzyme digestion, gene transcription and translation, signal stimulation and so on. To ensure the accuracy and the consistency of the database, each referenced pathway was cross-checked by different researchers and finally all the documented pathways were checked by one researcher.
'''
],
'notes': [
u'''
An excellent signaling network with good topology for those who don't mind using data of unknown origin. Supposedly a manually curated network, but the data files don't include article references. Merging the CA1 network with CancerCellMap and BioCarta (also without references) makes the origin of the data untraceable.
'''
],
'type': 'literature curated',
'subtype': 'activity flow',
'omnipath': False,
'license': {
'name': 'No license',
'url': 'http://www.gnu.org/licenses/license-list.html#NoLicense',
'commercial_use': False
}
},
'BioCarta': {
'year': 2006,
'releases': [2006],
'size': {
'nodes': None,
'edges': None
},
'authors': ['Community'],
'label': 'BioCarta',
'color': '',
'urls': {
'webpages': [