Use Pre-Trained HMMs
For the de novo reconstruction of genome-scale metabolic models from KEGG, the RAVEN function getKEGGModelForOrganism can use KEGG Orthology-specific HMMs for a homology search. This option is particularly suitable if the target species is not in the KEGG species list. It does not require a KEGG FTP subscription and is recommended for most users. Across RAVEN versions, two different pipelines have been used to generate the KEGG Orthology-specific HMM sets:
- The current pipeline: CD-HIT was used to obtain non-redundant representative protein sets for each KEGG Orthology entry. CD-HIT clusters proteins by comparing each sequence against the longest protein in a cluster and assigning it to that cluster when the sequence identity meets the defined threshold. Multiple sequence alignment of each non-redundant protein set was then performed with MAFFT. Finally, the alignments were used as input to HMMER to train the KEGG Orthology-specific HMM sets. The HMM archives contain only pre-trained HMMs.
- The classic pipeline: No longer used since RAVEN 1.9.1 and only relevant for KEGG Release 58.1. No protein clustering was performed before the multiple sequence alignment. The alignment was performed with ClustalW2, and the HMMs were trained with HMMER. The HMM archives contain pre-trained HMMs and multiple sequence alignment data.
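The three steps of the current pipeline can be sketched as the command lines below. This is only an illustration under stated assumptions, not the exact invocation used to build the published HMM sets: the KO identifier and file names are hypothetical, and only basic, well-known options of each tool are used (`-i`, `-o`, `-c` for CD-HIT, `--auto` for MAFFT, and the `hmmbuild <hmm out> <alignment in>` form for HMMER). The snippet builds and prints the commands without running them.

```matlab
% Hypothetical identifier and file names for a single KEGG Orthology entry.
ko       = 'K00001';
identity = 0.9;                 % CD-HIT identity threshold (as in the "90" datasets)

fastaIn   = [ko '.fa'];         % all protein sequences assigned to this KO
fastaNr   = [ko '_nr.fa'];      % non-redundant representatives from CD-HIT
alignment = [ko '_aln.fa'];     % multiple sequence alignment from MAFFT
hmmOut    = [ko '.hmm'];        % profile HMM trained by HMMER

% One command per pipeline step: cluster, align, train.
cmds = {
    sprintf('cd-hit -i %s -o %s -c %.2f', fastaIn, fastaNr, identity)
    sprintf('mafft --auto %s > %s', fastaNr, alignment)
    sprintf('hmmbuild %s %s', hmmOut, alignment)
    };

% Print the commands for inspection; nothing is executed here.
fprintf('%s\n', cmds{:});
```

With pre-trained HMMs none of these steps has to be run locally, which is why this option needs only the downloaded archive.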
The HMM sets can be downloaded automatically during model reconstruction from KEGG (set the dataDir parameter in getKEGGModelForOrganism to the name of the desired dataset). Alternatively, the download links are provided below. The following HMM sets are available:
KEGG Release | RAVEN Releases | Dataset/dataDir | Phylogeny | Software Used | CD-HIT Identity |
---|---|---|---|---|---|
105.0 | RAVEN 2.8.0+ | euk90_kegg105 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
105.0 | RAVEN 2.8.0+ | prok90_kegg105 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
102.0 | RAVEN 2.7.4 - 2.7.9 | euk90_kegg102 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
102.0 | RAVEN 2.7.4 - 2.7.9 | prok90_kegg102 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
100.0 | RAVEN 2.6.0 - 2.7.3 | euk90_kegg100 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
100.0 | RAVEN 2.6.0 - 2.7.3 | prok90_kegg100 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
94.0 | RAVEN 2.4.0 - 2.5.3 | euk100_kegg94 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 100% |
94.0 | RAVEN 2.4.0 - 2.5.3 | euk90_kegg94 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
94.0 | RAVEN 2.4.0 - 2.5.3 | euk50_kegg94 | Eukaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 50% |
94.0 | RAVEN 2.4.0 - 2.5.3 | prok100_kegg94 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 100% |
94.0 | RAVEN 2.4.0 - 2.5.3 | prok90_kegg94 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 90% |
94.0 | RAVEN 2.4.0 - 2.5.3 | prok50_kegg94 | Prokaryote | cd-hit-v4.8.1, mafft-7.490, hmmer-3.3.2 | 50% |
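The dataset names in the table appear to encode phylogeny, CD-HIT identity and KEGG release. The sketch below decodes a name under that assumption. The pattern `<euk|prok><identity>_kegg<release>` is inferred from the table rows rather than taken from RAVEN documentation, and the variable names are illustrative.

```matlab
% A few dataset names copied from the table above.
datasets = {'euk90_kegg105', 'prok90_kegg105', 'euk50_kegg94', 'prok100_kegg94'};

% Assumed naming pattern (inferred from the table, not a documented convention):
% <euk|prok><CD-HIT identity in percent>_kegg<KEGG release>
pattern = '^(?<phylogeny>euk|prok)(?<identity>\d+)_kegg(?<release>\d+)$';

for i = 1:numel(datasets)
    % Named tokens are returned as fields of a struct; empty if no match.
    parts = regexp(datasets{i}, pattern, 'names');
    if isempty(parts)
        fprintf('%s: does not match the assumed naming pattern\n', datasets{i});
        continue
    end
    fprintf('%s: phylogeny=%s, CD-HIT identity=%s%%, KEGG release=%s\n', ...
        datasets{i}, parts.phylogeny, parts.identity, parts.release);
end
```

A check like this can catch a mistyped dataset name before it is passed as dataDir, although it cannot confirm that the named archive actually exists.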
The model can be reconstructed by running the following command:
model=getKEGGModelForOrganism('abc','inputFasta.fa','euk90_kegg105','outputDirectory',true,true,true,true,10^-50,0.8,0.3,-1,inf,1);
NOTE: Unlike model reconstruction based on a three- or four-letter KEGG organism code, the first input parameter is used only for model.id and does not influence the homology search in any way, so any string can be used here.
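As a minimal sketch, the same call can be written with the first four arguments held in variables, which makes the role of each one easier to see. The variable names are descriptive labels chosen here, the dataset is one of the names listed in the table above, and the remaining positional arguments are copied unchanged from the command (see `help getKEGGModelForOrganism` for their meaning). The guard is an added convenience so that the snippet does nothing when RAVEN or the input file is missing.

```matlab
organismID = 'abc';              % arbitrary label; only ends up in model.id
fastaFile  = 'inputFasta.fa';    % protein FASTA of the target species
dataDir    = 'euk90_kegg105';    % HMM dataset name from the table above
outDir     = 'outputDirectory';  % directory passed as the fourth argument

% Remaining positional arguments, copied verbatim from the command above.
otherArgs = {true, true, true, true, 10^-50, 0.8, 0.3, -1, inf, 1};

% Run only if RAVEN is on the MATLAB path and the FASTA file exists.
if exist('getKEGGModelForOrganism', 'file') == 2 && exist(fastaFile, 'file') == 2
    model = getKEGGModelForOrganism(organismID, fastaFile, dataDir, outDir, otherArgs{:});
    disp(model.id);
else
    warning('RAVEN or %s not found; skipping reconstruction.', fastaFile);
end
```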