Use Pre Trained HMMs

Use Pre-Trained Hidden Markov Models

For de novo reconstruction of genome-scale metabolic models from KEGG, the RAVEN function getKEGGModelForOrganism can use KEGG Orthology-specific HMMs for the homology search. This option is particularly suitable if the target species is not in the KEGG species list. It does not require a KEGG FTP subscription and is recommended for most users. Across all RAVEN versions, two different pipelines have been used to generate the KEGG Orthology-specific HMM sets:

The current pipeline: CD-HIT was used to obtain non-redundant representative protein sets for each KEGG Orthology. CD-HIT clusters proteins at a defined sequence identity threshold and keeps the longest protein of each cluster as its representative. Multiple sequence alignment of these non-redundant protein sets was then performed with MAFFT. Finally, the alignments were used as input to HMMER to train the KEGG Orthology-specific HMM sets. The HMM archives contain only pre-trained HMMs.
The classic pipeline: No longer used since RAVEN 1.9.1 and only relevant for KEGG Release 58.1. No protein clustering was performed before the multiple sequence alignment. The alignment was performed with ClustalW2, and the HMMs were trained with HMMER. The HMM archives contain pre-trained HMMs and multiple sequence alignment data.
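The three steps of the current pipeline can be sketched as a dry run that only prints the command lines. This is an illustrative sketch, not the script used to build the released HMM sets: the file names and the KO identifier are hypothetical, and the options shown (`-c` for CD-HIT, `--auto` for MAFFT) are assumptions about a reasonable minimal invocation. Note that CD-HIT needs a smaller word size (`-n`) for identity thresholds below 0.7, which this sketch does not handle.

```matlab
% Hypothetical file names for a single KEGG Orthology (KO) entry.
koFasta  = 'K00001.fa';      % all protein sequences assigned to the KO
nrFasta  = 'K00001_nr.fa';   % non-redundant representatives from CD-HIT
alnFasta = 'K00001.aln';     % multiple sequence alignment from MAFFT
hmmFile  = 'K00001.hmm';     % profile HMM trained by HMMER
identity = 0.90;             % CD-HIT identity threshold (90%)

% Build the command lines: cluster, align, train.
cmds = { ...
    sprintf('cd-hit -i %s -o %s -c %.2f', koFasta, nrFasta, identity), ...
    sprintf('mafft --auto %s > %s', nrFasta, alnFasta), ...
    sprintf('hmmbuild %s %s', hmmFile, alnFasta)};

% Print the commands instead of running them.
fprintf('%s\n', cmds{:});
```

To actually run a step, each string could be passed to `system()`, provided the three tools are installed and on the system path.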

The HMM sets can be downloaded automatically during model reconstruction from KEGG (set the dataDir parameter in getKEGGModelForOrganism to the dataset name). Alternatively, the download links are provided below. The following HMM sets are available:

KEGG Release	RAVEN Releases	Dataset/dataDir	Phylogeny	Software Used	CD-HIT Identity
105.0	RAVEN 2.8.0+	euk90_kegg105	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
105.0	RAVEN 2.8.0+	prok90_kegg105	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
102.0	RAVEN 2.7.4 - 2.7.9	euk90_kegg102	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
102.0	RAVEN 2.7.4 - 2.7.9	prok90_kegg102	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
100.0	RAVEN 2.6.0 - 2.7.3	euk90_kegg100	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
100.0	RAVEN 2.6.0 - 2.7.3	prok90_kegg100	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
94.0	RAVEN 2.4.0 - 2.5.3	euk100_kegg94	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	100%
94.0	RAVEN 2.4.0 - 2.5.3	euk90_kegg94	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
94.0	RAVEN 2.4.0 - 2.5.3	euk50_kegg94	Eukaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	50%
94.0	RAVEN 2.4.0 - 2.5.3	prok100_kegg94	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	100%
94.0	RAVEN 2.4.0 - 2.5.3	prok90_kegg94	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	90%
94.0	RAVEN 2.4.0 - 2.5.3	prok50_kegg94	Prokaryote	cd-hit-v4.8.1 mafft-7.490 hmmer-3.3.2	50%
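The dataset names in the table appear to follow the pattern phylogeny prefix, CD-HIT identity in percent, then `_kegg` and the KEGG release number. The short sketch below composes a name this way; the pattern is inferred from the table rather than documented, and the variable names are hypothetical. Only the combinations listed in the table are available.

```matlab
% The naming pattern is inferred from the table above (an assumption).
phylogeny = 'euk';   % 'euk' for Eukaryote, 'prok' for Prokaryote
identity  = 90;      % CD-HIT identity in percent
release   = 105;     % KEGG release without the decimal part

% Compose the value to pass as dataDir.
dataDir = sprintf('%s%d_kegg%d', phylogeny, identity, release);
disp(dataDir)        % prints euk90_kegg105
```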

The model can be reconstructed by running the following command:

model = getKEGGModelForOrganism('abc','inputFasta.fa','euk90_kegg105','outputDirectory',true,true,true,true,10^-50,0.8,0.3,-1,inf,1);

NOTE: Unlike in model reconstruction based on a three- or four-letter KEGG Organism code, the first input parameter is only used as model.id and does not influence the homology search in any way, so any string can be used here.
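The same call can be written with the first four arguments held in named variables, which makes it easier to see which value is the free-form model ID and which selects the HMM set. This is a sketch under assumptions: RAVEN must be installed and on the MATLAB path, the variable names are illustrative labels rather than confirmed parameter names (only dataDir is named in the text above), and the remaining positional arguments are copied unchanged from the reconstruction command, so the function help should be consulted for their meaning.

```matlab
% Illustrative variable names; only dataDir is named in the text above.
organismID = 'myDraftModel';     % free-form label, only ends up in model.id
fastaFile  = 'inputFasta.fa';    % sequence file of the target species
dataDir    = 'euk90_kegg105';    % a dataset name taken from the table
outDir     = 'outputDirectory';  % directory for the output files

% Remaining positional arguments, copied unchanged from the command above.
otherArgs = {true, true, true, true, 10^-50, 0.8, 0.3, -1, inf, 1};

% Run only if RAVEN and the input file are available.
if exist('getKEGGModelForOrganism', 'file') == 2 && exist(fastaFile, 'file') == 2
    model = getKEGGModelForOrganism(organismID, fastaFile, dataDir, outDir, otherArgs{:});
    disp(model.id)
else
    disp('RAVEN or the input FASTA file was not found; skipping reconstruction.')
end
```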
