Developing a software package to analyze metagenomics data
Language: Python 2.7
Required packages
- Biopython - 1.64
- NumPy - 1.8.2
- SciPy - 0.12.1
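A quick way to confirm that the required packages can be imported, and to see which versions are installed, is sketched below. The helper name `installed_versions` is hypothetical and not part of MMAP. It only reports what it finds; comparing the result against the versions listed above is left to the reader.

```python
from __future__ import print_function

import importlib


def installed_versions(module_names):
    """Map each module name to its __version__ string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module defines __version__; report "unknown" in that case.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # "Bio", "numpy" and "scipy" are the import names of Biopython, NumPy and SciPy.
    report = installed_versions(["Bio", "numpy", "scipy"])
    for name in sorted(report):
        print("{0}: {1}".format(name, report[name] or "NOT INSTALLED"))
```

The sketch sticks to the standard library so that it runs under Python 2.7 as well as newer interpreters.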
Required software
- Genovo v0.4: http://cs.stanford.edu/group/genovo/
- Xgenovo: http://xgenovo.dna.bio.keio.ac.jp/
- Glimmer v3.02: https://ccb.jhu.edu/software/glimmer/
  - requires ELPH v1.0.1: http://cbcb.umd.edu/software/ELPH/
- MINE v1.0.1: http://www.exploredata.net/
  - requires Java: http://www.java.com/
- NCBI BLAST+ v2.2.28+: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
Note: Binary files distributed here might NOT work with your OS!
The following folder structure is used in the default control file. If Genovo, Glimmer, or MINE is installed elsewhere, please update the control file.
- data
- example ## contains basic examples
- Genovo
- Glimmer
- MINE
- BLAST # Contains `blastx`, which is a symlink to `/usr/bin/blastx`
- src
- core
- test
- The control file in MMAP has two purposes. The main purpose is to allow MMAP to locate the other software (Genovo, Glimmer, MINE, and BLAST).
- The secondary purpose is to allow users to customize the parameters used in the pipeline. Please refer to the Additional parameters section.
- Genovo ships with precompiled binaries. Check and run the commented demo script `DEMO.sh`.
- If the precompiled binaries or the demo script fail, recompile Genovo from the `src` folder. The default Makefile requires `libtool`.
- Update the control file accordingly. Make sure `genovo_pdir` points to the folder that contains the binaries `assemble` and `finalize`.
- Compile Xgenovo.
- Update the control file accordingly. Make sure `xgenovo_pdir` points to the folder that contains the binaries `assemble` and `finalize`.
- Glimmer often requires some custom setup. Try full/absolute paths if relative paths don't work.
- Follow the instructions in `glim302notes.pdf` and compile Glimmer from the source code.
- Download and install ELPH.
- Update `awkpath`, `glimmerpath`, and `elphbin` in `glimmer/scripts/g3-iterated.csh`:
  * `awkpath` should point to the `scripts` folder.
  * `glimmerpath` should point to the `bin` folder.
  * `elphbin` is the full path to the `elph` executable, not to the folder that contains it.
- Update the control file accordingly. Make sure `glimmer_pdir` points to the top level of the Glimmer folder. MMAP looks for `bin` and `scripts` underneath this folder.
The BLAST component queries annotated Gene Ontology sequences downloaded from http://geneontology.org. Prior to running an analysis, you must download the sequences and convert them to a BLAST+ formatted database.
A bash script is provided that does this automatically: [src/scripts/makeblastdb-go.sh]. It uses curl and makeblastdb to download a FASTA-formatted protein sequence and convert it to NCBI BLAST+ format.
It accepts one argument: the output directory for the GO sequence database.
$ ./makeblastdb-go.sh /data/go-seqdb
After generating the local database, add its path to the control file, so that the pipeline knows where to find it.
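For readers who already have a GO protein FASTA file on disk, the conversion step can be sketched in Python as follows. This is an illustration only and does not reproduce the contents of makeblastdb-go.sh: the function names, the `go-seqdb` database name, and the input file name in the usage note are hypothetical, and the download step is left out. Only the standard `makeblastdb` options `-in`, `-dbtype`, `-out`, and `-title` are used.

```python
import os
import subprocess


def makeblastdb_command(fasta_path, out_dir, db_name="go-seqdb"):
    """Build the makeblastdb argument list for a protein FASTA file."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "prot",  # the GO sequences are proteins
        "-out", os.path.join(out_dir, db_name),
        "-title", db_name,
    ]


def build_database(fasta_path, out_dir, db_name="go-seqdb"):
    """Create out_dir if needed, then run makeblastdb (which must be on PATH)."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    subprocess.check_call(makeblastdb_command(fasta_path, out_dir, db_name))
```

A call such as `build_database("go_sequences.fasta", "/data/go-seqdb")` would write the database files under `/data/go-seqdb`, assuming the FASTA file exists and BLAST+ is installed.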
- Install Java and make sure your version is at least 1.7 (`java -version`).
- Update the control file accordingly. Make sure `mine_pdir` points to the folder that contains `MINE.jar`.
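The Java version check can also be scripted. The sketch below parses the text printed by `java -version`; the function names are hypothetical, and the parser assumes the common `version "x.y..."` pattern that Oracle and OpenJDK builds print, which other vendors may not follow.

```python
import re
import subprocess


def parse_java_version(text):
    """Extract (major, minor) from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)


def java_is_recent_enough(text, minimum=(1, 7)):
    """Return True if the reported version is at least `minimum`.

    Older releases report "1.7", "1.8"; Java 9 and later report "9", "11", ...
    Tuple comparison covers both schemes because (9, 0) > (1, 7).
    """
    version = parse_java_version(text)
    return version is not None and version >= minimum


def read_java_version_text():
    """Run `java -version`; it writes to stderr, so stderr is merged into stdout."""
    output = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    return output.decode("utf-8", "replace")
```

With Java installed, `java_is_recent_enough(read_java_version_text())` reports whether the requirement is met.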
- Update control file [default: MMAP/control]
- program_pdir points to the directory that contains the executables
- blast_db points to the local BLAST database
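Before running the pipeline, it can help to confirm that each directory named in the control file contains what MMAP expects. The sketch below is a minimal check under stated assumptions: it does not parse the control file (its format is not described here), so the settings are passed in as a plain dictionary, and the helper name `find_missing` is hypothetical. The expected file and folder names are the ones listed in the setup steps above.

```python
import os

# Files or folders that each directory is expected to contain,
# taken from the setup instructions in this README.
EXPECTED_CONTENTS = {
    "genovo_pdir": ["assemble", "finalize"],
    "xgenovo_pdir": ["assemble", "finalize"],
    "glimmer_pdir": ["bin", "scripts"],
    "mine_pdir": ["MINE.jar"],
}


def find_missing(settings, expected=EXPECTED_CONTENTS):
    """Return a sorted list of expected paths that do not exist.

    `settings` maps a control-file key (e.g. "genovo_pdir") to a directory.
    Keys absent from `settings` are skipped, so a setup without Xgenovo
    can simply omit "xgenovo_pdir".
    """
    missing = []
    for key, names in expected.items():
        if key not in settings:
            continue
        for name in names:
            path = os.path.join(settings[key], name)
            if not os.path.exists(path):
                missing.append(path)
    return sorted(missing)
```

An empty list means every expected file or folder was found; otherwise the returned paths show which control file entries need attention.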
python MMAP/main.py -h
python MMAP/main.py summary -h
python MMAP/main.py process -h
## To run Genovo/Glimmer/BLAST, use -i to provide the input FASTA file
python MMAP/main.py process -c data/example/control -i data/example/MMAP_example.fasta
python MMAP/main.py process -c data/example/controlX -i data/example/MMAP_paired_example.fasta
## To run MINE, use -m to provide a directory containing the CSV files
python MMAP/main.py summary -m data/example/
## A custom control file can be used with -c
python MMAP/main.py process -i data/example/MMAP_example.fasta -c path_to_custom_control_file
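When several FASTA files need to be processed, the commands above can be assembled from a short Python script. The sketch below only builds the argument lists; the helper names are hypothetical, and it relies solely on the `process`, `summary`, `-i`, `-c`, and `-m` options shown above.

```python
def mmap_process_command(fasta_path, control_path=None,
                         main_script="MMAP/main.py", python="python"):
    """Build the argument list for `main.py process`."""
    command = [python, main_script, "process", "-i", fasta_path]
    if control_path is not None:
        # Without -c, MMAP falls back to its default control file.
        command += ["-c", control_path]
    return command


def mmap_summary_command(csv_dir, main_script="MMAP/main.py", python="python"):
    """Build the argument list for `main.py summary` (the MINE step)."""
    return [python, main_script, "summary", "-m", csv_dir]
```

Each list can be passed to `subprocess.check_call` from the directory that contains the `MMAP` folder, for example inside a loop over input files.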