StrainIQ - Strain Identification and Quantification

StrainIQ is an n-gram based method to identify and quantify microbial communities in metagenomics samples.
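
To make the term concrete, here is a minimal sketch of what an n-gram of a DNA sequence is: every overlapping substring of length n. The function name `ngrams` and the toy sequence are illustrative assumptions. This is not StrainIQ's own implementation, only the general idea behind the `-n` parameter.

```python
from collections import Counter


def ngrams(sequence, n):
    """Yield every overlapping substring of length n from sequence."""
    for i in range(len(sequence) - n + 1):
        yield sequence[i:i + n]


# Toy example (hypothetical data): count the 3-grams of a short DNA string.
counts = Counter(ngrams("ACGTACGA", 3))
```

A sequence of length L contains L - n + 1 n-grams, so larger values of n give fewer but more specific n-grams.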

Dependencies:

  • Python >= 3.6.2
  • BioPython 1.72
  • Pandas 0.23.4

Usage

StrainIQ.py [-h] -p Program [-n n-size]
                   [-glist file with a list of reference genome]
                   [-ng total number of genomes in the dasem] [-dsem DSEM]
                   [-sample sample] [-prediction -prediction]

optional arguments:
  -h, --help            show this help message and exit
  -p Program            Program to run. Example: builder, identifier,
                        quantifier
  -n n-size             Size of the n-gram
  -glist file with a list of reference genome
                        Tab delimited file with list of genomes to be included
                        in the DSEM. Format: gid filelocation
  -ng total number of genomes in the dasem
                        The number of genomes in the DSEM. This parameter is
                        needed if the DSEM is rebuilt using different sets of
                        references.
  -dsem DSEM            The DSEM for this bodysite.
  -sample sample        Input metagenomic sample. Convert the reverse reads in
                        case of paired-end sequencing before combining the
                        reads together
  -prediction -prediction
                        The prediction file produced by the identifier

StrainIQ has three main parts: Builder, Identifier, and Quantifier.

StrainIQ Builder: StrainIQ-Builder generates a DSEM for a body site. It is run once per body site at the start of the process and rerun whenever the genomes in the DSEM need to be updated. It takes the n-gram size (-n) and the list of genomes for the body site (-glist) as input. The -glist file is tab-delimited, with a user-assigned genome ID and the genome file location on each line. Along with the output DSEM, the builder creates a configuration file used by the identification and quantification steps. These files are available for download for the GI tract, blood, and urogenital tract.

DSEM Building

python StrainIQ.py 
	    -p builder # Use builder program to build DSEM
	    -n 21 # Default n-gram size for GI.
	    -glist <genome list> # Tab-delimited genome list: gid and file location.
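
The genome list can be written by hand, but a short script helps when there are many references. The sketch below assumes the genomes sit in one directory as `*.fasta` files, that the file stem is an acceptable genome ID, and that the list has no header row. The function name `write_genome_list` and these conventions are assumptions for illustration; only the tab-delimited `gid filelocation` layout comes from the help text above.

```python
import csv
from pathlib import Path


def write_genome_list(genome_dir, out_path, pattern="*.fasta"):
    """Write one 'gid<TAB>filelocation' row per genome file; return the row count."""
    files = sorted(Path(genome_dir).glob(pattern))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for path in files:
            # Assumption: the file stem serves as the user-assigned genome ID.
            writer.writerow([path.stem, str(path.resolve())])
    return len(files)
```

For example, `write_genome_list("genomes", "gi_genomes.tsv")` would produce a file that can be passed to `-glist`; both names here are hypothetical.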

StrainIQ Identifier: StrainIQ-Identifier takes a DSEM (-dsem) and a sample (-sample) as input to identify the microbes in the given sample. The identifier reads its additional parameters from the configuration file, which is generated as part of DSEM building and has the same name as the DSEM with the .conf extension. In addition to the DSEM and configuration file, the identifier also refers to the Map file and Taxonomy file.

python StrainIQ.py 
    -p identifier # Use identifier program for identification
    -dsem gi.dsem # Choose appropriate DSEM for the body site
    -sample sample1.fastq # Provide sample in fastq format
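
For paired-end data, the help text asks that reverse reads be converted before the reads are combined into one sample file. The sketch below is one possible reading of that step: it assumes "convert" means reverse-complementing each reverse read (and reversing its quality string), and that the FASTQ files use the plain four-lines-per-record layout. It uses only the standard library; the function names are hypothetical and not part of StrainIQ.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fastq(path):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def combine_paired_reads(forward_path, reverse_path, out_path):
    """Write forward reads as-is, then reverse-complemented reverse reads."""
    with open(out_path, "w") as out:
        for header, seq, qual in read_fastq(forward_path):
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
        for header, seq, qual in read_fastq(reverse_path):
            # Assumption: "convert" = reverse-complement; qualities are reversed to match.
            out.write(f"{header}\n{reverse_complement(seq)}\n+\n{qual[::-1]}\n")
```

The combined file would then be the one passed to `-sample`. If the tool expects a different conversion, only the second loop needs to change.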

StrainIQ Quantifier: StrainIQ-Quantifier takes the DSEM (-dsem), the sample (-sample), and the prediction file produced by the identifier (-prediction) as input to calculate the abundance of microbes in the metagenomic sample.

python StrainIQ.py 
    -p quantifier # Use quantifier program for quantification
    -dsem gi.dsem # Choose appropriate DSEM for the body site
    -sample sample1.fastq # sample in fastq format
    -prediction sample1.prediction # list identified genomes
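
The three stages can also be driven from a script. The sketch below only assembles the command lines shown above and hands them to `subprocess`; the helper names and the genome list name `gi_genomes.tsv` are hypothetical, and it assumes `StrainIQ.py` is in the working directory and that the builder output is available as `gi.dsem`. How each stage names its output files is not stated here, so the file names are taken from the examples rather than guaranteed.

```python
import subprocess
import sys


def strainiq_command(program, **options):
    """Build the argument list for one StrainIQ.py stage."""
    cmd = [sys.executable, "StrainIQ.py", "-p", program]
    for flag, value in options.items():
        # Each keyword becomes a single-dash flag, e.g. dsem="gi.dsem" -> -dsem gi.dsem
        cmd += [f"-{flag}", str(value)]
    return cmd


def run_pipeline(stages):
    """Run each stage in order, stopping at the first failure."""
    for cmd in stages:
        subprocess.run(cmd, check=True)


stages = [
    strainiq_command("builder", n=21, glist="gi_genomes.tsv"),
    strainiq_command("identifier", dsem="gi.dsem", sample="sample1.fastq"),
    strainiq_command("quantifier", dsem="gi.dsem", sample="sample1.fastq",
                     prediction="sample1.prediction"),
]
```

Calling `run_pipeline(stages)` would execute the builder, identifier, and quantifier in turn. Since the builder is run only once per body site, it can be dropped from the list for later samples.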