Compares open-source metagenomic classification tool performance (precision, sensitivity, runtime) across various sequencing platforms (Illumina MiSeq/iSeq, Oxford Nanopore MinION) and use cases (metagenomic profiles).
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The META system has been designed to run on Linux (specifically, tested on Ubuntu 18.04) and in Docker containers. The following packages are required:
Here is an example of how to install these on Ubuntu 18.04:
# Install Docker engine (reference: https://docs.docker.com/engine/install/ubuntu/)
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world # to verify successful install
To build the META Simulator, run the following from the root directory of meta_simulator:
docker build -t meta_simulator:latest .
To integrate the META Simulator with the Docker-based META System, you will need to export the meta_simulator image into a Docker tarfile and save it in the meta_system/data/docker directory.
- To export the meta_simulator, run the following command:
docker save -o meta_simulator.tar meta_simulator:latest
- Move meta_simulator.tar to meta_system/data/docker
- To make sure it loads on meta_system, run make load-docker on meta_system
The META Simulator requires an abundance profile TSV. An abundance profile is expressed as a tab-delimited text file (TSV) where the first column contains the leaf taxonomic ID, the second column contains the corresponding abundance proportion (the proportions must sum to 1.000000), and the third column designates the organism as being foreground (1) or background (0). There should be no headers in the abundance profile TSV. An example is shown below:
400667 0.10 1
435590 0.10 1
367928 0.10 1
864803 0.10 1
1091045 0.10 1
349101 0.10 1
1282 0.10 1
260799 0.10 1
1529886 0.10 1
198094 0.10 1
An example TSV is included within the Docker container in data/strawman_envassay.tsv.
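The format rules above (three tab-separated columns, an integer taxonomic ID, a 0/1 foreground flag, abundances summing to 1) can be checked before a simulation is started. The following is a minimal sketch using awk. The function name validate_profile and the 1e-6 tolerance are assumptions made for illustration and are not part of the META Simulator.

```shell
# Hypothetical helper (not shipped with META Simulator): sanity-check an
# abundance profile TSV against the format described above.
validate_profile() {
  awk -F'\t' '
    # Every row needs exactly three tab-separated columns.
    NF != 3                { printf "line %d: expected 3 tab-separated columns\n", NR; bad = 1; next }
    # Column 1: taxonomic ID, digits only.
    $1 !~ /^[0-9]+$/       { printf "line %d: taxonomic ID is not an integer\n", NR; bad = 1 }
    # Column 3: foreground (1) or background (0).
    $3 != "0" && $3 != "1" { printf "line %d: third column must be 0 or 1\n", NR; bad = 1 }
    # Column 2: accumulate abundance proportions.
                           { total += $2 }
    END {
      diff = total - 1
      if (diff < 0) diff = -diff
      # Tolerance of 1e-6 is an assumption, chosen to absorb floating-point error.
      if (diff > 1e-6) { printf "abundances sum to %.6f, expected 1.000000\n", total; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

Calling validate_profile my_profile.tsv prints one message per problem found and returns a non-zero status if the file does not satisfy the rules, so it can gate a later docker run.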
The META Simulator accepts the following arguments:
- -t: number of threads to use for simulations
- -i: list of taxid with associated abundance (totalling 1.0)
- -p: sequencing platform to simulate reads for (case sensitive). The options are:
  - iseq: Illumina iSeq 100
  - miseq: Illumina MiSeq (assuming both Illumina platforms have a spot count of 8M, and taking 1/100 of this) [80,000]
  - r9: Oxford Nanopore R9 flowcell (MIN106) - best performance at 50Gbp output (will assume 20Gbp and 20kb avg read length = 1M reads) [10,000]
  - flg: Oxford Nanopore Flongle flowcell (FLG001) - best performance at 2Gbp output (1/25 of r9) (assuming 10% of r9 output) [1,000]
- -o: output directory (combined fastq file for classification will be at $outdir/simulated.fastq)
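The bracketed figures in the platform descriptions can be reproduced with simple arithmetic. The sketch below assumes that each bracketed value is the number of reads simulated for that platform, and that the R9 figure applies the same 1/100 scaling stated for the Illumina platforms. Both points are inferred from the numbers, not confirmed by the source.

```shell
# Assumption: each bracketed value is a simulated read count.
illumina_spots=8000000                      # spot count assumed for iSeq 100 and MiSeq
illumina_reads=$((illumina_spots / 100))    # 1/100 of the spot count

r9_bases=20000000000                        # 20 Gbp assumed R9 output
r9_read_len=20000                           # 20 kb assumed average read length
r9_full_reads=$((r9_bases / r9_read_len))   # reads in a full run at those assumptions
r9_reads=$((r9_full_reads / 100))           # same 1/100 scaling (inferred, see above)

flg_reads=$((r9_reads / 10))                # Flongle taken as 10% of the R9 figure

echo "iseq/miseq: $illumina_reads  r9: $r9_reads  flg: $flg_reads"
```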
To run DeepSimulator (Nanopore R9 flowcell) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p r9 -o data/test
To run DeepSimulator (Nanopore Flongle flowcell) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p flg -o data/test
To run InsilicoSeq (Illumina MiSeq) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p miseq -o data/test
To run InsilicoSeq (Illumina iSeq) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p iseq -o data/test
If you wish to run the simulator with your own abundance profile, use the Docker bind mount -v flag for docker run to mount the volume containing your abundance profile TSV.
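A bind-mounted run might look like the sketch below. The function name run_with_profile, the container mount point /mnt/profile, and the output location /mnt/profile/sim_out are all hypothetical choices for this example. It also assumes that the wrapper script accepts absolute paths for -i and -o, which the source does not state.

```shell
# Hypothetical wrapper: bind-mount the directory that holds a custom
# abundance profile and run the simulator against it.
run_with_profile() {
  profile=$1     # path to your abundance profile TSV on the host
  platform=$2    # one of: iseq, miseq, r9, flg (case sensitive)

  # docker -v expects an absolute host path, so resolve the directory first.
  host_dir=$(cd "$(dirname "$profile")" && pwd)

  # /mnt/profile is an arbitrary mount point, not a path the image defines.
  # Writing output under the mount keeps the results on the host.
  docker run -v "$host_dir":/mnt/profile meta_simulator:latest \
    bash scripts/sim_module_wrapper.sh -t 2 \
      -i "/mnt/profile/$(basename "$profile")" \
      -p "$platform" \
      -o /mnt/profile/sim_out
}
```

For example, run_with_profile ./profiles/my_profile.tsv miseq would mount ./profiles into the container and, under the assumptions above, leave the simulated reads in ./profiles/sim_out on the host.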
This project is licensed under Apache 2.0. Copyright under Johns Hopkins University Applied Physics Laboratory.