Snakemake Microbiome Pipeline

Download pipeline

git clone https://github.com/vitmy0000/SnaMP.git

Create environment

module load python/anaconda2-4.2.0
conda create -c bioconda -m -p pyenvs/py35-snakemake python=3.5 pandas snakemake

Load environment

module load python/anaconda2-4.2.0
source activate pyenvs/py35-snakemake

Deactivate environment

source deactivate

Workflow

  • Closed reference OTU picking
  • BLAST against HOMD database

workflow_0

Usage

  1. Prepare sequencing data

Replace $SOURCE_FILES with the path to the gzipped sequencing results, e.g. WHI_Repo/RT530_Batch2/*.gz

cd input
ln -s $SOURCE_FILES .
cd ..
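
Because `ln -s` runs from inside `input`, a relative `$SOURCE_FILES` is resolved from that directory, so an absolute path is the safer choice. The sketch below is one way to confirm that every link resolves before launching jobs. `check_input_links` is a hypothetical helper name, not part of the pipeline.

```shell
# Sketch: report symlinks in the input directory whose targets do not resolve.
# check_input_links is a hypothetical helper, not part of the pipeline.
check_input_links() {
    link_dir="${1:-input}"   # directory to scan; defaults to input/
    broken=0
    for f in "$link_dir"/*; do
        # -L is true for a symlink; -e follows it and fails if the target is missing
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "broken link: $f" >&2
            broken=1
        fi
    done
    return "$broken"
}
```

Run `check_input_links input` from the pipeline root after creating the links. An exit status of 0 means every link resolves.
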
  2. Launch jobs

The pipeline uses CCR cluster resources to run jobs in parallel. The OTU table and statistics on merge rate, filter rate, and hit rate will be placed under the table directory.

snakemake -p -j 100 --cluster-config cluster.json --cluster "sbatch --partition {cluster.partition} --time {cluster.time} --nodes {cluster.nodes} --ntasks-per-node {cluster.ntasks-per-node}"
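
The `{cluster.*}` placeholders in the `--cluster` string are filled from cluster.json. The repository's own cluster.json is not reproduced here, so the following is only a sketch of the general shape of a Snakemake cluster configuration: a `__default__` entry applies to every rule, and an entry named after a rule overrides it for that rule. The file name `cluster.example.json` and all values are placeholders chosen for illustration.

```shell
# Sketch only: write an example config next to, not over, the real cluster.json.
# All values are placeholders; use your cluster's partition names and limits.
cat > cluster.example.json <<'EOF'
{
    "__default__": {
        "partition": "PARTITION_NAME",
        "time": "01:00:00",
        "nodes": 1,
        "ntasks-per-node": 1
    }
}
EOF
```

The four keys match the four placeholders passed to `sbatch` in the launch command, so each one needs a value either in `__default__` or in a rule-specific entry.
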
  3. Results

Three result files are placed under the table directory.

  • QC_table.txt
  • raw_OTU_table_collapsed.txt
  • raw_OTU_table_uncollapsed.txt

Misc

To remove generated files:

snakemake clean

Check snakemake version:

snakemake -v
# 3.13.3

Known Issues

  • Cluster time limit: if any job runs over its time limit, CCR kills it silently and the pipeline keeps waiting. To recover, terminate Snakemake, increase the time limit specified in cluster.json accordingly, and rerun the pipeline with the extra option --rerun-incomplete.
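
The recovery steps above can be sketched as two shell helpers. `list_timed_out_jobs` and `rerun_after_timeout` are hypothetical names. The first assumes Slurm accounting is available through `sacct` on the cluster, which this README does not confirm. The second repeats the launch command from Usage with `--rerun-incomplete` added.

```shell
# List your jobs that Slurm ended for exceeding their time limit since a given date.
# Assumes sacct (Slurm accounting) is available on the cluster.
list_timed_out_jobs() {
    since="${1:?usage: list_timed_out_jobs YYYY-MM-DD}"
    sacct -u "${USER:-$(id -un)}" --starttime "$since" --state=TIMEOUT \
        --format=JobID,JobName,Elapsed,Timelimit,State
}

# Same launch command as in Usage, with --rerun-incomplete added.
# Run only after stopping Snakemake and raising "time" in cluster.json.
# If Snakemake reports a locked working directory, run `snakemake --unlock` first.
rerun_after_timeout() {
    snakemake -p -j 100 --rerun-incomplete \
        --cluster-config cluster.json \
        --cluster "sbatch --partition {cluster.partition} --time {cluster.time} --nodes {cluster.nodes} --ntasks-per-node {cluster.ntasks-per-node}"
}
```

A typical sequence would be to call `list_timed_out_jobs` with the date the run started, raise the time limit for the affected rules, and then call `rerun_after_timeout`.
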