A modified version of Sarek (https://github.com/nf-core/sarek/tree/master) tailored to microorganisms. It combines several widely used tools for SNP calling and annotation of microbial genomes.
- nf-core: https://nf-co.re/docs/nf-core-tools/installation
- nextflow: https://nf-co.re/docs/usage/getting_started/installation
conda create --name nf-core python=3.12 nf-core nextflow
conda activate nf-core
tar xvzf sarek.tar.gz
The folder "sarek" should be in the same directory as README.md, sbatch.sh, etc.
To run SNP calling from FASTQ files:
- Copy the paired-end FASTQ files into the directory /data.
- Open the file "samplesheet.csv" and fill in the information for your samples, one sample per line.
- Open the file "params.yaml" and set the parameters 'fasta' and 'fasta_fai' to the paths of the reference genome and its index.
- Go to the top-level directory of the pipeline and run:
sbatch ./sbatch.sh
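
Filling in the samplesheet by hand gets tedious with many samples, so the step above can be sketched as a small shell helper. This is only a sketch under assumptions not stated in this README: the function name `make_samplesheet` is hypothetical, the files are assumed to be named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and the column names follow the upstream nf-core/sarek samplesheet, which this modified pipeline may or may not keep. Check the header of the shipped "samplesheet.csv" before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: build a samplesheet from paired-end FASTQ files.
# Assumptions (not from this README): the _R1/_R2 naming scheme and the
# upstream nf-core/sarek column layout. Adjust both to match your setup.
make_samplesheet() {
    local data_dir="$1" out="$2"
    local r1 r2 sample
    echo "patient,sample,lane,fastq_1,fastq_2" > "$out"
    for r1 in "$data_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue    # glob matched nothing: skip the literal pattern
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 mate for $r1, skipping" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"
        # One line per sample; the sample name doubles as the patient ID.
        echo "$sample,$sample,lane_1,$r1,$r2" >> "$out"
    done
}
```

Call it with the directory that holds your FASTQ files and the output path, for example `make_samplesheet <fastq directory> samplesheet.csv`, then review the result before submitting the job.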
To run annotation only, on existing variant calls:
- Copy the SNP-calling VCF files into the directory /data.
- Add the mapped BAM files and their index files to the directory /data.
- Rename the file "samplesheet.csv.ann" to "samplesheet.csv". Then open it and fill in the information for your samples, one sample per line.
- Open the file "params.yaml" and set the parameters 'fasta' and 'fasta_fai' to the paths of the reference genome and its index. Then uncomment the line "#step: 'annotate'" by removing the leading "#".
- Go to the top-level directory of the pipeline and run:
sbatch ./sbatch.sh
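
The rename and uncomment steps above can be sketched as a shell helper so they are applied the same way each time. The function name `prepare_annotate` is hypothetical and not part of the pipeline. It assumes "params.yaml" contains the line `#step: 'annotate'` exactly as quoted above, starting in the first column.

```shell
#!/usr/bin/env bash
# Sketch: switch a pipeline directory to annotation-only mode.
# prepare_annotate is a hypothetical helper, not shipped with the pipeline.
prepare_annotate() {
    local dir="${1:-.}"
    # Use the annotation template as the active samplesheet if it is still there.
    # Note: this overwrites any existing samplesheet.csv in that directory.
    if [ -e "$dir/samplesheet.csv.ann" ]; then
        mv "$dir/samplesheet.csv.ann" "$dir/samplesheet.csv"
    fi
    # Uncomment the line "#step: 'annotate'"; all other lines are left untouched.
    # A temporary file is used instead of "sed -i", whose syntax differs between
    # GNU and BSD sed.
    sed "s/^#step: 'annotate'/step: 'annotate'/" "$dir/params.yaml" \
        > "$dir/params.yaml.tmp" \
        && mv "$dir/params.yaml.tmp" "$dir/params.yaml"
}
```

Run `prepare_annotate .` from the top-level directory of the pipeline, then fill in "samplesheet.csv" and set the reference paths as described.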
Any microorganism can be analyzed.
Currently, only the reference genomes of Candida auris, Mycobacterium tuberculosis, and SARS-CoV-2 are installed in the pipeline. To analyze other microorganisms, install the corresponding reference genome yourself, or open a new issue on GitHub; we may add the reference genome for you.
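
If you supply your own reference genome, it helps to confirm that 'fasta' and 'fasta_fai' in "params.yaml" point to real, non-empty files before submitting the job. The check below is a minimal sketch: `yaml_value` and `check_reference` are hypothetical helpers, and they assume each parameter is a simple one-line `key: value` entry without trailing comments. A FASTA index of this kind is commonly produced with `samtools faidx`, but how this pipeline expects it to be built is not stated here.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check of the reference paths in params.yaml.
# Both functions are hypothetical helpers, not part of the pipeline.

# Print the value of a top-level "key: value" entry, with quotes stripped.
# Handles simple one-line scalars only; this is not a full YAML parser.
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d "\"'"
}

# Return 0 if both 'fasta' and 'fasta_fai' name existing, non-empty files.
check_reference() {
    local params="$1" key path status=0
    for key in fasta fasta_fai; do
        path="$(yaml_value "$key" "$params")"
        if [ -z "$path" ] || [ ! -s "$path" ]; then
            echo "error: '$key' is unset or points to a missing file: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_reference params.yaml && sbatch ./sbatch.sh` submits the job only when both files are found.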