Scripts and containers still need to be updated; the general structure is in place but incomplete and poorly commented.
Contact [email protected] with any queries before the final update is completed.
This repository contains custom Nextflow scripts and the Docker/Singularity container files needed for reproducibility. The scripts carry out the primary data analysis of my MSc project.
Whole-metagenome shotgun sequencing data must be assembled to reconstruct the genomes of the microbes within a sample. These two Nextflow pipelines assemble contigs (contiguous sequences built from overlapping reads) from the microorganisms in paired-end whole-metagenome shotgun FASTQ files. My MSc study concerned biosynthetic gene cluster (BGC) detection in microbiome samples, so pipelines for microbiome assembly were needed; the included Nextflow scripts carry out the assembly and then detect BGCs in the assembled samples. I created two pipelines, which differ mainly in the assembler used: Megahit or MetaSpades.
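Both Megahit and MetaSpades assemble reads with de Bruijn graphs built from the reads' k-mers. The toy sketch below is not either tool's implementation (real assemblers also handle reverse complements, sequencing errors, uneven coverage and several k-mer sizes); it only illustrates how overlapping reads collapse into a single contig. The function names are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Toy de Bruijn graph: each k-mer in a read is an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def assemble_contigs(reads, k):
    """Spell every maximal non-branching path of the graph as a contig.

    Simplifications: single strand only (no reverse complements), no error
    correction or coverage filtering, and isolated cycles are ignored.
    """
    edges = build_graph(reads, k)
    indegree = defaultdict(int)
    for successors in edges.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(edges) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(edges.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if one_in_one_out(start):
            continue  # interior node: reached while walking from a start/branch node
        for nxt in sorted(edges.get(start, ())):
            contig = start + nxt[-1]
            while one_in_one_out(nxt):
                (nxt,) = edges[nxt]  # exactly one successor
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


reads = ["ACGTTGCA", "TTGCATGG"]  # two reads overlapping on "TTGCA"
print(assemble_contigs(reads, k=4))  # ['ACGTTGCATGG']
```

When the graph branches (for example at a repeat or a strain variant), the walk stops at the branch point, which is one reason metagenome assemblies end up as many shorter contigs.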
-Pipeline 1: Megahit
This pipeline carries out FASTQ QC and trimming, contig assembly, and BGC detection on the contigs.
- Sample input: ENA accessions are used directly as input
- Paired-end FASTQ file QC (FastQC & MultiQC)
- Paired-end FASTQ file trimming (BBDuk; BBTools suite)
- QC of the trimmed paired-end FASTQ files, to check the effect of trimming (FastQC & MultiQC)
- Contig assembly from the trimmed paired-end FASTQ files, producing a FASTA file of contigs (Megahit)
- Contig QC (QUAST)
- BGC detection in the contigs (AntiSMASH)
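BBDuk carries out the quality trimming in both pipelines (via parameters such as `qtrim` and `trimq`, with `minlen` to discard short reads). The sketch below illustrates quality trimming in general using the BWA/Cutadapt partial-sum algorithm, which is not BBDuk's own algorithm; the function names and the `min_len` default are hypothetical. It assumes Phred+33 quality strings and drops both mates when either becomes too short, so the R1 and R2 files stay in sync.

```python
def quality_trim_3prime(seq, qual, cutoff=20, offset=33):
    """Trim the 3' end with the BWA/Cutadapt partial-sum algorithm.

    qual is a Phred+33 FASTQ quality string; returns (seq, qual) trimmed.
    """
    running = 0
    best = 0
    cut = len(qual)
    for i in reversed(range(len(qual))):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:
            break  # good-quality bases now outweigh the low-quality tail
        if running > best:
            best = running
            cut = i
    return seq[:cut], qual[:cut]


def trim_pair(read1, read2, cutoff=20, min_len=30):
    """Trim both mates; return None (drop the pair) if either is too short."""
    t1 = quality_trim_3prime(*read1, cutoff=cutoff)
    t2 = quality_trim_3prime(*read2, cutoff=cutoff)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2


# In Phred+33, 'I' encodes quality 40 and '#' encodes quality 2.
r1 = ("ACGTACGT", "IIIIII##")
r2 = ("ACGTACGT", "IIIIIIII")
print(trim_pair(r1, r2, min_len=5))
# (('ACGTAC', 'IIIIII'), ('ACGTACGT', 'IIIIIIII'))
```

Re-running FastQC and MultiQC on the trimmed files is what shows whether settings like these removed the low-quality tails without discarding too many reads.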
-Pipeline 2: Biosynthetic MetaSpades
This pipeline carries out FASTQ QC and trimming, contig assembly and subsequent scaffold assembly. After assembly, BGC detection is run with both AntiSMASH and Biosynthetic MetaSpades.
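Scaffolding uses read pairs whose mates land on different contigs to order and orient those contigs and to estimate the gaps between them; the unknown bases in each gap are written as N. The sketch below shows only the final joining step, with a hand-written layout and gap sizes as hypothetical inputs; estimating them from read pairs, which is the assembler's actual work, is not shown.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, gaps):
    """Join contigs into one scaffold.

    contigs: dict of contig name -> sequence
    layout:  ordered list of (name, strand) pairs, strand '+' or '-'
    gaps:    estimated gap length (bp) between each pair of adjacent contigs
    """
    if len(gaps) != len(layout) - 1:
        raise ValueError("need exactly one gap estimate per adjacent contig pair")
    parts = []
    for i, (name, strand) in enumerate(layout):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(gaps):
            # Unknown bases become Ns; an estimate below 1 (apparent overlap)
            # is clamped to a single N in this sketch.
            parts.append("N" * max(gaps[i], 1))
    return "".join(parts)


contigs = {"c1": "AAAC", "c2": "GGTT"}
print(build_scaffold(contigs, [("c1", "+"), ("c2", "-")], gaps=[3]))
# AAACNNNAACC
```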
- Sample input: download the FASTQ files for this pipeline with the nf-core pipeline fetchngs (https://nf-co.re/fetchngs)
- Paired-end FASTQ file QC (FastQC & MultiQC)
- Paired-end FASTQ file trimming (BBDuk; BBTools suite)
- QC of the trimmed paired-end FASTQ files, to check the effect of trimming (FastQC & MultiQC)
- Contig assembly from the trimmed paired-end FASTQ files, producing a FASTA file of contigs (MetaSpades)
- Contig QC (QUAST)
- Scaffolding: contigs are ordered, oriented and joined into longer scaffolds, with unresolved gaps between them filled with N characters (Biosynthetic MetaSpades)
- Detection of scaffolds containing BGCs (Biosynthetic MetaSpades)
- BGC detection in the contigs/scaffolds (AntiSMASH)
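QUAST, used for contig QC in both pipelines, reports standard assembly statistics such as N50 and L50. The sketch below shows how those two values are defined; `n50_l50` is a hypothetical helper, not part of QUAST. QUAST ignores contigs shorter than 500 bp by default (its `--min-contig` option), and `min_len` mimics that filter here.

```python
def n50_l50(lengths, min_len=0):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total of lengths,
    sorted longest first, first reaches half of the total assembly length.
    L50: number of contigs needed to reach that point.
    Contigs shorter than min_len are ignored.
    """
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    return 0, 0  # no contigs passed the length filter


print(n50_l50([100, 200, 300, 400]))               # (300, 2)
print(n50_l50([100, 200, 300, 400], min_len=250))  # (400, 1)
```

A higher N50 (and lower L50) generally means a more contiguous assembly, which matters for BGC detection because clusters split across several contigs are harder to detect in full.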
Nextflow pipeline centred around the Megahit assembler
-Megahit Resources
https://github.com/voutcn/megahit
https://doi.org/10.1093/bioinformatics/btv033
Files:
- 'megahit.nf' - Nextflow script for the analysis using Megahit at the core of the pipeline
- 'biosynth.config' - Nextflow config file for the Megahit pipeline
- 'container' - container files (the containers bundle the tools and their dependencies for easier reproducibility)
Nextflow pipeline centred around the Biosynthetic MetaSpades assembler
-Biosynthetic MetaSpades Resources
https://github.com/ablab/spades
https://dx.doi.org/10.1101%2Fgr.213959.116
Files:
- 'biosynth_metaspades.nf' - Nextflow script for the analysis using MetaSpades at the core of the pipeline
- 'megahit.config' - Nextflow config file for the MetaSpades pipeline
- 'container' - container files (the containers bundle the tools and their dependencies for easier reproducibility)
Nextflow allows the tools to be provided in several ways. Three options are described below.
On a local system, a Docker image can be built from scratch using the Dockerfile and YAML files in the 'container' directory.
https://docs.docker.com/engine/reference/builder/
On an HPC system, Singularity is preferred; it builds a Singularity Image File (SIF) instead of a Docker image. A SIF recipe and the YAML files are provided in the 'container' directory.
https://sylabs.io/guides/2.6/user-guide/container_recipes.html
https://hub.docker.com/repository/docker/gfarrell/allin
(This image currently needs additional tool updates; a folder containing the YAML file and Dockerfile for building a custom Docker container will also be added.)
Docker has major limitations on a shared HPC system (it normally requires root-level privileges), so Singularity is the better option there, and it can easily convert a Docker image into a usable SIF. https://sylabs.io/guides/2.6/user-guide/singularity_and_docker.html
Alternatively, specify the tool directly in the script and Nextflow will set it up in an isolated Conda environment inside a subdirectory of the work folder. If the config file cleans up the work folder after each run, every run has to build all tools afresh; for multiple runs, remove the cleanup-after-run step from the config file.
https://www.nextflow.io/docs/latest/conda.html
All tools can also be installed in a local Conda environment on the user's system and called from there.
https://docs.conda.io/en/latest/
The three methods above can be mixed, but the user must adjust the Nextflow scripts and config files accordingly.
For example: FastQC from a local install, MultiQC in a Docker container, and AntiSMASH set up through Nextflow's Conda support.
-AntiSMASH Resources
https://antismash.secondarymetabolites.org/#!/start
https://doi.org/10.1093/nar/gkab335