1KSA-Genome-Assembly-Pipeline-Nextflow

This repository contains a Nextflow pipeline for de novo genome assembly from Oxford Nanopore Technologies (ONT) long reads. It was built to support genome assembly and analysis for the 1KSA project, a pilot project aimed at sequencing and assembling indigenous South African species. Detailed step-by-step instructions for running the pipeline on the CHPC are in the file Instructions.txt in this repository.

Introduction

This workflow uses the following tools:

Dorado for basecalling
Samtools for converting BAM files to FASTQ files
NanoPlot for read quality control
NanoFilt for read filtering and trimming
Flye for genome assembly
Racon for the first round of assembly polishing
Medaka for the second round of assembly polishing
BUSCO for assembly quality assessment (gene completeness)
QUAST for assembly quality assessment (contiguity statistics)
KMC for k-mer counting
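The tools above run in a fixed order, each consuming the previous stage's output. The script below is a minimal sketch of that order as plain shell commands. It is not taken from Main.nf: every file name, the thread count, the NanoFilt thresholds (-q 10 -l 1000), the Flye read mode (--nano-hq, which Flye recommends for high-accuracy ONT reads) and the KMC settings are assumptions. By default it only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Minimal sketch of the stage order, NOT the contents of Main.nf.
# Assumptions: all file names, THREADS, NanoFilt thresholds (-q 10, -l 1000),
# the Flye mode (--nano-hq) and the KMC parameters.
set -euo pipefail

THREADS="${THREADS:-16}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = also execute them

# Echo a command line and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    bash -c "$*"
  fi
}

# Arguments: Dorado model name (or path) and the directory holding POD5 files.
pipeline() {
  local model="$1" pod5_dir="$2"
  run "dorado basecaller $model $pod5_dir > calls.bam"                 # basecalling -> unaligned BAM
  run "samtools fastq calls.bam > reads.fastq"                          # BAM -> FASTQ
  run "NanoPlot --fastq reads.fastq -o nanoplot_out"                    # read QC report
  run "NanoFilt -q 10 -l 1000 < reads.fastq > filtered.fastq"           # quality/length filtering
  run "flye --nano-hq filtered.fastq --out-dir flye_out --threads $THREADS"   # draft assembly
  run "minimap2 -ax map-ont -t $THREADS flye_out/assembly.fasta filtered.fastq > aln.sam"
  run "racon -t $THREADS filtered.fastq aln.sam flye_out/assembly.fasta > racon.fasta"   # polish 1
  run "medaka_consensus -i filtered.fastq -d racon.fasta -o medaka_out -t $THREADS"       # polish 2
  run "busco -i medaka_out/consensus.fasta -l eukaryota_odb10 -m genome -o busco_out"
  run "quast.py flye_out/assembly.fasta racon.fasta medaka_out/consensus.fasta -o quast_out"
  run "mkdir -p kmc_tmp"                                                # KMC needs a working directory
  run "kmc -k21 -t$THREADS -ci1 -cs10000 filtered.fastq kmc_db kmc_tmp" # k-mer counting
  run "kmc_tools transform kmc_db histogram kmc_hist.txt -cx10000"      # k-mer histogram
}
```

Sourcing the script and calling `pipeline <model> <pod5_dir>` prints the 13 commands; `DRY_RUN=0 pipeline <model> <pod5_dir>` runs them. Printing first makes it easy to compare this order against Main.nf before spending cluster time.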

If basecalling was already done on the sequencing machine, the Main2.nf script can be used instead to run genome assembly once the FASTQ files have been concatenated.
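A minimal sketch of the concatenation step, assuming the sequencer wrote gzipped per-chunk FASTQ files into a single directory such as MinKNOW's fastq_pass (an assumption; barcoded runs use one subdirectory per barcode). The function name concat_fastq and the output file name are hypothetical. Gzipped files can be joined with plain cat because a sequence of gzip members is itself a valid gzip stream.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-chunk *.fastq.gz files into one file for Main2.nf.
# Assumption: all chunks sit directly inside one directory (e.g. fastq_pass).
set -euo pipefail

concat_fastq() {
  local in_dir="$1" out="$2"
  shopt -s nullglob                      # an empty match yields an empty array
  local files=("$in_dir"/*.fastq.gz)
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no *.fastq.gz files in $in_dir" >&2
    return 1
  fi
  # Concatenated gzip members form a valid multi-member gzip file.
  cat "${files[@]}" > "$out"
}
```

For example, `concat_fastq fastq_pass all_reads.fastq.gz` produces a single file to use as the input to Main2.nf.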

Dependencies

The following modules need to be loaded on the CHPC before running the pipeline:

module purge
module load chpc/BIOMODULES
module load dorado
module load samtools/1.9
module load nanoplot
module load nanofilt
module load flye/2.9
module load minimap2
module load racon/1.5.0
module load medaka/1.11.3
module load quast/4.6.3
module load busco/5.4.5
module load bbmap/38.95
module load metaeuk
module load python
module load R
module load KMC
module load nextflow/23.10.0-all
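A failed module load is easy to miss, so a quick check that the expected executables are on PATH can prevent a failed run. This is a hypothetical helper; the executable names are the usual entry points of each tool (for example quast.py for QUAST and medaka_consensus for Medaka), but they are assumptions, since CHPC module builds may expose different names. bbmap, metaeuk, python and R are left out of the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report tools that are not on PATH after "module load".
# Assumption: the executable names in REQUIRED_TOOLS match the CHPC modules.
set -euo pipefail

REQUIRED_TOOLS=(dorado samtools NanoPlot NanoFilt flye minimap2 racon
                medaka_consensus busco quast.py kmc nextflow)

# Print one "missing:" line per absent tool; return 1 if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_tools "${REQUIRED_TOOLS[@]}"` after the module commands lists anything that still needs loading.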

The following models and databases need to be downloaded before running the pipeline:

Dorado: dorado download --model <model>
BUSCO: busco --download eukaryota_odb10
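Both downloads only need to happen once. The hypothetical helper below skips the BUSCO download when the lineage is already on disk. It assumes BUSCO 5's layout of <download_path>/lineages/<lineage>, with busco_downloads as the default download path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a BUSCO lineage only if it is not already on disk.
# Assumption: BUSCO 5 stores lineages under <download_path>/lineages/<lineage>,
# with ./busco_downloads as the default download path.
set -euo pipefail

ensure_busco_lineage() {
  local lineage="$1" base="${2:-busco_downloads}"
  if [ -d "$base/lineages/$lineage" ]; then
    echo "found $base/lineages/$lineage"
  else
    # --download_path keeps the dataset where the check above looks for it.
    busco --download "$lineage" --download_path "$base"
  fi
}
```

For example, `ensure_busco_lineage eukaryota_odb10` downloads the eukaryota lineage on the first call and only reports its location afterwards.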

Usage

To obtain the workflow, clone this repository. With Nextflow available (on the CHPC, through the nextflow module listed above), run:

nextflow run Main.nf --help

to see the options for the workflow.
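Which entry script to run depends on whether basecalling is still needed. A minimal sketch of that choice, assuming raw signal files (.pod5 or .fast5) in the data directory mean Dorado still has to run (Main.nf), while anything else means the reads are already FASTQ (Main2.nf). The function pick_entry_script is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the entry script from the contents of a data directory.
# Assumption: raw signal files (*.pod5 or *.fast5) mean basecalling is still needed.
set -euo pipefail

pick_entry_script() {
  local data_dir="$1"
  if compgen -G "$data_dir/*.pod5" > /dev/null || compgen -G "$data_dir/*.fast5" > /dev/null; then
    echo "Main.nf"    # basecall with Dorado, then assemble
  else
    echo "Main2.nf"   # reads are already FASTQ
  fi
}
```

For example, `nextflow run "$(pick_entry_script data)" --help`. Adding Nextflow's built-in -resume option to a later run reuses cached results from processes that already completed.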

Workflow outputs

The primary outputs of the pipeline include:

A FASTQ quality control report
Three assembled FASTA files (from Flye, Racon and Medaka)
A BUSCO report
A QUAST quality report
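Headline numbers can be read from the QUAST and BUSCO reports without opening them. A minimal sketch, assuming QUAST's tab-separated report.tsv (one row per metric, one column per assembly, so the Flye, Racon and Medaka assemblies appear side by side if QUAST is run on all three) and BUSCO 5's short_summary text file, whose notation line looks like C:..%[S:..%,D:..%],F:..%,M:..%,n:... The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading headline metrics from the report files.
# Assumptions: QUAST's report.tsv (tab-separated, first column = metric name,
# one further column per assembly) and BUSCO 5's short_summary*.txt notation line.
set -euo pipefail

# Print the value(s) of one QUAST metric, tab-separated; fail if the row is absent.
quast_metric() {
  local report="$1" metric="$2"
  awk -F'\t' -v m="$metric" '
    $1 == m { out = $2; for (i = 3; i <= NF; i++) out = out "\t" $i; print out; found = 1 }
    END { exit !found }' "$report"
}

# Print the percentage of complete BUSCOs (the C: value) from a short summary.
busco_complete() {
  awk 'match($0, /C:[0-9.]+%/) { print substr($0, RSTART + 2, RLENGTH - 2); found = 1; exit }
       END { exit !found }' "$1"
}
```

For example, `quast_metric quast_out/report.tsv N50` prints the N50 of each assembly in the report, and `busco_complete busco_out/short_summary.*.txt` prints the percentage of complete BUSCOs.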

Repository: setshabaTaukobong/Genome-Assembly-Pipeline-Nextflow