GitHub - LaurenHuet/OceanOmics-OceanGenomes-ref-genomes: OceanGenomes reference genomes pipeline

Introduction

This pipeline is designed for the de novo assembly and analysis of high-quality marine vertebrate genomes as part of the Minderoo OceanOmics Ocean Genomes Project. It processes raw HiFi and Hi-C data, performs assembly, scaffolding and decontamination, generates key assembly statistics, and prepares the genome for manual curation from Pretext maps.

Filter and convert bam files to fastq files (HiFiAdapterFilt)
PacBio Read QC (FastQC)
Count k-mers (Meryl)
Estimate genome size (GenomeScope2)
Assemble HiFi data (Hifiasm)
Assembly stats on HiFi data (Gfastats)
Illumina Read QC (FastQC)
Assemble PacBio & Illumina reads (Hifiasm)
Assembly stats (Gfastats)
Gene assembly QC (BUSCO)
K-mer assembly QC (Merqury)
Create index (Samtools)
Index assembly and align Hi-C reads (BWA)
Map pairs (Pairtools)
Sort and index (Samtools)
Create scaffold (YAHS)
Create decontamination report (fcs-gx)
Create decontamination report (Tiara)
Filter decontaminated scaffolds (BBMap)
Scaffold stats (Gfastats)
Scaffold QC (BUSCO)
Scaffold QC (Merqury)
Generate coverage tracks (minimap2)
Generate coverage tracks (bedtools)
Predict telomere locations (tidk)
Align reads to scaffolds (BWA)
Align reads to scaffolds (Pairtools)
Generate pretext maps (PretextMap)
Inject coverage tracks into pretext map (PretextGraph)
Present QC for raw reads (MultiQC)

Usage

Note

If you are new to Nextflow, please refer to this page on how to set up Nextflow.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,hifi_dir,hic_dir,version,date,tolid,taxid
OG88,hifi_bams/OG89,hic_fastqs/OG89,hic1,v240101,OG88,163129
OG89,hifi_bams/OG89,,hifi1,v240202,OG88,163129
OG90,hifi_fastqs/OG90,hic_fastqs/OG90,hic1,v240303,OG88,163129

Each row represents a sample. The hifi_dir column must point to a directory that contains bam files or fastq files. The hic_dir column can point to a directory containing fastq files, but it can be left blank if there is no Hi-C data for the sample. The taxid column holds the NCBI taxon ID for the sample.
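
Before launching a run, it can help to sanity-check the samplesheet. The following is a minimal Python sketch, not part of the pipeline, that checks the rules described above: the header matches the example, hifi_dir is filled in, hic_dir may be blank, and taxid is numeric. The function name check_samplesheet is hypothetical, and the checks are assumptions drawn from this README rather than the pipeline's own validation.

```python
import csv
import io

# Column names taken from the example samplesheet in this README.
EXPECTED_HEADER = ["sample", "hifi_dir", "hic_dir", "version", "date", "tolid", "taxid"]


def check_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet text.

    Hypothetical helper: the rules mirror the README description only.
    """
    problems = []
    # csv.reader yields an empty list for blank lines, which are skipped here.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["samplesheet is empty"]
    if rows[0] != EXPECTED_HEADER:
        return [f"unexpected header: {','.join(rows[0])}"]
    # Row numbers count non-empty rows, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"row {row_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
            continue
        record = dict(zip(EXPECTED_HEADER, row))
        if not record["sample"].strip():
            problems.append(f"row {row_no}: sample is empty")
        if not record["hifi_dir"].strip():
            problems.append(f"row {row_no}: hifi_dir must not be empty")
        # hic_dir may be blank when a sample has no Hi-C data, so it is not checked.
        if not record["taxid"].strip().isdigit():
            problems.append(f"row {row_no}: taxid should be a numeric NCBI taxon ID")
    return problems


if __name__ == "__main__":
    example = (
        "sample,hifi_dir,hic_dir,version,date,tolid,taxid\n"
        "OG89,hifi_bams/OG89,,hifi1,v240202,OG88,163129\n"
    )
    for problem in check_samplesheet(example):
        print(problem)
```

The sketch only inspects the text of the samplesheet; it does not check that the directories exist or contain bam or fastq files.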

Now, you can run the pipeline using:

nextflow run Computational-Biology-OceanOmics/OceanGenomes-refgenomes \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --buscodb /path/to/buscodb \
   --gxdb /path/to/gxdb \
   --binddir /scratch \
   --tempdir <tempdir> \
   -c pawsey_profile.config \
   -resume \
   -with-report

This repository contains a custom config file (pawsey_profile.config) to run the pipeline on the Pawsey supercomputer with Slurm.

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs. For more details and further functionality, please refer to the usage documentation and the parameter documentation.
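
Since parameters should come from the CLI or a params file, one option is to keep them in a JSON file and pass it with -params-file. The sketch below, in Python, writes such a file using the parameter names from the command above. The values are placeholders, and write_params_file and params.json are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path


def write_params_file(params, path="params.json"):
    """Write pipeline parameters to a JSON file for Nextflow's -params-file option."""
    out = Path(path)
    out.write_text(json.dumps(params, indent=2) + "\n")
    return out


if __name__ == "__main__":
    # Parameter names come from the example command in this README;
    # every value below is a placeholder to replace with your own paths.
    params = {
        "input": "samplesheet.csv",
        "outdir": "results",
        "buscodb": "/path/to/buscodb",
        "gxdb": "/path/to/gxdb",
        "binddir": "/scratch",
        "tempdir": "/path/to/tempdir",
    }
    params_path = write_params_file(params)
    print(f"nextflow run <pipeline> -profile <profile> -params-file {params_path}")
```

Nextflow options such as -profile, -c and -resume are not pipeline parameters, so they would still be given on the command line alongside -params-file.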

Pipeline output

For details about the output files and reports, please refer to the output documentation.

Credits

Computational-Biology-OceanOmics/OceanOmics-OceanGenomes-ref-genomes was originally adapted from the Vertebrate Genomes Project (VGP) Galaxy pipeline (https://galaxyproject.org/projects/vgp/) by Emma de Jong and was converted to Nextflow by Adam Bennett and Lauren Huet. This version is built on top of the nf-core template.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

