Updates to readme and config
BioWilko committed Sep 26, 2024
1 parent 960e885 commit d429ade
Showing 3 changed files with 30 additions and 88 deletions.
76 changes: 17 additions & 59 deletions README.md
@@ -1,56 +1,46 @@
# artic-mpxv-illumina-nf
A Nextflow pipeline for processing amplicon data generated by Illumina sequencing, with a focus on monkeypox virus (mpxv).

![push master](https://github.com/BCCDC-PHL/mpxv-artic-nf/actions/workflows/push_master.yml/badge.svg)

#### Introduction

This pipeline is based on the [BCCDC-PHL/ncov2019-artic-nf](https://github.com/BCCDC-PHL/ncov2019-artic-nf) pipeline, which is a fork of the [connor-lab/ncov2019-artic-nf](https://github.com/connor-lab/ncov2019-artic-nf) pipeline. It has been modified to support analysis of monkeypox virus.
This pipeline is based on the [BCCDC-PHL/ncov2019-artic-nf](https://github.com/BCCDC-PHL/ncov2019-artic-nf) pipeline, which is a fork of the [connor-lab/ncov2019-artic-nf](https://github.com/connor-lab/ncov2019-artic-nf) pipeline. It has been modified to support analysis of MPox virus.

```mermaid
flowchart TD
ref[ref.fa]
composite_ref[composite_ref.fa]
ref[reference.fasta]
primers[primer.bed]
primer_pairs[primer_pairs.tsv]
fastq[fastq_dir]
fastq --> normalizeDepth(normalizeDepth)
composite_ref --> performHostFilter
normalizeDepth(normalizeDepth) --> performHostFilter(performHostFilter)
performHostFilter(performHostFilter) --> readTrimming(readTrimming)
readTrimming(readTrimming) --> filterResidualAdapters(filterResidualAdapters)
filterResidualAdapters --> readMapping(readMapping)
fastq[directory]
human-t2t-hla[humanT2Treference]
fastq --> performHostFilter
humanT2Treference --> performHostFilter
performHostFilter(performHostFilter) --> readMapping(readMapping)
ref --> readMapping(readMapping)
readMapping(readMapping) --> trimPrimerSequences(trimPrimerSequences)
primers --> trimPrimerSequences(trimPrimerSequences)
trimPrimerSequences(trimPrimerSequences) --> callConsensusFreebayes(callConsensusFreebayes)
readMapping(readMapping) --> align_trim(align_trim)
primers --> align_trim(align_trim)
align_trim(align_trim) --> callConsensusFreebayes(callConsensusFreebayes)
callConsensusFreebayes(callConsensusFreebayes) --> alignConsensusToReference(alignConsensusToReference)
ref --> alignConsensusToReference
alignConsensusToReference --> consensusAlignment[consensus.aln.fa]
trimPrimerSequences --> makeQCCSV(makeQCCSV)
callConsensusFreebayes --> makeQCCSV
callConsensusFreebayes --> consensus[consensus.fa]
callConsensusFreebayes --> variants[variants.vcf]
callConsensusFreebayes --> Squirrel(Squirrel)
ref --> makeQCCSV
primers --> makeQCCSV
primer_pairs --> makeQCCSV
makeQCCSV --> qcCSV(qc.csv)
makeQCCSV --> depthPNG(depth.png)
Squirrel --> SquirrelReport(squirrel-report.html)
```

#### Quick-start

```
nextflow run BCCDC-PHL/mpxv-artic-nf -profile conda \
--prefix "output_file_prefix" \
--bed /path/to/primers.bed \
--ref /path/to/ref.fa \
--primer_pairs_tsv /path/to/primer_pairs_tsv \
--composite_ref /path/to/human_and_mpxv_composite_ref \
--directory /path/to/reads \
--outdir /path/to/outputs
nextflow run artic-network/artic-mpxv-illumina-nf --help
```

Will print up-to-date information on all command-line parameters for the current version.

# Credits / Acknowledgements
This pipeline only works due to the ongoing efforts of many people performing the often thankless
job of developing and maintaining bioinformatics software, including but not limited to:
@@ -70,44 +60,12 @@ Special thanks to the following for writing / modifying / maintaining previous v
* Jared Simpson, `https://github.com/jts/ncov2019-artic-nf`
* Dan Fornika et al, `https://github.com/BCCDC-PHL/mpxv-artic-nf`

#### Installation
An up-to-date version of Nextflow is required because the pipeline is written in DSL2. Following the instructions at https://www.nextflow.io/ to download and install Nextflow should get you a recent-enough version.


#### Conda
The repo contains a environment.yml files which automatically build the correct conda env if `-profile conda` is specifed in the command. Although you'll need `conda` installed, this is probably the easiest way to run this pipeline.

--cache /some/dir can be specified to have a fixed, shared location to store the conda build for use by multiple runs of the workflow.

#### Config

Important config options are:

| Option | Default | Description |
|:---------------------------------|---------:|--------------------------------------------------------------------------------------------------------------------:|
| `normalizationTargetDepth` | `200` | Target depth of coverage to normalize to prior to alignment |
| `normalizationMinDepth` | `5` | Minimum depth of coverage to normalize to prior to alignment |
| `keepLen` | `50` | Length of reads to keep after primer trimming |
| `qualThreshold` | `20` | Sliding window quality threshold for keeping reads after primer trimming |
| `varMinFreqThreshold` | `0.25` | Allele frequency threshold for ambiguous variant |
| `varFreqThreshold` | `0.75` | Allele frequency threshold for unambiguous variant |
| `varMinDepth` | `10` | Minimum coverage depth to call variant |

### Depth Normalization
By default, sequence depth will be normalized using `bbnorm` to the value specified by the `--normalizationTargetDepth` param (default: 200). To skip depth normalization, add the `--skip_normalize_depth` flag.

#### QC
A script to do some basic QC is provided in `bin/qc.py`. It measures the % of reference bases are covered by `varMinDepth`, and the longest stretch of consensus sequence with no `N` bases. This script does not make a QC pass/fail call.

#### Output
A subdirectory for each process in the workflow is created in `--outdir`. A `nml_upload` subdirectory containing dehosted fastq files and consensus sequences is included.

### Problems and Solutions

1. Error during `mpxvIllumina:prepareReferenceFiles:get_bed_ref` step
1. Error during `mpxvIllumina:prepareReferenceFiles:performHostFilter` step
```
httpx.HTTPError: Failed to download https://objectstorage.uk-london-1.oraclecloud.com/n/lrbvkel2wjot/b/human-genome-bucket/o/human-t2t-hla.tar. Ensure you are connected to the internet, or provide a valid path to a local index
```

Currently human read removal is performed with hostile, which downloads an indexed human genome file on the fly. This is an internet problem.
Currently human read removal is performed with hostile, which downloads an indexed human genome file on the fly. This is almost certainly a network connectivity problem.

25 changes: 0 additions & 25 deletions main.nf
@@ -2,34 +2,9 @@

nextflow.enable.dsl = 2

// include modules
include {printHelp} from './modules/help.nf'

// import subworkflows
include {mpxvIllumina} from './workflows/illuminaMpxv.nf'

if (params.help){
printHelp()
exit 0
}

if ( !params.directory ) {
println("Please supply a directory containing fastqs or CRAMs with --directory.")
println("Use --help to print help")
System.exit(1)
}

if ( ! params.prefix ) {
println("Please supply a prefix for your output files with --prefix")
println("Use --help to print help")
System.exit(1)
} else {
if ( params.prefix =~ /\// ){
println("The --prefix that you supplied contains a \"/\", please replace it with another character")
System.exit(1)
}
}

// entrypoint workflow
WorkflowMain.initialise(workflow, params, log)

17 changes: 13 additions & 4 deletions nextflow.config
@@ -5,11 +5,11 @@ manifest {
description = 'Epi2me compatible Nextflow pipeline for processing ARTIC tiling amplicon Illumina sequencing reads from monkeypox virus (MPXV) samples.'
mainScript = 'main.nf'
nextflowVersion = '>=20.01.0'
version = '1.2.0'
version = '1.2.1'
}

epi2melabs {
tags = 'MPox,artic,amplicon,viruses,public health,illumina'
tags = 'mpox,artic,amplicon,viruses,public health,illumina'
icon = 'faVirusCovid'
}

@@ -31,6 +31,15 @@ def makeFastqSearchPath ( illuminaSuffixes, fastq_exts ) {

params {

wf {
example_cmd = [
"--directory 'some_directory_containing_fastqs'",
"--scheme_version 'artic-mpox/v1.1.1-cladeI'",
"--clade 'cladei'",

]
}

illuminaSuffixes = ['*_R{1,2}_001', '*_R{1,2}', '*_{1,2}' ]
fastq_exts = ['.fastq.gz', '.fq.gz', '.fastq', '.fq']
fastqSearchPath = makeFastqSearchPath( params.illuminaSuffixes, params.fastq_exts )
@@ -40,8 +49,8 @@ params {
max_time = '12.h'

// Boilerplate options
// directory = false
// prefix = false
directory = false
prefix = false
// primer_pairs_tsv = 'NO_FILE'
// profile = false
help = false
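
The `example_cmd` entries added under `params.wf` read like the flags of a typical run. A minimal sketch of how they might be combined, assuming they are appended to a `nextflow run` call for the pipeline named in the README's `--help` example (this combination is an assumption, not something the commit states). The directory value is the placeholder from the config, and the quotes around each value are dropped here because none of them contains spaces. The command is printed rather than executed so it can be reviewed first.

```shell
# Pipeline name as it appears in the README's --help example.
pipeline="artic-network/artic-mpxv-illumina-nf"

# Flags copied from params.wf.example_cmd; the directory is a placeholder.
cmd="nextflow run $pipeline \
  --directory some_directory_containing_fastqs \
  --scheme_version artic-mpox/v1.1.1-cladeI \
  --clade cladei"

# Print the assembled command for review instead of running it.
echo "$cmd"
```

Replace `some_directory_containing_fastqs` with a real path before running the printed command.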
