Rename processes (#4)
* Many changes, streamlining and renaming

* Remove unused scripts

* Update help, README

* Update README, help

* Remove unnecessary change

* Remove unused config, adjust alignment
dfornika authored Aug 4, 2022
1 parent 7d9715e commit 39d8822
Showing 25 changed files with 461 additions and 1,557 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@ nextflow
results
*.sif
work
test_output
test_input
55 changes: 34 additions & 21 deletions README.md
@@ -5,16 +5,39 @@ A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (h

#### Introduction

------------
This pipeline is based on the [BCCDC-PHL/ncov2019-artic-nf](https://github.com/BCCDC-PHL/ncov2019-artic-nf) pipeline, which is a fork of the [connor-lab/ncov2019-artic-nf](https://github.com/connor-lab/ncov2019-artic-nf) pipeline. It has been modified to support analysis of monkeypox virus.


```mermaid
flowchart TD
ref[ref.fa]
composite_ref[composite_ref.fa]
primers[primer.bed]
fastq[fastq_dir]
fastq --> performHostFilter(performHostFilter)
composite_ref --> performHostFilter
performHostFilter(performHostFilter) --> normalizeDepth(normalizeDepth)
normalizeDepth(normalizeDepth) --> readTrimming(readTrimming)
readTrimming(readTrimming) --> filterResidualAdapters(filterResidualAdapters)
filterResidualAdapters --> readMapping(readMapping)
ref --> readMapping(readMapping)
readMapping(readMapping) --> trimPrimerSequences(trimPrimerSequences)
primers --> trimPrimerSequences(trimPrimerSequences)
trimPrimerSequences(trimPrimerSequences) --> callConsensusFreebayes(callConsensusFreebayes)
callConsensusFreebayes(callConsensusFreebayes) --> alignConsensusToReference(alignConsensusToReference)
ref --> alignConsensusToReference
trimPrimerSequences --> makeQCCSV(makeQCCSV)
callConsensusFreebayes --> makeQCCSV
callConsensusFreebayes --> consensus[consensus.fa]
callConsensusFreebayes --> variants[variants.vcf]
ref --> makeQCCSV
makeQCCSV --> qcCSV(qc.csv)
```
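The flowchart describes a directed acyclic graph (DAG) of Nextflow processes. As a hedged, illustrative sketch (not code from this repository), the Python below transcribes only the process-to-process edges from the flowchart, leaving out file inputs and outputs such as `ref.fa` and `qc.csv`, and uses the standard-library `graphlib` module (Python 3.9+) to derive one valid order in which the processes could run. The dictionary name `PROCESS_DEPENDENCIES` and the helper `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Process-to-process edges transcribed from the flowchart.
# Each key maps a process to the set of processes whose output it consumes.
PROCESS_DEPENDENCIES = {
    "performHostFilter": set(),
    "normalizeDepth": {"performHostFilter"},
    "readTrimming": {"normalizeDepth"},
    "filterResidualAdapters": {"readTrimming"},
    "readMapping": {"filterResidualAdapters"},
    "trimPrimerSequences": {"readMapping"},
    "callConsensusFreebayes": {"trimPrimerSequences"},
    "alignConsensusToReference": {"callConsensusFreebayes"},
    "makeQCCSV": {"trimPrimerSequences", "callConsensusFreebayes"},
}


def execution_order(dependencies):
    """Return one order in which every process runs after all of its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(PROCESS_DEPENDENCIES), start=1):
        print(f"{step}. {name}")
```

Nextflow itself does not run processes in a fixed sequence: each process starts when its input channels receive data, so independent branches such as `alignConsensusToReference` and `makeQCCSV` may run concurrently.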

#### Quick-start
##### Illumina

```
nextflow run BCCDC-PHL/mpxv-artic-nf -profile conda \
--illumina \
--prefix "output_file_prefix" \
--bed /path/to/primers.bed \
--ref /path/to/ref.fa \
--primer_pairs_tsv /path/to/primer_pairs_tsv \
```
@@ -33,33 +56,23 @@ The repo contains a environment.yml files which automatically build the correct

`--cache /some/dir` can be specified to set a fixed, shared location for storing the conda environments, so that multiple runs of the workflow can reuse them.

#### Executors
By default, the pipeline runs on the local machine. Specify `-profile slurm` to use a SLURM cluster, or `-profile lsf` to use an LSF cluster. In either case, you may also need to use one of the COG-UK institutional config profiles (`phw` or `sanger`), or provide queue names in your own config file.

#### Profiles
You can use multiple profiles at once by separating them with commas (e.g. `-profile conda,slurm`). This is described in the Nextflow [documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles).

#### Config
Common configuration options are set in `conf/base.config`. Workflow-specific configuration options are set in `conf/illumina.config`. They are described in those files and set to sensible defaults, as suggested in the [nCoV-2019 novel coronavirus bioinformatics protocol](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html "nCoV-2019 novel coronavirus bioinformatics protocol").

Important config options are:

| Option                           | Default  | Description |
|:---------------------------------|---------:|:------------|
| `allowNoprimer`                  | `true`   | Allow reads that don't contain a primer sequence? Ligation prep = `false`, Nextera = `true` |
| `illuminaKeepLen`                | `50`     | Minimum length of Illumina reads to keep after primer trimming |
| `illuminaQualThreshold`          | `20`     | Sliding window quality threshold for keeping reads after primer trimming (Illumina) |
| `mpileupDepth`                   | `100000` | Mpileup depth for ivar |
| `varFreqThreshold`               | `0.75`   | ivar/freebayes frequency threshold for consensus variant |
| `varMinFreqThreshold`            | `0.25`   | ivar/freebayes frequency threshold for ambiguous variant |
| `normalizationTargetDepth`       | `200`    | Target depth of coverage to normalize to prior to alignment |
| `normalizationMinDepth`          | `5`      | Minimum depth of coverage to normalize to prior to alignment |
| `keepLen`                        | `50`     | Minimum length of reads to keep after primer trimming |
| `qualThreshold`                  | `20`     | Sliding window quality threshold for keeping reads after primer trimming |
| `varMinFreqThreshold`            | `0.25`   | Allele frequency threshold for ambiguous variant |
| `varFreqThreshold`               | `0.75`   | Allele frequency threshold for unambiguous variant |
| `varMinDepth`                    | `10`     | Minimum coverage depth to call variant |
| `ivarMinVariantQuality`          | `20`     | ivar minimum mapping quality to call variant |
| `downsampleMappingQuality`       | `20`     | Exclude reads below this mapping quality while downsampling |
| `downsampleAmpliconSubdivisions` | `3`      | Number of times amplicons are subdivided to determine locations of checkpoints to test for depth while downsampling |

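The three variant thresholds (`varMinDepth`, `varMinFreqThreshold`, `varFreqThreshold`) interact at each position of the consensus. The Python below is a simplified, hypothetical model of that interaction for a single-nucleotide site, assuming that a site with depth below `varMinDepth` is masked with `N`, an alternate allele at or above `varFreqThreshold` replaces the reference base, one between `varMinFreqThreshold` and `varFreqThreshold` is written as an IUPAC ambiguity code, and anything rarer keeps the reference base. The helper `consensus_base` is invented for illustration; the actual logic in `callConsensusFreebayes` may differ, for example in how indels are handled.

```python
# Defaults taken from the config table above.
VAR_FREQ_THRESHOLD = 0.75      # varFreqThreshold
VAR_MIN_FREQ_THRESHOLD = 0.25  # varMinFreqThreshold
VAR_MIN_DEPTH = 10             # varMinDepth

# IUPAC ambiguity codes for each pair of distinct nucleotides.
IUPAC_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_base(ref, alt, alt_freq, depth,
                   freq_threshold=VAR_FREQ_THRESHOLD,
                   min_freq_threshold=VAR_MIN_FREQ_THRESHOLD,
                   min_depth=VAR_MIN_DEPTH):
    """Hypothetical helper: pick the consensus base at one SNV site."""
    if depth < min_depth:
        return "N"  # too little coverage to make any call
    if alt_freq >= freq_threshold:
        return alt  # unambiguous variant
    if alt_freq >= min_freq_threshold:
        return IUPAC_CODES[frozenset((ref, alt))]  # ambiguous variant
    return ref  # alternate allele too rare; keep the reference base
```

Under these assumptions, an A-to-G site at 50% allele frequency with depth 40 becomes `R`, while the same site with depth 5 becomes `N`.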

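`keepLen` and `qualThreshold` work together during primer trimming: low-quality read ends are trimmed using a sliding window, and reads that end up shorter than `keepLen` are discarded. The Python below sketches a generic version of that idea. It is not the exact algorithm of the trimming tool used by this pipeline; the window width of 4 bases (`WINDOW`) and the helper names `phred_scores` and `trimmed_length` are assumptions for illustration.

```python
QUAL_THRESHOLD = 20  # qualThreshold default from the table above
KEEP_LEN = 50        # keepLen default from the table above
WINDOW = 4           # assumed window width; not a documented option of this pipeline


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def trimmed_length(qualities, threshold=QUAL_THRESHOLD, window=WINDOW, keep_len=KEEP_LEN):
    """Return how many 5' bases to keep, or 0 if the read should be discarded.

    A window slides from the 5' end; the read is cut at the start of the first
    window whose mean quality falls below `threshold`.
    """
    keep = len(qualities)
    for start in range(len(qualities) - window + 1):
        if sum(qualities[start:start + window]) / window < threshold:
            keep = start
            break
    return keep if keep >= keep_len else 0
```

With these defaults, a 150 bp read whose quality collapses after base 40 is trimmed to under 40 bases and then dropped, because the remainder is shorter than `keepLen`.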
#### QC
A script to do some basic QC is provided in `bin/qc.py`. It sets `qc_pass` to `TRUE` in `<outdir>/<prefix>.qc.csv` if more than 50% of reference bases are covered by more than 10 reads (Illumina) or more than 20 reads (Nanopore), OR if there is a stretch of more than 10 kb of sequence without an N. `bin/qc.py` can be extended to incorporate any QC test, as long as the script outputs a CSV file whose last column is `qc_pass`, with each sample set to `TRUE` or `FALSE`.
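The pass/fail rule can be expressed compactly. The Python below is a minimal sketch of the Illumina criteria only, assuming per-position read depths and the consensus sequence are already available in memory. The function names and every CSV column other than `qc_pass` are hypothetical and do not necessarily match what `bin/qc.py` produces.

```python
import csv
import re
import sys


def pct_covered(depths, min_depth=10):
    """Percentage of reference positions covered by more than `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth > min_depth)
    return 100.0 * covered / len(depths)


def longest_no_n_run(consensus):
    """Length of the longest stretch of the consensus sequence without an N."""
    return max(len(run) for run in re.split(r"[Nn]+", consensus))


def qc_pass(depths, consensus, min_depth=10, min_pct=50.0, min_run=10_000):
    """Pass if >50% of bases have >10 reads OR there is a >10 kb run without N."""
    return (pct_covered(depths, min_depth) > min_pct
            or longest_no_n_run(consensus) > min_run)


def write_qc_csv(sample, depths, consensus, out):
    """Write a header and one row for `sample`; `qc_pass` is the last column."""
    writer = csv.writer(out)
    writer.writerow(["sample_name", "pct_covered_bases", "longest_no_N_run", "qc_pass"])
    writer.writerow([
        sample,
        f"{pct_covered(depths):.2f}",
        longest_no_n_run(consensus),
        "TRUE" if qc_pass(depths, consensus) else "FALSE",
    ])


if __name__ == "__main__":
    # Toy data: 60 of 100 positions have depth 20, so the sample passes.
    write_qc_csv("sample1", [20] * 60 + [0] * 40, "ACGT" * 25, sys.stdout)
```

An additional test could be added by extending `qc_pass`, as long as `qc_pass` remains the last column of the CSV.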

#### Output
A subdirectory for each process in the workflow is created in `--outdir`. A `nml_upload` subdirectory containing files important for [CanCOGeN](https://www.genomecanada.ca/en/cancogen) is created.
A subdirectory for each process in the workflow is created in `--outdir`. A `nml_upload` subdirectory containing dehosted fastq files and consensus sequences is included.