All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Reconciled workflow with wf-template v5.3.0.
- Updated Dorado to v0.8.1
- IGV configuration file with `--ref --igv` options and either `--output_fmt bam` or `--output_fmt cram`.
- Support for gzipped reference genomes.
- `output_fmt` selects the output format for basecalled and aligned files.
- Updated Dorado to v0.8.0
- Reconciled workflow with wf-template v5.2.6.
- Do not emit the reference FASTA file.
- Collapse redundant RG and PG header lines when emitting BAM or CRAM.
- Workflow starting with `--duplex --barcode_kit`, despite duplex not supporting barcoding.
- Workflow crashing with `--ref {{ reference }} --barcode_kit`.
- Aligned reads will no longer be trimmed when demuxing to preserve mapping information.
- Workflow emits confusing warning about Bonito filtering when using Dorado.
- `fastq_only` and `output_bam` options replaced by `output_fmt`.
- `--output_fmt fastq` can be used to output unaligned FASTQ instead of unaligned CRAM.
- `--output_fmt bam` can be used to output unaligned or aligned BAM instead of CRAM.
- Modified base calling with `--duplex`.
- APK 5.0.0 model.
- Updated Dorado to v0.7.2 (see https://github.com/nanoporetech/dorado/releases/tag/v0.7.2)
- Bug fix for downstream workflows and `--poly_a_config` which does not affect normal workflow use.
- Output channel for demuxed BAM files for downstream use.
### Added
- Support for `dorado demux` to demultiplex barcoded runs. Specify your `--barcode_kit` to activate demultiplexing.
- Support for poly(A) tail length estimation with `--poly_a_config`. You can configure this by providing a TOML file to `--poly_a_config`, which is described in detail here
- Updated Dorado to v0.7.1 (see https://github.com/nanoporetech/dorado/releases/tag/v0.7.1)
- Report crashing when no data are present in the input pod5.
- Reconciled workflow with wf-template v5.1.3.
- Updated Dorado to v0.7.0 (see https://github.com/nanoporetech/dorado/releases/tag/v0.7.0)
- Added new DNA and RNA 5.0.0 models.
- Updated Dorado to v0.6.0 (see https://github.com/nanoporetech/dorado/releases/tag/v0.6.0)
- Workflow accepting incompatible `--fastq_only` and `--duplex` options
- Dynamically updated report in `--watch_path` mode.
- qscore_filter inadvertently disabled in v1.1.5
- Minor update to default resource requests on dorado task.
- Experimental feature switch.
- Updated Dorado to v0.5.2 (see https://github.com/nanoporetech/dorado/releases/tag/v0.5.2)
- Bumped memory directives for intense tasks to reduce likelihood of job failure
- Default to parallel GPU usage when using awsbatch profile
- Runtime driver check in Dorado process, as this is no longer available in the Dorado image
- Updated dorado version to v0.5.1 (see https://github.com/nanoporetech/dorado/releases/tag/v0.5.1)
- Reintroduced RNA002 models
- `--duplex` basecalling converts FAST5 to POD5 automatically
    - Converted POD5 files are deleted by default, use `--output_pod5` to output converted POD5 files to the workflow output directory.
- Updated Dorado to v0.3.4 (see https://github.com/nanoporetech/dorado/releases/tag/v0.3.4)
- Workflow crashes with fast5 input
- Workflow fails early when trying to use FAST5 input with Dorado duplex
- RNA004 models
- R941 v3.3 5mCG 5hmCG models
- Duplex calling with option `--duplex`
    - Note that duplex calling is not optimised for streaming basecalling with `--watch_path` and may lead to lower duplex yield.
    - Duplex basecalling is currently not compatible with modified basecalling.
- Updated Dorado to v0.3.2 (see https://github.com/nanoporetech/dorado/releases/tag/v0.3.2)
- Pascal architecture GPUs are now supported
- Bumped minimum required Nextflow version to 23.04.2
- Users no longer need to provide `--basecaller_cfg custom` and/or `--remora_cfg custom` to override models with `--basecaller_model_path` and/or `--remora_model_path` respectively.
- `bamstats` process very slow when `output_bam` has been selected
- v4.2 5mC and 6mA modification models
- Updated Dorado to v0.3.1
- GPU tasks are limited to run in serial by default to avoid memory errors
    - Users in cluster and cloud environments where GPU devices are scheduled must use `-profile discrete_gpus` to parallelise GPU work
    - A warning will be printed if the workflow detects it is running non-local execution but the discrete_gpus profile is not enabled
    - Additional guidance on GPU support is provided in our Quickstart
- Bumped minimum required Nextflow version to 22.10.8
- Command not found on `cram_cache` step
- Typo in report that refers to the workflow as "wf-basecalling-report"
- Updated Dorado to v0.3.0
- BAM may be output instead of CRAM by providing `--output_bam`
- `--help` message will list basecalling and modbasecalling models available for use with the workflow
- v4.2.0 models, which must be used for sequencing runs performed at new 5 kHz sampling rate
- v4.1.0 models replace v4.0.0 models and must be used for sequencing runs performed at 4 kHz sampling rate
- v4.0.0 models
- Custom models were previously rejected by the workflow as `basecaller_cfg` and `remora_cfg` are validated against a list of basecalling models installed in the Dorado container.
    - Users should now provide `--basecaller_cfg custom` and/or `--remora_cfg custom` to override models with `--basecaller_model_path` and/or `--remora_model_path` respectively.
    - Providing `--basecaller_cfg custom` or `--remora_cfg custom` without the corresponding `--basecaller_model_path` or `--remora_model_path` will result in an error.
- Ability to watch the input path and process files as they become available in real time.
- Configuration for running demo data in AWS
- Missing models from list of valid models
- "[email protected]_5mCG@v0" is now correctly referred to as "[email protected]_5mCG@v0", to match the simplex model version
- Updated Dorado to v0.2.4
- Updated to Oxford Nanopore Technologies PLC. Public License
- Dorado image correctly ships with CUDA runtime library
- Input ref channel depleted after first alignment
- Reference is no longer required for basecalling
    - CRAM files with no alignments will be generated if `--ref` is not provided
    - FASTQ may be output instead of CRAM by providing `--fastq_only`
- PG line for converting Dorado SAM output to uBAM is no longer written to output header
- Work directory is automatically cleaned up on successful completion to remove large intermediate files
    - Override this by including `cleanup = false` in a custom Nextflow configuration file
- Number of threads for merging is now configurable for advanced users
- Updated Dorado to v0.2.1
- `--basecaller_cfg` and `--remora_cfg` are now validated against a list of models installed in the Dorado container
- Workflow no longer prints a confusing error when Dorado fails
- `--basecaller_args` may be used to provide custom arguments to the basecalling process
- Updated Dorado to v0.1.1
- Latest models are now v4.0.0
- Workflow prints a more helpful error when Dorado fails due to unknown model name
- Updated description in manifest
- Default basecaller_basemod_threads value
- Undefined `colors` variable
- Workflow will now output pass and fail CRAM
- Reads are separated into pass and fail based on their mean qscore as calculated by dorado
- The threshold can be changed with `--qscore_filter`
- Improved `--help` documentation
- Workflow will exit with "No files match pattern" if no suitable files are found to basecall
    - Ensure to set `--dorado_ext` to `fast5` or `pod5` as appropriate
- Initial release of wf-basecalling supporting the Dorado basecaller
- Initialised wf-basecalling from wf-template #30ff92d