Releases: nanoporetech/remora
v3.3.0
Dataset rework adding filters, more robust metadata, and the ability to train a basecaller from a Remora dataset. Also includes updates to the modified base models.
v3.2.0
Feature additions:
- Add modified base models for v5.0 basecalling models
  - All models (except for inosine) provided with SUP and HAC support
  - DNA
    - 5mC+5hmC, 5mC+4mC, 6mA
  - RNA (all models are now all-context)
    - m6A, Pseudouridine, Inosine
- Add `remora model inspect` command
- Add `remora dataset copy` command
- Allow multiple models to be used with `remora infer from_pod5_and_bam` (supports C-mod and A-mod calling simultaneously)
- Update learning rate scheduler to use cosine decay
- Support reverse complement (revcomp) of IUPAC codes
- Picoamp scaling enabled (models maintain use of scaling via k-mer levels)
- Allow signal padding for chunks at the end of a read
- Add parameter to limit the size of a dataset merge
- Allow training and validation from core or config dataset
- Reference-anchored inference documentation
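The IUPAC revcomp support listed above concerns motifs that contain ambiguity codes (for example `R` for A/G, or a motif such as DRACH). As a rough illustration only, a minimal Python sketch of reverse complementing such a sequence might look like the following; `iupac_revcomp` and its complement table are hypothetical helpers, not Remora's API, and the sketch assumes a DNA alphabet (no `U`).

```python
# Hypothetical helper (not Remora's API): complement table for IUPAC DNA codes.
# Each ambiguity code maps to the code covering the complemented bases,
# e.g. R (A/G) -> Y (T/C), B (not A) -> V (not T).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def iupac_revcomp(seq: str) -> str:
    """Reverse complement a DNA sequence that may contain IUPAC ambiguity codes."""
    try:
        # Walk the sequence from the 3' end and complement each code.
        return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unsupported IUPAC code: {err.args[0]}") from None


print(iupac_revcomp("DRACH"))  # DGTYH
```

A lookup table keeps the mapping explicit and makes unsupported characters fail loudly instead of passing through silently.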
Bug fixes:
- Extend access to `missing_metrics_ok` to allow more robust use of the API for read loading
- Pin plotnine version
- Fix bug for chunk extraction over entire read
- Fix bug in m6A model specification
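The v3.2.0 feature list mentions switching the learning rate scheduler to cosine decay. The exact schedule parameters Remora uses are not given in these notes; the sketch below only shows the general cosine decay shape, where the rate falls from a base value to a floor along half a cosine period. `cosine_decay_lr` and its parameters are hypothetical; in PyTorch the same shape is available through `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math


def cosine_decay_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical helper: learning rate at `step` under cosine decay.

    lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * step / total_steps))
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps past the end of the schedule stay at min_lr.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_decay_lr(s, 10, 1e-3) for s in range(11)]
```

Compared with step decay, the cosine curve lowers the rate smoothly, decaying slowly at the start and end of training and fastest in the middle.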
v3.1.0
v3.0.0
This version adds several new features as well as general bug fixes and optimizations.
Key Improvements:
- A major Remora datasets update
  - Easier dataset composition and manipulation
  - Flexible dataset mixing, allowing use of randomers, native, enzymatic, PCR, spike-in, and other dataset types
  - Datasets defined by configuration file, which can be generated automatically
  - Larger datasets enabled
    - Model training has now been demonstrated on over one billion training chunks
- Easier hyper-parameter tuning at training time
- Enhanced signal and metrics plotting and exploration interface
- Improved model inference speed
- Full RNA support, including an m6A model (also available for production modified base calling through Dorado)
- ChEBI code support
- Allow any modified base with full pipeline support
- Split reads support
- Use latest POD5 update
- Allow single POD5 or directory of POD5 files as input
- Various bug fixes
v2.1.3
Fix for Cython 3.0 update
v2.1.2
v2.1.1
- Add all-context models to repository
- Add finetune option to model training
v2.1.0
This version adds several new features as well as general bug fixes and optimizations.
Major features:
- Raw signal plotting allows users to visualize raw Nanopore signal aligned to a reference
  - In addition, the Remora API supports efficient and easy-to-use access to per-read, per-site signal-based statistics (see the notebooks in the repository for examples).
  - Allows users to explore modified bases in signal space to gain intuition for modified base model training.
  - This replaces the Tombo signal visualization and metrics extraction features.
- Inference performance optimization and bug fix on certain systems
  - Note that Dorado is still preferred as the production modified basecalling platform, but these improvements allow users to perform modified basecalling after canonical basecalling more efficiently.
- Initial RNA support
  - Support 3' to 5' signal in Remora datasets and models
- Various training improvements
  - Batch balancing
  - Dynamic training data filtering
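The notes above list batch balancing among the v2.1.0 training improvements without describing the scheme. Purely as an assumption, the sketch below treats "balancing" as drawing roughly equal numbers of chunks per label (e.g. canonical vs modified) in every batch, sampling with replacement so a rare label can still fill its share. `balanced_batches` is a hypothetical helper, not Remora's implementation.

```python
import random
from collections import defaultdict


def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Hypothetical sketch: yield index batches with near-equal counts per label."""
    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    classes = sorted(by_label)
    if batch_size < len(classes):
        raise ValueError("batch_size must be at least the number of classes")
    rng = random.Random(seed)
    per_class, remainder = divmod(batch_size, len(classes))
    for _ in range(num_batches):
        # Leftover slots go to randomly chosen classes, one extra each.
        extra = set(rng.sample(classes, remainder))
        batch = []
        for label in classes:
            k = per_class + (1 if label in extra else 0)
            # Sample with replacement so minority labels can fill their quota.
            batch.extend(rng.choices(by_label[label], k=k))
        rng.shuffle(batch)
        yield batch


labels = [0] * 90 + [1] * 10  # heavily imbalanced toy labels
batches = list(balanced_batches(labels, batch_size=8, num_batches=3))
```

Sampling with replacement trades some repetition of minority examples for batches whose label mix does not depend on the overall class imbalance.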
v2.0.0
Remora v2.0.0 release
Feature additions:
- Updated kit14 5mC+5hmC models
- Simplified POD5+BAM input pipeline
- Remove ONNX model format (PyTorch only, unified with Dorado)
- Automatic model downloads
- Inference and validation from modBAM format
- Duplex modified base calling
- Remove Taiyaki/Megalodon dependency
- Basecall-anchored training
v1.1.1
Remora v1.1.1 release
Feature additions:
- Guppy-compatible model export including version 1 Remora models
Bug fixes:
- onnxruntime protobuf dependency version issue
- `remora validate from_modbams` using strand from `--regions-bed`
- Fix bug in unused chunk extraction code