- Small viral reference set and read simulator for future testing
- Modified build code to prevent insertion of minimizers with ambiguous bases
- Expose --load-factor setting to kraken2-build
- New --minimum-hit-groups option to kraken2
- By default, require 2 hit groups (each a set of overlapping k-mers w/ same minimizer) to make a classification
- Allow build options to pass through to subsequent invocations (e.g., k-mer length for 16S DBs)
- Removed env options for library downloads (no longer available from same NCBI location)
- Updated SILVA to release 138
- Removed mention of --fastq-input from Manual
- Made PE read identifier suffix trimming more restrictive (only on /1 and /2)
- Bug where some reads would be classified with taxid 0
- Bug that prevented kraken2-inspect from working with large databases
- FTP downloading option for taxonomy/libraries (--use-ftp for kraken2-build)
- Option to skip downloading taxonomy maps
- Added lookup table to speed up parsing in MinimizerScanner class
- Default parameters for minimizer lengths and spaces changed (spaces=7 for nucleotide search, length=12 for translated search)
- Linked space expansion value for proteins to constant used by MinimizerScanner
- Reporting of taxids in classified-out sequence files
- Confidence scoring bug associated with failure to leave some sequences unclassified
- Reverse complement shifting bug, code made backwards-compatible with existing databases (newly created DBs will have fix)
- NCBI taxonomy download error due to removal of EST/GSS files
- Disk usage info to kraken2-build --clean
- Memory allocation error message for hash table
- Option for --max-db-size and hash downsampling
- Multithreading to kraken2-inspect
- Move to /usr/bin/env for perl scripts
- Add DB loading message to keep people from killing processes early
- Add flag files for resuming download of nucleotide accession map data
- Converted lookup_accession_numbers script into C++ program w/ memory mapping
- Clarified in manual that one or more libraries allowed for custom DBs
- Silenced progress messages in C++ programs for non-TTY stderr
- Taxonomy downloads switched to rsync from wget (ftp)
- Removed '%' from reports
- Allow download of protozoa library w/ kraken2-build script
- Filenames for SILVA database taxonomy info
- Typo in manual for output format example
- Corrected default space count in manual
- Removed obvious race condition in --add-to-library functionality
- Corrected behavior of --classified-out and --unclassified-out (no longer forcing .fq/.fa file extensions, respecting '#' in paired mode)
- Usage message in kraken2-inspect
- Taxonomy creation for 16S databases
- New DB summary info printed out w/ inspect script + --skip-counts option
- Now stripping carriage returns and other trailing whitespace from sequence data
- Treating l-mers immediately following ambiguous characters as ambiguous until a full k-mer is processed
- Bug in expansion of spaced seed masks that left spaces at end
- New kraken2-inspect script to report minimizer counts per taxon
- Kraken 2X build now adds terminators to all reference sequences
- Improved portability to older g++ by removing initialization of variable-length string
- Reporting options to kraken2 script (like Kraken 1's kraken-report and kraken-mpa-report)
- Made loading to RAM default option, added --memory-mapping option to kraken2
- Low base quality masking option
- Moved low-complexity masking to library download/addition, out of build process
- Made no masking default for human genome in standard installation
- Low-complexity sequence masking as a default
- UniVec/UniVec_Core databases to supported downloads
- UniVec_Core & human in standard Kraken 2 DB
- 16S DB support (Greengenes, Silva, RDP)
- --use-names flag for kraken2 script
- Priority queue to ensure classifier output order matches input order when multi-threading
- Changelog
- Reduced amino acid alphabet (requires rebuild of old protein DBs)
- Operating manual
- kraken2 now allows compression & paired processing at same time
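
The entry above on paired-end read identifier trimming ("only on /1 and /2") describes a simple rule that can be sketched as follows. This is an illustrative Python sketch only, not the project's own code (which, per the entries above, consists of C++ programs and perl scripts); the function name `trim_pair_suffix` is hypothetical.

```python
def trim_pair_suffix(read_id: str) -> str:
    """Strip a trailing /1 or /2 mate marker; leave any other identifier untouched.

    Hypothetical helper: it illustrates the restrictive rule named in the
    changelog entry and is not taken from the Kraken 2 source.
    """
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


# Only the two recognized mate suffixes are removed.
print(trim_pair_suffix("read42/1"))   # mate suffix removed
print(trim_pair_suffix("read42/3"))   # unchanged: /3 is not a mate marker
print(trim_pair_suffix("sample_1"))   # unchanged: no slash-delimited suffix
```

Being restrictive here avoids mangling identifiers that merely happen to end in a digit or in some other slash-delimited field.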
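
The entry above on using a priority queue so that classifier output order matches input order under multi-threading refers to a general technique: results that finish out of order are buffered in a min-heap keyed by their input position, and are released only when the next expected position is at the top. A minimal Python sketch of that idea follows. The class name `OrderedEmitter`, the `block_id` numbering, and the single-threaded simulation are assumptions made for illustration; the sketch does not reproduce the classifier's actual implementation and omits the locking a real multi-threaded program would need.

```python
import heapq


class OrderedEmitter:
    """Buffer out-of-order results and release them in input order.

    Hypothetical illustration: block ids are assumed to start at 0 and
    increase by 1 in input order. Not thread-safe as written.
    """

    def __init__(self):
        self._heap = []      # min-heap of (block_id, output)
        self._next_id = 0    # id of the next block allowed to be emitted
        self.emitted = []    # stands in for the real output stream

    def submit(self, block_id, output):
        heapq.heappush(self._heap, (block_id, output))
        # Drain every buffered block that is now next in line.
        while self._heap and self._heap[0][0] == self._next_id:
            _, ready = heapq.heappop(self._heap)
            self.emitted.append(ready)
            self._next_id += 1


# Simulate worker threads finishing blocks 2, 0, 1 in that order.
emitter = OrderedEmitter()
emitter.submit(2, "c")
emitter.submit(0, "a")
emitter.submit(1, "b")
print(emitter.emitted)
```

The heap keeps memory bounded by the number of blocks that finished early, and no block is written until every earlier block has been written.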