The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
The content below is the unaltered changelog of the unreleased 2020 version of the pipeline.
Initial release of nf-core/kmermaid, created with the nf-core template.
- Add option to use Dayhoff encoding for sourmash.
- Add `bam2fasta` process to kmermaid pipeline and flags involved.
- Add `extract_coding` and `peptide_bloom_filter` processes and flags involved.
- Add `track_abundance` feature to keep track of hashed kmer frequency.
- Add social preview image.
- Add `fastp` process for trimming reads.
- Add option to use a compressed `.tgz` file containing 10X Genomics' `cellranger count` outputs, including the `possorted_genome_bam.bam` and `barcodes.tsv` files.
- Add samtools_fastq_unaligned and samtools_fastq_aligned processes for converting bam to per-cell-barcode fastq.
- Add version printing for sencha, bam2fasta, and sourmash in Dockerfile, update versions in environment.yml
- Set cpus=1 for the translate and sourmash compute processes, as they only run serially (#107).
- Add `sourmash sig merge` for aligned/unaligned signatures from bam files, and add `--skip_sig_merge` option to turn it off.
- Add `--protein_fastas` option for creating sketches of already-translated protein sequences.
- Add `--skip_compare` option to skip `sourmash_compare_sketches` process.
- Add merging of aligned/unaligned parts of single-cell data (#117).
- Add renamed package dependency orpheum (used to be known as sencha).
- Increase CPUs in `high_memory_long` profile from 1 to 10.
- Rename splitkmer to `split_kmer`.
- Remove `one_signature_per_record` flag and add bam2fasta count_umis_percell and make_fastqs_percell instead of bam2fasta sharding method.
- Use ripgrep instead of bam2fasta to make per-cell fastq, which will hopefully make resuming long-running pipelines on bams much faster.
- Make sure `samtools_fastq_aligned` outputs ALL aligned reads, regardless of mapping quality or primary alignment status.
- Add `--skip_compute` option to skip `sourmash_compute_sketch_*`.
- Use `.combine()` instead of `each` to do the Cartesian product of all possible molecules, ksizes, and sketch values.
- Do `sourmash compute` on all input ksizes, and all peptide molecule types, at once to save disk reading/writing efforts.
- Update to sencha=1.0.3 to fix memory errors, possibly caused by the numpy array on unique filenames (PR #96 on orpheum).
- Add option to write non-coding nucleotide sequences fasta files while doing sencha translate.
- Don't save translate csvs and jsons by default; add separate `--save_translate_json` and `--save_translate_csv` options.
- Update `sencha translate` default parameters to be `--ksize 8 --jaccard-threshold 0.05` because those were the most successful.
- Update renaming of `khtools` commands to `sencha`.
- Fix the `skip_multiqc` flag condition to use `if` and not `when`.
- Removed ability to specify multiple `--scaled` or `--num-hashes` values to enable merging of signatures.
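
For readers unfamiliar with the Cartesian product mentioned in the `.combine()` entry above, the idea can be sketched in plain Python. This is only an illustration of the concept, not the pipeline's Nextflow code; the parameter values below are hypothetical and were chosen for the example.

```python
from itertools import product

# Hypothetical parameter values, chosen only for illustration.
molecules = ["dna", "protein", "dayhoff"]
ksizes = [21, 27]
sketch_values = [500, 1000]

# Every (molecule, ksize, sketch value) combination: the Cartesian product
# of the three lists, conceptually similar to combining three channels.
combinations = list(product(molecules, ksizes, sketch_values))

for molecule, ksize, sketch_value in combinations:
    print(f"molecule={molecule} ksize={ksize} sketch_value={sketch_value}")
```

With three molecules, two ksizes, and two sketch values, this yields 3 × 2 × 2 = 12 combinations, one per sketching job.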