nailpolish is a collection of tools for deduplicating UMIs in long-read single-cell data.
nailpolish is distributed as a single binary with no dependencies (beyond libc).
Up-to-date builds are available through the Releases section for macOS (Intel & Apple Silicon) and x64-based Linux systems.
nailpolish is in active development. If you run into any issues, please first check that you are using the most current version of the software!
Say I have a demultiplexed sample.fastq file of the following form (for instance, one generated using the Flexiplex demultiplexer):
@BC1_UMI1
sequence...
+
quality...
I first create an index file using:
$ nailpolish generate-index --file sample.fastq --index index.tsv
I can view summary statistics about duplicate rates using:
$ nailpolish summary --index index.tsv
and I can also transparently remove duplicate reads using:
$ nailpolish call \
--index index.tsv \
--input sample.fastq \
--output sample_called.fasta \
--threads 4
which will output all non-duplicated reads and all consensus-called reads, removing the original duplicated reads in the process.
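To make the notion of a duplicate concrete, the sketch below groups reads by the barcode and UMI in each header and reports a few simple counts. It is only an illustration in Python, assuming strict four-line FASTQ records and `@BARCODE_UMI` headers as in the example above. The function names `group_reads_by_header` and `duplicate_summary` are hypothetical, and this is not a description of how nailpolish builds its index or computes its summary.

```python
from collections import defaultdict


def group_reads_by_header(fastq_text):
    """Group read sequences by header identifier (assumed '@BARCODE_UMI').

    Assumes strict four-line FASTQ records: header, sequence, '+', quality.
    """
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")

    groups = defaultdict(list)
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        fields = header[1:].split()
        if not header.startswith("@") or not fields:
            raise ValueError(f"malformed FASTQ header: {header!r}")
        # Reads sharing the same identifier are treated as duplicates.
        groups[fields[0]].append(sequence)
    return dict(groups)


def duplicate_summary(groups):
    """Count reads, distinct barcode/UMI pairs, and duplicated groups."""
    total_reads = sum(len(seqs) for seqs in groups.values())
    unique_pairs = len(groups)
    return {
        "total_reads": total_reads,
        "unique_barcode_umi": unique_pairs,
        "duplicated_groups": sum(1 for s in groups.values() if len(s) > 1),
        # Reads that would disappear if each group collapsed to one read.
        "reads_removed_by_dedup": total_reads - unique_pairs,
    }


if __name__ == "__main__":
    example = (
        "@BC1_UMI1\nACGT\n+\nIIII\n"
        "@BC1_UMI1\nACGA\n+\nIIII\n"
        "@BC2_UMI7\nTTTT\n+\nIIII\n"
    )
    print(duplicate_summary(group_reads_by_header(example)))
```

Here the two `BC1_UMI1` reads form one duplicated group, while `BC2_UMI7` is a singleton.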
nailpolish version 0.1.0
----------------------------------
tools for consensus calling barcode and UMI duplicates
https://github.com/DavidsonGroup/nailpolish
Usage: nailpolish generate-index [OPTIONS] --file <FILE>
nailpolish summary --index <INDEX>
nailpolish call [OPTIONS] --index <INDEX> --input <INPUT>
nailpolish group [OPTIONS] --index <INDEX> --input <INPUT> [COMMAND]...
nailpolish help [COMMAND]...
Options:
-h, --help Print help
-V, --version Print version
nailpolish generate-index:
Create an index file from a demultiplexed .fastq, if one doesn't already exist
--file <FILE> the input .fastq file
--index <INDEX> the output index file [default: index.tsv]
-h, --help Print help
nailpolish summary:
Generate a summary of duplicate statistics from an index file
--index <INDEX> the index file
-h, --help Print help
nailpolish call:
Generate a consensus-called 'cleaned up' file
--index <INDEX> the index file
--input <INPUT> the input .fastq
--output <OUTPUT> the output .fasta; note that quality values are not preserved
-t, --threads <THREADS> the number of threads to use [default: 4]
-d, --duplicates-only only show the duplicated reads, not the single ones
-r, --report-original-reads for each duplicate group of reads, report the original reads along with the consensus
-h, --help Print help
nailpolish group:
'Group' duplicate reads, and pass to downstream applications
--index <INDEX> the index file
--input <INPUT> the input .fastq
--output <OUTPUT> the output location, or default to stdout
--shell <SHELL> the shell used to run the given command [default: bash]
-t, --threads <THREADS> the number of threads to use. this will not guard against race conditions in any downstream applications used. this will effectively set the number of individual processes to launch [default: 1]
-h, --help Print help
[COMMAND]... the command to run. any groups will be passed as .fastq standard input [default: cat]
nailpolish help:
Print this message or the help of the given subcommand(s)
[COMMAND]... Print help for the subcommand(s)
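The help text for the group subcommand says that each group is passed to the given command as .fastq standard input, run through the chosen shell. As a rough mental model only, the Python sketch below does the same thing sequentially with `subprocess`. The names `to_fastq` and `run_on_groups` are hypothetical, the input is assumed to be already grouped, and nothing here reflects nailpolish's actual implementation (for example, it ignores `--threads`, and the help text does not say whether single reads are passed as well).

```python
import subprocess


def to_fastq(records):
    """Serialise (header, sequence, quality) tuples as four-line FASTQ text."""
    return "".join(f"@{h}\n{s}\n+\n{q}\n" for h, s, q in records)


def run_on_groups(groups, command="cat", shell="bash"):
    """Run `command` once per group, feeding that group's reads on stdin.

    groups: dict mapping identifier -> list of (header, sequence, quality)
    Returns a dict mapping identifier -> captured standard output.
    """
    outputs = {}
    for name, records in groups.items():
        result = subprocess.run(
            [shell, "-c", command],    # e.g. bash -c "cat"
            input=to_fastq(records),   # the group arrives as FASTQ on stdin
            capture_output=True,
            text=True,
            check=True,                # raise if the command fails
        )
        outputs[name] = result.stdout
    return outputs


if __name__ == "__main__":
    demo = {
        "BC1_UMI1": [
            ("BC1_UMI1", "ACGT", "IIII"),
            ("BC1_UMI1", "ACGA", "IIII"),
        ]
    }
    # With the default command (cat), each group is echoed back unchanged.
    print(run_on_groups(demo)["BC1_UMI1"], end="")
```

Any command that reads FASTQ from standard input could take the place of `cat`, for instance an aligner or a custom consensus caller.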
Example of --duplicates-only and --report-original-reads
Suppose I have a demultiplexed read file of the following format (so that seq2 and seq3 are duplicates):
@BCUMI_1
seq1
@BCUMI_2
seq2
@BCUMI_2
seq3
Then, the effects of the following flags are:
(default):
>BCUMI_1_SIN
seq1
>BCUMI_2_CON_2
seq2_and_3_consensus
--duplicates-only:
>BCUMI_2_CON_2
seq2_and_3_consensus
--report-original-reads:
>BCUMI_1_SIN
seq1
>BCUMI_2_DUP_1_of_2
seq2
>BCUMI_2_DUP_2_of_2
seq3
>BCUMI_2_CON_2
seq2_and_3_consensus
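The labelling scheme in the example above (`_SIN` for single reads, `_CON_n` for a consensus of n reads, `_DUP_i_of_n` for the original duplicates) can be reproduced with a short Python sketch. The function `format_called_reads` and its `consensus` argument are hypothetical stand-ins: the consensus step is supplied by the caller, and the behaviour when both flags are combined is an assumption, since the example only shows each flag on its own.

```python
def format_called_reads(groups, consensus, duplicates_only=False,
                        report_original_reads=False):
    """Return FASTA lines labelled like the example above.

    groups: dict mapping a barcode/UMI identifier -> list of read sequences
    consensus: caller-supplied function turning a list of sequences into one
    """
    lines = []
    for name, seqs in groups.items():
        n = len(seqs)
        if n == 1:
            # Single reads are dropped when only duplicates are requested.
            if not duplicates_only:
                lines += [f">{name}_SIN", seqs[0]]
            continue
        if report_original_reads:
            # Original reads are listed before their consensus.
            for i, seq in enumerate(seqs, start=1):
                lines += [f">{name}_DUP_{i}_of_{n}", seq]
        lines += [f">{name}_CON_{n}", consensus(seqs)]
    return lines


if __name__ == "__main__":
    example = {"BCUMI_1": ["seq1"], "BCUMI_2": ["seq2", "seq3"]}

    def placeholder_consensus(seqs):
        # Placeholder only: a real consensus would be computed from the reads.
        return "seq2_and_3_consensus"

    print("\n".join(format_called_reads(example, placeholder_consensus)))
```

Run with the three-read example, the default settings yield the same four lines shown for (default) above.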
The recommended way to download Nailpolish is to use the automated builds, which can be found in the Releases section for macOS (Intel + Apple Silicon) and x64 Linux systems.
You will need a modern version of Rust installed on your machine, as well as the Cargo package manager. That's it: all package installations will be done automatically at the build stage.
The commands below install nailpolish onto your local PATH.
$ cargo install --git https://github.com/DavidsonGroup/nailpolish.git
# or, from a local directory
$ cargo install --path .
You will need reasonably modern versions of gcc and cmake installed, and the CARGO_NET_GIT_FETCH_WITH_CLI environment variable set to true. For instance:
$ module load gcc/latest cmake/latest
$ CARGO_NET_GIT_FETCH_WITH_CLI="true" cargo install --git https://github.com/DavidsonGroup/nailpolish.git
$ git clone https://github.com/DavidsonGroup/nailpolish.git
$ cd nailpolish
$ cargo build --release
The binary can be found at target/release/nailpolish, relative to the repository root.