This tool calculates diversity (e.g. Shannon entropy and nucleotide diversity) at the haplotype level within a specified region (e.g. an epitope) from an NGS sample. It accepts an alignment BAM file and a GFF3 file as input and returns a TSV result. The tool can be applied to any genomic region, not only immune epitopes. Written in Rust, it is presumably fast and memory-efficient.
epitope_diversity 0.1.1
Haogao Gu <[email protected]>
A tool for estimating haplotype diversity of specific regions (e.g. epitopes) from a NGS alignment.
USAGE:
epitope_diversity [OPTIONS] --bam_file <FILE> --pos_file <FILE>
OPTIONS:
-f, --bam_file <FILE> Path of BAM file. Must be accompanied with the BAI index file in the
same directory.
-p, --pos_file <FILE> Path to a GFF3 file specifying genomic positions of interest.
Start/End positions should be 1-based rather than 0-based, and should
correspond to the positions in the reference sequence used in SAM/BAM
alignment.
-o, --out_file <NUMBER> Path to write to the outfile, if "-" will write to stdout. [default:
-]
-v, --verbose Add this flag to also print text results to stderr.
-h, --help Print help information
-V, --version Print version information
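As a rough sketch of what the 1-based convention for `--pos_file` means in practice, the snippet below parses one GFF3 feature line and converts its inclusive 1-based start/end into the 0-based, half-open interval that BAM records use internally. The `Region` struct, the `parse_gff3_line` function, and the sample feature line are hypothetical names and data for illustration; they are not taken from this tool's source.

```rust
/// A genomic region in 0-based, half-open coordinates.
#[derive(Debug, PartialEq)]
struct Region {
    seqid: String,
    start0: u64, // inclusive
    end0: u64,   // exclusive
}

/// Parse one GFF3 feature line (hypothetical helper, not the tool's own code).
/// GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.
fn parse_gff3_line(line: &str) -> Option<Region> {
    // Skip directives, comments, and blank lines.
    if line.starts_with('#') || line.trim().is_empty() {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 5 {
        return None;
    }
    let start1: u64 = cols[3].parse().ok()?;
    let end1: u64 = cols[4].parse().ok()?;
    // GFF3 positions are 1-based and inclusive, so 0 is never valid.
    if start1 == 0 || end1 < start1 {
        return None;
    }
    Some(Region {
        seqid: cols[0].to_string(),
        start0: start1 - 1, // shift the start to 0-based
        end0: end1,         // inclusive 1-based end == exclusive 0-based end
    })
}

fn main() {
    // Illustrative feature line; the source, type, and attribute values are made up.
    let line = "A_NP\t.\tregion\t1096\t1112\t.\t+\t.\tID=example";
    let region = parse_gff3_line(line).expect("valid GFF3 feature line");
    println!("{:?} (length {})", region, region.end0 - region.start0);
}
```

With this conversion the region length is simply `end0 - start0`, which avoids the off-by-one errors that come from mixing the two conventions.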
Please check here. The example data are taken from the test files of IRMA.
epitope_diversity -f ./examples/A_NP.bam -p "./examples/example.gff" -o -
seqid start end num_of_full_cover_reads num_of_haplotypes Shannon_entropy population_nucleotide_diversity
A_NP 1096 1112 28 10 1.924832680792314 0.024084633853541412
A_NP 100 120 16 8 2.216917186688699 0.022321428571428572
A_NP 10 1110 No haplotype No haplotype No haplotype
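The last row above reports no haplotype for the 1101-nt region 10-1110. One plausible reading, assuming that a "full-cover" read is one whose alignment spans every position of the region, is that no single read is long enough to cover it. The check below is a minimal sketch of that assumption; `fully_covers` and the read coordinates are hypothetical and not taken from the tool's source.

```rust
/// Returns true if an alignment spanning [read_start, read_end]
/// (1-based, inclusive reference positions) covers every position
/// of the region [region_start, region_end].
fn fully_covers(read_start: u64, read_end: u64, region_start: u64, region_end: u64) -> bool {
    read_start <= region_start && read_end >= region_end
}

fn main() {
    // Hypothetical read aligned to reference positions 1090..=1240.
    let (read_start, read_end) = (1090, 1240);
    // The first and third regions from the example output above.
    for (start, end) in [(1096, 1112), (10, 1110)] {
        println!(
            "region {}-{}: fully covered = {}",
            start,
            end,
            fully_covers(read_start, read_end, start, end)
        );
    }
}
```

Under this assumption, only reads that pass the check would contribute a haplotype for the region, so long regions can easily end up with no usable reads.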
Directly download executables from Releases.
- Install Rust from here.
- Download the source code with
git clone https://github.com/Koohoko/epitope_diversity.git
- Install with
cargo install --path epitope_diversity
- You are ready to go.
Description of the methods can be found in this blog post.
Specifically, we used the exact "Shannon entropy" and "Population nucleotide diversity" as defined in this paper, without normalization/correction.
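For illustration, the uncorrected Shannon entropy of haplotype frequencies p_i can be sketched as H = -Σ p_i log2(p_i). The base-2 logarithm is an inference from the example output, not something stated in the text: the second example row reports 2.2169 for 8 haplotypes, which exceeds ln(8) ≈ 2.079 and so cannot come from a natural logarithm. The function name `shannon_entropy` and the read counts below are hypothetical; the counts are just one distribution of 16 reads over 8 haplotypes that is consistent with that row.

```rust
/// Shannon entropy (base-2 logarithm) of haplotype read counts,
/// with no normalization or correction applied.
fn shannon_entropy(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return 0.0; // no reads, no haplotypes
    }
    counts
        .iter()
        .filter(|&&c| c > 0) // a zero count contributes nothing
        .map(|&c| {
            let p = c as f64 / total as f64; // haplotype frequency
            -p * p.log2()
        })
        .sum()
}

fn main() {
    // Hypothetical counts: 16 full-cover reads spread over 8 haplotypes
    // (one dominant haplotype with 9 reads, seven singletons).
    let counts = [9, 1, 1, 1, 1, 1, 1, 1];
    println!("Shannon entropy: {}", shannon_entropy(&counts));
}
```

With these counts the result is approximately 2.2169, in line with the second row of the example output; the true per-haplotype counts behind that row are not given in the text.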
- v0.1.1: added columns to the output file for the number of haplotypes and the total number of full-cover reads.
- None