speedup Hi-C analysis with bwa mem2 #118
agalitsyna
announced in
Announcements
Mapping can be faster with bwa-mem2, which claims to be 2-3 times faster than traditional bwa mem. However, different aligners usually produce results that differ both quantitatively and qualitatively; for example, SAM tags can mean different things. pairtools parse and pairtools phase rely heavily on tag definitions, so there is reason to be skeptical about using pairtools on top of bwa-mem2.
I ran a small test of pairtools parse, reporting the tags that are important for pairtools phase (or will be important in the near future: AS, NM, XA). I took a small sample of the Drosophila data from Erceg et al. 2019 and processed it with both traditional bwa mem and bwa-mem2:

    bwa-mem2 mem -SP -t 5 ./genomes/bwa-mem2/dm6-057-439-JJ-snps-embryo \
        HiC_057_439_embryo_rep1.lane1.1.fastq HiC_057_439_embryo_rep1.lane1.2.fastq > test.bwa-mem2.sam

and parsed it with

    pairtools parse --drop-sam --drop-seq --walks-policy all --add-columns XA,NM,AS,mapq --min-mapq 0 \
        test.bwa-mem2.sam -c ./genomes/dm6_057_439-JJ-snps-embryo.reduced.chromsizes > test.bwa-mem2.txt

The sample contains around 82 million pairs after extracting all pairs.
The results are curious: only around 0.006% of pairs are affected. The largest difference is in the XA fields, which report suboptimal alignments (yet it is only around a few thousand pairs per ~80 million reads).
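A comparison of this kind can be sketched in Python as below. This is not the script used for the test above, only a minimal illustration under several assumptions: both files are tab-separated .pairs-style text with a `#columns:` header line, both list pairs in the same row order, and they are small enough to hold in memory. The helper names `read_pairs` and `count_column_mismatches` are hypothetical.

```python
from collections import Counter


def read_pairs(lines):
    """Return (column_names, rows) from .pairs-style text lines.

    Assumes header lines start with '#' and that one of them is
    '#columns: name1 name2 ...'. Data rows are tab-separated.
    """
    columns, rows = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#columns:"):
                columns = line[len("#columns:"):].split()
            continue
        rows.append(line.split("\t"))
    return columns, rows


def count_column_mismatches(lines_a, lines_b):
    """Count, per column, the rows whose values differ between two files.

    Assumes both inputs have identical columns and the same row order.
    Returns (number_of_rows, {column_name: mismatch_count}).
    """
    cols_a, rows_a = read_pairs(lines_a)
    cols_b, rows_b = read_pairs(lines_b)
    if cols_a is None or cols_a != cols_b:
        raise ValueError("missing or different #columns headers")
    if len(rows_a) != len(rows_b):
        raise ValueError("files contain different numbers of pairs")

    mismatches = Counter()
    for row_a, row_b in zip(rows_a, rows_b):
        for name, value_a, value_b in zip(cols_a, row_a, row_b):
            if value_a != value_b:
                mismatches[name] += 1
    return len(rows_a), dict(mismatches)


if __name__ == "__main__":
    # Tiny made-up records, only to show the call; not real results.
    mem = ["#columns: readID chrom1 pos1 XA1\n",
           "r1\tchr2L\t100\t\n",
           "r2\tchr2L\t200\tchr3R,+5,50M,1;\n"]
    mem2 = ["#columns: readID chrom1 pos1 XA1\n",
            "r1\tchr2L\t100\t\n",
            "r2\tchr2L\t200\t\n"]
    total, diff = count_column_mismatches(mem, mem2)
    print(total, diff)
```

With real files, one would pass open file handles instead of lists. As a sanity check on the numbers quoted above, 0.006% of 82 million pairs is roughly 4,900 pairs, which agrees with "a few thousand".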
This suggests that there are only minor differences between the models of bwa mem and bwa-mem2 and, most importantly, that the fields mean the same thing and can be used with pairtools. Thus, those in desperate search of a speedup can switch to bwa-mem2 (in distiller as well).
P.S. I am not sure why readID mismatches sometimes occur; perhaps the alignments are reported differently in some rare cases.