Here we compare the speed of covtobed with:
- bedtools genomecov - a widely used tool that produces the same BED output as covtobed, except for a different sorting of empty chromosomes
- mosdepth - a powerful tool for genome coverage analysis that also produces a BED file, although with default parameters it filters out alignments that are probably spurious
covtobed is significantly faster than bedtools genomecov. covtobed is faster than mosdepth on small genomes, and on large genomes (like the Human genome) when only a limited fraction of the target is covered (e.g. target enrichment panels). With panels it can be up to 60X faster than mosdepth. With large, highly covered genomes (e.g. exomes, whole genome sequencing) it is slower than mosdepth (3-4X slower).
sambamba also computes coverage statistics but does not print a BED file. In a speed test it appeared slower, possibly also because of the larger output it prints. See: comparison with sambamba
The benchmark was performed using hyperfine, which can be installed via conda.
To download the datasets, use the `get_datasets.sh` script.
The script downloads two target enrichment panels (BAM).
If invoked with a parameter, it also downloads an exome from the 1000 Genomes Project.
The `benchmark_stream.sh` script compares covtobed to bedtools, redirecting the output to `/dev/null`, while `benchmark_disk.sh` also compares mosdepth.
samtools is included only in the streaming section. Note that it produces non-BED output and does not count coverage across deletions, which is not the intended behaviour in covtobed (but this explains the much longer computation time).
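As a rough illustration of the two modes described above, the sketch below builds the command lists and hands them to hyperfine. It is not the content of `benchmark_stream.sh` or `benchmark_disk.sh`: the function names, output file names, and the `--warmup`/`--min-runs` values are assumptions chosen for the example.

```shell
# Sketch only: names and settings are hypothetical, not taken from the
# repository's benchmark scripts.

# Print one benchmark command per line for the chosen mode.
#   stream: output is discarded (/dev/null), so disk writes are not timed
#   disk:   output is saved to files, and mosdepth is included
bench_commands() {
  mode="$1"; bam="$2"
  case "$mode" in
    stream)
      printf '%s\n' \
        "covtobed $bam > /dev/null" \
        "bedtools genomecov -bga -ibam $bam > /dev/null"
      ;;
    disk)
      printf '%s\n' \
        "covtobed $bam > covtobed.bed" \
        "bedtools genomecov -bga -ibam $bam > bedtools.bed" \
        "mosdepth -x mosdepth_out $bam"
      ;;
    *)
      echo "usage: bench_commands stream|disk FILE.bam" >&2
      return 1
      ;;
  esac
}

# Run hyperfine on those commands and export the timings as a Markdown table.
# Assumes the BAM path contains no spaces.
run_benchmark() {
  mode="$1"; bam="$2"
  cmds=$(bench_commands "$mode" "$bam") || return 1
  set --                      # reuse the positional parameters as a list
  while IFS= read -r cmd; do
    set -- "$@" "$cmd"
  done <<EOF
$cmds
EOF
  hyperfine --warmup 1 --min-runs 3 \
    --export-markdown "benchmark_${mode}.md" "$@"
}
```

Calling `run_benchmark stream example1.bam` would then write `benchmark_stream.md`, provided hyperfine and the compared tools are on the `PATH`.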
covtobed is consistently faster than bedtools.
covtobed is up to 20 times faster than mosdepth on medium datasets (e.g. Human gene panels), while on larger datasets (e.g. Human whole exomes) it is up to 5 times slower.
The following tests save the output to a file. Note that mosdepth saves its output compressed and indexed, which requires more time, and that it is the only program tested that supports multithreading (only for BAM decompression).
See also example2.bam benchmark.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `mosdepth -x m_ example1.bam` | 68.344 ± 1.750 | 66.352 | 69.629 | 65.23 ± 2.22 |
| `mosdepth -x -t 4 m2_ ex1.bam` | 65.891 ± 0.736 | 65.123 | 66.590 | 62.89 ± 1.57 |
| `covtobed example1.bam > ex1.bed` | 1.048 ± 0.023 | 1.021 | 1.063 | 1.00 |
| `bedtools genomecov -bga -ibam ex1.bam > ex1.bed` | 32.478 ± 0.798 | 31.830 | 33.370 | 31.00 ± 1.03 |
- chr1
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `mosdepth -x m_chr1 chr1.bam` | 35.681 ± 0.650 | 35.154 | 36.974 | 1.41 ± 0.03 |
| `mosdepth -x -t 4 m_chr1 chr1.bam` | 25.347 ± 0.362 | 25.044 | 25.940 | 1.00 |
| `covtobed chr1.bam > chr1.bed` | 128.576 ± 3.142 | 122.704 | 131.607 | 5.07 ± 0.14 |
| `bedtools genomecov -bga -ibam chr1.bam > chr1.bed` | 223.619 ± 6.603 | 217.182 | 235.391 | 8.82 ± 0.29 |
- chr21
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `mosdepth -x m_chr21 chr21.bam` | 6.511 ± 0.156 | 6.384 | 6.809 | 1.29 ± 0.04 |
| `mosdepth -x -t 4 m_chr21 chr21.bam` | 5.064 ± 0.108 | 4.965 | 5.248 | 1.00 |
| `covtobed chr21.bam > chr21.bed` | 18.764 ± 0.633 | 18.115 | 19.941 | 3.71 ± 0.15 |
| `bedtools genomecov -bga -ibam chr21.bam > chr21.bed` | 62.386 ± 1.472 | 60.422 | 63.909 | 12.32 ± 0.39 |
- See also all chromosomes results
The human genome alignment file (124,379,080 alignments) was downloaded from:
🔗 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/GBR/HG00114/alignment/HG00114.alt_bwamem_GRCh38DH.20150718.GBR.low_coverage.cram
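The download is a CRAM file, while the commands in the genome benchmark read `genome.bam`. The conversion step is not described here, so the following is only a sketch of how it could be done with samtools. The function names are hypothetical, and the reference FASTA (which would need to match the GRCh38DH reference named in the CRAM file) is an assumption.

```shell
# Sketch only: the actual preparation steps used for the benchmark are not
# documented here; function names and arguments are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Convert a CRAM to BAM, index it (mosdepth needs an index),
# then count the alignment records in the result.
cram_to_bam() {
  cram="$1"; ref="$2"; bam="$3"
  run samtools view -b -T "$ref" -o "$bam" "$cram" &&
  run samtools index "$bam" &&
  run samtools view -c "$bam"
}
```

Running `DRY_RUN=1 cram_to_bam HG00114.cram reference.fa genome.bam` only prints the three samtools commands, which is a convenient way to check them before starting a long conversion.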
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `mosdepth -F 4 -x mosx1 genome.bam` | 212.4 ± 1.2 | 210.59 | 213.6 | 1.29 ± 0.01 |
| `mosdepth -F 4 -x -t 4 mosx4 genome.bam` | 164.6 ± 1.1 | 162.9 | 165.8 | 1.00 |
| `covtobed genome.bam > covtobed.bed` | 555.9 ± 41.6 | 521.3 | 618.7 | 3.38 ± 0.25 |
| `bedtools genomecov -bga -ibam genome.bam > bedtools.bed` | 690.1 ± 39.3 | 666.0 | 759.2 | 4.19 ± 0.24 |
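The Relative column in these tables is each command's mean time divided by the fastest mean in the same table. The small helper below (a hypothetical name, written only for this illustration) recomputes it from the means of the chr1 table. Recomputing from the rounded means shown here may differ from the reported values in the last digit.

```shell
# Print "mean<TAB>relative" for every mean passed as an argument,
# where relative = mean / smallest mean, rounded to two decimals.
relative_speed() {
  printf '%s\n' "$@" | awk '
    { mean[NR] = $1; if (NR == 1 || ($1 + 0) < min) min = $1 + 0 }
    END { for (i = 1; i <= NR; i++) printf "%s\t%.2f\n", mean[i], mean[i] / min }'
}

# Mean times [s] from the chr1 table: mosdepth, mosdepth -t 4, covtobed, bedtools
relative_speed 35.681 25.347 128.576 223.619
# 35.681   1.41
# 25.347   1.00
# 128.576  5.07
# 223.619  8.82
```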