Different BAM alignments when using an Ubuntu 20.04 base image versus a 22.04 base image #4391

Open
dpuleri opened this issue Sep 6, 2024 · 0 comments

Aligning the same read produces different BAM alignments depending on the container used: the baseline Docker container (quay.io/vgteam/vg:v1.59.0), which is based on Ubuntu 20.04, gives a different result from an Ubuntu 22.04-based container that I built myself.

To build the 22.04 container, I used the Dockerfile from the repository at tag v1.59.0 and changed its first line to FROM mirror.gcr.io/library/ubuntu:22.04 AS base.

The following commands were used to build the container:

git checkout v1.59.0
git submodule update --init --recursive
make version
docker build --no-cache -f Dockerfile --build-arg THREADS=16 --tag my_container:v1.59.0 ./

I then aligned a small single-read FASTQ derived from HG002-NA24385. The contents of the FASTQ are below:

@HISEQ1:22:H9UJNADXX:1:2208:19830:37201 1:N:0:GCCAAT
CCACTCCACTCAAATCCATTCCATTAAACTCCATTCCATTCCATTCCACTCCTCTCCGTTTCATTCCACTCCACTGCATTCTTTTCCACTCCATTCCTCTCCACTCCATTCCCCTCCATTCCATTCCTTTCCACTCCACTCCACTCCA
+
CCCFFFFFHHHHHHHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHJJJJJJJJJJJJJJJJJJIJJJIJHHHHHHFFFFFFFFEEEEEDDDDDDDFFEDDDDDDDEEEEEEEEDDDDDEDDDDDDDDDDDDDDD

Running vg:

docker run -i $CONTAINER vg giraffe --progress --read-group 'sample_rg1' --sample "HG002" --prune-low-cplx --max-fragment-length 3000 --output-format bam -f ./1_read.fastq -H /Ref/hprc-v1.0-mc-grch38-minaf.0.1.gbwt -g /Ref/hprc-v1.0-mc-grch38-minaf.0.1.gg -d /Ref/hprc-v1.0-mc-grch38-minaf.0.1.dist -m /Ref/hprc-v1.0-mc-grch38-minaf.0.1.min -t 1 > $OUT.bam

I get the following differences between the two BAMs:

$ diff <(samtools view out_2004base.bam) <(samtools view out_2204base.bam)   
1c1
< HISEQ1:22:H9UJNADXX:1:2208:19830:37201	16	GRCh38.chr22_KI270736v1_random	162495	60	45S41M1I61M	*	TGGAGTGGAGTGGAGTGGAAAGGAATGGAATGGAGGGGAATGGAGTGGAGAGGAATGGAGTGGAAAAGAATGCAGTGGAGTGGAATGAAACGGAGAGGAGTGGAATGGAATGGAATGGAGTTTAATGGAATGGATTTGAGTGGAGTGG	DDDDDDDDDDDDDDDEDDDDDEEEEEEEEDDDDDDDEFFDDDDDDDEEEEEFFFFFFFFHHHHHHJIJJJIJJJJJJJJJJJJJJJJJJHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHHHHHHFFFFFCCC	AS:i:31	RG:Z:sample_rg1
---
> HISEQ1:22:H9UJNADXX:1:2208:19830:37201	16	GRCh38.chr22_KI270736v1_random	162440	60	40M10D46M1I61M	*	TGGAGTGGAGTGGAGTGGAAAGGAATGGAATGGAGGGGAATGGAGTGGAGAGGAATGGAGTGGAAAAGAATGCAGTGGAGTGGAATGAAACGGAGAGGAGTGGAATGGAATGGAATGGAGTTTAATGGAATGGATTTGAGTGGAGTGG	DDDDDDDDDDDDDDDEDDDDDEEEEEEEEDDDDDDDEFFDDDDDDDEEEEEFFFFFFFFHHHHHHJIJJJIJJJJJJJJJJJJJJJJJJHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHHHHHHFFFFFCCC	AS:i:31	RG:Z:sample_rg1

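The two builds report different mapping positions (162495 vs. 162440) and CIGAR strings (45S41M1I61M vs. 40M10D46M1I61M) for the same read, even though the MAPQ (60) and alignment score (AS:i:31) are identical.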
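To make the difference between the two records concrete, here is a minimal Python sketch that computes how many read and reference bases each CIGAR consumes, using the standard SAM rules for CIGAR operations. The helper names (cigar_spans, summarize) are made up for illustration and are not part of vg or samtools; the positions and CIGAR strings are copied from the diff above.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_spans(cigar):
    """Return (query_bases, reference_bases) consumed by a CIGAR string."""
    query = ref = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MIS=X":   # operations that consume read bases
            query += n
        if op in "MDN=X":   # operations that consume reference bases
            ref += n
    return query, ref

def summarize(pos, cigar):
    """Summarize an alignment from its 1-based POS and CIGAR (hypothetical helper)."""
    query, ref = cigar_spans(cigar)
    return {"start": pos, "end": pos + ref - 1, "query_bases": query}

# POS and CIGAR values taken from the two BAM records in this issue.
base_2004 = summarize(162495, "45S41M1I61M")
base_2204 = summarize(162440, "40M10D46M1I61M")

print("20.04 base:", base_2004)
print("22.04 base:", base_2204)
```

Under these rules both CIGARs account for 148 read bases and both alignments end at reference position 162596. The records differ only at the left end: the 20.04 build soft-clips the first 45 bases, while the 22.04 build aligns them as 40M10D plus the first 5 bases of the 46M block. This suggests, but does not prove, that the two builds make a different choice among equally scoring ways of handling that end of the read.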

I traced the difference to the multipath alignment stage of surjection. Enabling the debug macros in src/multipath_alignment_graph.cpp shows that the path starts differ between the two builds.

1. What were you trying to do?
Align from FASTQ to BAM file.

2. What did you want to happen?
Same result across platforms/toolchains.

3. What actually happened?
Differences in aligned BAMs, see above.

5. What data and command can the vg dev team use to make the problem happen?
See above.

6. What does running vg version say?

vg version v1.59.0 "Casatico"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by root@buildkitsandbox