Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Higher vg call AD values after using vg filter #4443

Open
danielwood1992 opened this issue Nov 14, 2024 · 6 comments
Open

Higher vg call AD values after using vg filter #4443

danielwood1992 opened this issue Nov 14, 2024 · 6 comments

Comments

@danielwood1992
Copy link

1. What were you trying to do?

vg giraffe --output-format gam --gbz-name $vg.gbz --minimiser-name $autoindex.min --dist-name $autoindex.dist -f $R1 -f $R2 -t $threads $reference.fa $vcf.gz > $gam
vg filter $gam -r 0.9 -fu -m 1 -q 20 -D 999 -x $vg.gbz > $filter.gam
vg pack -t $threads -x $vg.xg -g $filter.gam -Q 5 -o $filter.gam.pack
vg call $vg.xg -t $threads -k $filter.gam.pack -v $vcf > $filter.gam.pack.vcf

vg pack -t $threads -x $vg.xg -g $gam -Q 5 -o $gam.pack
vg call $vg.xg -t $threads -k $gam.pack -v $vcf > $gam.pack.vcf

2. What did you want to happen?

Identify coverage of ref/alt alleles from the vcf call AD statistic - and looking at the impact of the vg filter/call step.

3. What actually happened?

Running vg filter on the gam reduced the size of the gam substantially. In most cases the AD for each site in $gam.pack.vcf vs. $filter.gam.pack.vcf made sense: the AD was higher for the calls from the unfiltered gam. However there a minority of sites (~2.5%) of cases where the AD for either the reference or alt allele is higher in the filtered gam than the same site in the unfiltered gam (which substantially alters the estimated ref/alt allele coverage - see example graph
AF_Graph
, blue points are where the ref or alt allele coverage is higher from the filtered vcf, red are where it is lower) - I don't understand how this can be the case, surely the filtering should only be removing reads? The graph is made from a biallelic vcf so it shouldn't be mutli-allele mapping that is the problem.

5. What data and command can the vg dev team use to make the problem happen?
I can send the data and exact script I used if needed. Any thoughts on this much appreciated!

6. What does running vg version say?

vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601```

@danielwood1992 danielwood1992 changed the title Higher vf call AD values after using vg filter Higher vg call AD values after using vg filter Nov 14, 2024
@glennhickey
Copy link
Contributor

Yeah, that doesn't seem to make much sense. Are you able to share the gam and gbz, along with a site that's identical in the vcf but has higher AD in the filtered one? Thanks.

@danielwood1992
Copy link
Author

Yes I can - do you have an email address I can send a google drive link to? Thanks a lot!

@glennhickey
Copy link
Contributor

Sure, use my name here, glennhickey, at gmail.

@danielwood1992
Copy link
Author

great - shared, let me know if you have trouble accessing

@glennhickey
Copy link
Contributor

got it, thanks.

@glennhickey
Copy link
Contributor

Actually, I don't see the .xg in the shared folder, which is required for call -v.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants