Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in multi-allele matching? #977

Open
jeromekelleher opened this issue Nov 8, 2024 · 3 comments
Open

Bug in multi-allele matching? #977

jeromekelleher opened this issue Nov 8, 2024 · 3 comments

Comments

@jeromekelleher
Copy link
Member

I've spent the day tracking down a bug in sc2ts, which is pointing towards being a problem with tsinfer's multi-allele handling (and allele index rotating to deal with non-zero ancestral state). Based on my experimental HMM branch #959 the version of sc2ts at e5fd80369b2e0813c2e46b1810980bb640054ed6 was loading data into a tsinfer SampleData instance, running some matches and getting the results out by overriding the tsinfer.SampleMatcher class.

Specifically, what seems to be happening is that when matching strain ERR4207042 against the 2020-03-11 ARG tsinfer is missing that this sample has a T at this site not a G (ancestral state). The haplotype we send in looks OK, but we match to a node that has the ancestral state without reporting a mutation in the output. It's not clear to me how it happened, other than I'm guessing there's some problem with the ancestral state allele rotation code.

I say this because sc2ts now uses the low-level components directly, bypassing the SampleData class entirely and is correctly making these matches (which is how I spotted the problem). So I don't think it's a problem with the actual HMM, more some issue with the high-level wrapping code.

I don't have time to create a reproducible example I'm afraid.

I don't know whether we need to do anything with this pre release - I guess we don't support matching on multiallelic sites anyway, and we're pretty sure the biallelic case is working fine, right?

@hyanwong
Copy link
Member

hyanwong commented Nov 8, 2024

Hmm, that's a bit concerning isn't it. But you are right about the biallelic state being hopefully robust. I guess we are happy that balletic with missing-data (i.e. states -1, 0, and 1) is still OK, and that we are robust to 1 being the ancestral allele and 0 being the derived?

@jeromekelleher
Copy link
Member Author

Yes, I hope so. It does make me wonder if we should push forward the simple changes for allowing non-zero ancestral state and just jettison that complexity

@hyanwong
Copy link
Member

hyanwong commented Nov 8, 2024

If it's not too much work to get non-zero AS working in the matching, then I guess so? This is presumably orthogonal to using the state polarity for ancestor building in tsinfer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants