Break tsinfer polytomies using mutational density #363
Comments
I can think of a simple model-based way to approach this, using EP and the existing variational approximation:
This wouldn't resolve polytomies into binary trees -- instead, it'd resolve polytomies into a set of multiple, smaller polytomies. In the pic you included, we'd probably end up with something like [figure not preserved].
Do you have an algorithmic method for introducing polytomies, @hyanwong? If we have a small test problem (a mini-tree-sequence with a single non-trivial polytomy that involves 10 nodes or so), then I can try to implement the above and see if it'd work.
Really neat! I'm unclear how this would deal with edges that share a single parent node when that node has different numbers of children in different regions of the genome. E.g. what happens if we have a node with 3 children at one end of the genome, 2 in the middle, and 3 again at the other end?
Re algorithmic methods for generating polytomies: no, but I guess it would be quite easy to write one. I'll have a play. We do have a lot of inferred tree sequences with polytomies, of course, but you are right that knowing the ground truth would be much better.
I'm probably underestimating how complicated this is, but I was thinking of ignoring position entirely. For example, the node you describe could have the sequence of children (A, B, C) -> (B, D) -> (A, C, D). The algorithm I describe would "cluster" A, B, C, D into two groups, and assign the latter group a new node as a parent. Call the new node E, and say B and D get assigned to it. Then the sequence of children for the original node would be (A, E, C) -> (E) -> (A, C, E), and the sequence of children for E (over the same spans) would be (B) -> (B, D) -> (D). I think this always works out, because if we assume that E has a single parent (the original node) then the transmission paths are the same.
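To make the re-parenting step above concrete, here is a minimal sketch in plain Python. It assumes the children of the polytomy are given as one set per genomic interval (left to right) and that the clustering step has already picked which children move under the new node. The function name `split_polytomy` and its arguments are hypothetical, and this is not tskit or tsdate code.

```python
def split_polytomy(child_sets, group, new_node):
    """Re-parent the children in `group` onto `new_node`.

    child_sets: one set of children of the original node per genomic
                interval, ordered left to right (an assumed input format).
    group:      the children chosen by the clustering step.
    new_node:   label for the inserted node, itself a child of the
                original node wherever it has at least one child.
    Returns (children of the original node, children of new_node),
    each as a list with one set per interval.
    """
    parent_children, new_children = [], []
    for children in child_sets:
        moved = children & group   # children that now hang off new_node
        kept = children - group    # children that stay on the original node
        if moved:
            kept = kept | {new_node}
        parent_children.append(kept)
        new_children.append(moved)
    return parent_children, new_children


# The example from the comment above: move B and D under a new node E.
spans = [{"A", "B", "C"}, {"B", "D"}, {"A", "C", "D"}]
parent, e_children = split_polytomy(spans, {"B", "D"}, "E")
print(parent)      # children of the original node in each interval
print(e_children)  # children of E in each interval
```

Note that E is unary in any interval where only one of its children is present, which matches the (B) and (D) spans described above.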
I've just put a naive method for collapsing bifurcations into polytomies at tskit-dev/tskit#2885. I guess this could be used to test if we can recover extra topology from mutation density.
If we had a single dated tree with a polytomy and mutations on the branches, we could probably use the relative rate of mutation along the branches to break the polytomy. E.g. in a polytomy like the following, it would probably be reasonable to split the polytomy by grouping together 2 and 3: [figure not preserved]. Or, more formally, we could probably calculate the mutational likelihood of each of the 15 possible bifurcating topologies and pick the most likely.
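For reference, 15 is the number of rooted bifurcating topologies on four labelled tips, so the count above is consistent with a polytomy that has four children (an assumption, since the figure is not available). A minimal sketch that enumerates the candidate topologies as nested tuples, in plain Python with a hypothetical function name:

```python
def rooted_binary_topologies(tips):
    """Yield every rooted bifurcating topology on `tips` as nested tuples."""
    tips = list(tips)
    if len(tips) == 1:
        yield tips[0]
        return
    first, rest = tips[0], tips[1:]
    # Decide which of the remaining tips share a side of the root with
    # `first`. Pinning `first` to the left side means mirror images of
    # the same topology are not counted twice.
    for mask in range(2 ** len(rest)):
        left = [first] + [t for i, t in enumerate(rest) if (mask >> i) & 1]
        right = [t for i, t in enumerate(rest) if not (mask >> i) & 1]
        if not right:
            continue  # both sides of the root must be non-empty
        for left_tree in rooted_binary_topologies(left):
            for right_tree in rooted_binary_topologies(right):
                yield (left_tree, right_tree)


candidates = list(rooted_binary_topologies([1, 2, 3, 4]))
print(len(candidates))
```

Each candidate would then need a likelihood under whatever mutation model is used, and that part is not sketched here. The number of candidates grows as (2n-3)!! with the number of children n, so exhaustive enumeration is only practical for small polytomies.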
Another more heuristic approach would be to calculate the average mutational rate between each pair of tips (i.e. the number of mutations between them divided by the branch-length distance), and then do something like UPGMA on the pairwise rates.
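A rough sketch of that heuristic for a single star-shaped polytomy, in plain Python. Several things here are assumptions rather than anything settled in this thread: the path between two tips is taken to be just their two child edges, the pairwise rate is used directly as a dissimilarity (so low-rate pairs are merged first), and the mutation counts and branch lengths are made-up illustrative numbers. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_rates(muts, lengths):
    """Mutations per unit branch length on the path joining each pair of tips.

    Assumes a star polytomy, so the path between tips a and b is just
    the two child edges above a and b.
    """
    return {
        frozenset((a, b)): (muts[a] + muts[b]) / (lengths[a] + lengths[b])
        for a, b in combinations(sorted(muts), 2)
    }


def upgma(dist, tips):
    """Average-linkage clustering; returns the topology as nested tuples."""
    clusters = {(t,): t for t in tips}  # member tips -> subtree built so far

    def average(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise rate
        # (assumption: a low rate is treated as "close").
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters[c1 + c2] = (clusters.pop(c1), clusters.pop(c2))
    return next(iter(clusters.values()))


# Illustrative numbers only: four children with equal branch lengths.
muts = {1: 10, 2: 2, 3: 3, 4: 9}
lengths = {1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0}
tree = upgma(pairwise_rates(muts, lengths), sorted(muts))
print(tree)
```

With these numbers, tips 2 and 3 have the lowest pairwise rate and are grouped first. Whether low or high rates should cluster together in practice depends on how hidden shared branches show up in the mutation counts, so the direction of the dissimilarity may need flipping.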
It's unclear to me how to extend this to a polytomy in a tree sequence, which could have different numbers of child edges coming and going. But generally, you could imagine the children of a polytomy "pulling" or "pushing" the parent node with differing strengths, measured by the difference between the parent and child posterior distributions and the edge mutational area separating them. It should be possible to work out the best way to resolve this tension by creating new internal nodes that break the polytomy.