Support for leaf labels with ambiguities #38
Let However, in such a tree This
In historydag this will be provided by a subclass of
We'll need a version of Hamming distance that returns the minimum possible distance between two nodes.
Not sure if we'd ever want it, but we could also have a subclass that includes ambiguous sequences at every node, which would allow e.g. min-weight ambiguously labeled trees to be used to create a DAG... I guess we need to think about how these things are related.
In Larch, we need a way of keeping track of leaf IDs, so that we can correctly replace the resolved sequences chosen for leaves by matOptimize with the original ambiguous sequences before merging back into the DAG. We also want to provide ambiguous sequences (in usher's internal representation for VCF data) to matOptimize when optimizing a tree.
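The minimum-distance version of Hamming distance mentioned above could be sketched as follows. This is only an illustration and not historydag's API: the function name `min_hamming_distance` and the `IUPAC` table are hypothetical, and the sketch assumes sequences are equal-length strings over the standard IUPAC nucleotide codes. A site contributes 0 when the two base sets share at least one base (some resolution makes them agree) and 1 otherwise.

```python
# Standard IUPAC nucleotide codes mapped to the bases each may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def min_hamming_distance(seq1: str, seq2: str) -> int:
    """Smallest Hamming distance over all resolutions of both sequences.

    Sites are independent, so the minimum is reached site by site:
    a site costs 0 if the two base sets overlap and 1 if they are disjoint.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(
        1
        for a, b in zip(seq1, seq2)
        if not set(IUPAC[a]) & set(IUPAC[b])
    )
```

For two fully unambiguous sequences this reduces to the ordinary Hamming distance, so it could stand in for the usual edge weight wherever only some labels are ambiguous.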
It seems likely that we can accommodate ambiguous bases in leaf labels using the mutation-annotated DAG / compact-genome DAG structure, and do so without losing the parsimony criterion, as long as the internal node labels are fully unambiguous.
If the internal node labels (in compact genome format) are fully disambiguated, then for any tree sampled from the DAG, its parsimony score can be computed explicitly from the internal nodes, and minimized over the resolutions of ambiguous leaf labels.
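A small numerical sketch of that computation, under assumptions that are not from the issue: a tree is given as two hypothetical edge lists (`internal_edges` between unambiguous labels, `leaf_edges` from an unambiguous parent to an ambiguous leaf), labels are plain strings rather than compact genomes, and each leaf has exactly one parent in the tree. `min_parsimony` resolves each leaf against its own parent, while `brute_force_parsimony` minimizes over all joint resolutions of every leaf. Agreement on an example is a sanity check of the independence idea, not a proof.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases each may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def hamming(a, b):
    """Ordinary Hamming distance between two unambiguous sequences."""
    return sum(x != y for x, y in zip(a, b))


def resolutions(seq):
    """Yield every unambiguous sequence compatible with an ambiguous one."""
    for bases in product(*(IUPAC[c] for c in seq)):
        yield "".join(bases)


def min_parsimony(internal_edges, leaf_edges):
    """Parsimony score with each leaf resolved independently.

    internal_edges: (parent, child) pairs of unambiguous sequences.
    leaf_edges: (parent, leaf) pairs; the leaf may contain ambiguity codes.
    """
    score = sum(hamming(p, c) for p, c in internal_edges)
    for parent, leaf in leaf_edges:
        # A leaf touches only its parent edge, so pick its best resolution.
        score += min(hamming(parent, r) for r in resolutions(leaf))
    return score


def brute_force_parsimony(internal_edges, leaf_edges):
    """Same score, minimized over all joint resolutions of all leaves."""
    base = sum(hamming(p, c) for p, c in internal_edges)
    parents = [p for p, _ in leaf_edges]
    choices = [list(resolutions(leaf)) for _, leaf in leaf_edges]
    return min(
        base + sum(hamming(p, r) for p, r in zip(parents, combo))
        for combo in product(*choices)
    )


# Hypothetical three-leaf tree: root AAA, one internal child AAC.
internal_edges = [("AAA", "AAC")]
leaf_edges = [("AAA", "ANA"), ("AAC", "RAC"), ("AAC", "GYT")]
assert min_parsimony(internal_edges, leaf_edges) == brute_force_parsimony(
    internal_edges, leaf_edges
)
```

The per-leaf minimum works here because the score is a sum over edges and each leaf label appears in exactly one term of that sum, which is the intuition behind treating leaf resolutions independently.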
We need to explicitly prove the ansatz that resolutions of leaf labels can be performed independently of each other, and here is a place to start that conversation.