Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

altered average_pairwise_rf_distance to avoid divide-by-zero in the c… #54

Merged
merged 4 commits into from
Dec 19, 2022

Conversation

marybarker
Copy link
Contributor

…ase of a tree shaped DAG, and added option to apply Sankoff only to a subset of DAG nodes

…ase of a tree shaped DAG, and added option to apply Sankoff only to a subset of DAG nodes
@marybarker marybarker requested a review from willdumm December 18, 2022 04:24
Copy link
Collaborator

@willdumm willdumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Sorry I didn't think of that divide-by-zero thing.

The sankoff stuff looks great! But I was thinking, what if we wanted to add an extra character to the already-computed sequence (for phylogeography) and only do sankoff on that character? For that it might be useful to be able to leave alone unambiguous bases in the existing node labels, and only modify those that are ambiguous (we could relabel by adding an N to the end of all internal sequences, then add the appropriate region character to the end of each leaf node sequence, then do sankoff). This would be like what use_internal_node_sequences does for ete trees.

Alternatively, it might be useful to just specify which label attribute we do Sankoff on, so by default it could be the sequence attribute, but it could also be the region attribute or something. I guess this would be a keyword argument passed to sankoff_downward.

I'm not suggesting that these things need to make it into this PR but I'm just thinking about how we can reuse this idea for something like adding an extra character for phytogeography...

@marybarker
Copy link
Contributor Author

marybarker commented Dec 18, 2022

Great idea! Yeah, it would be good to do Sankoff on a subset of the sites, leaving the remainder fixed. This will take a bit of extra thought(I've made an issue #55), since sites are not independent and hence it will require either a method to compute transition cost for fully resolved sequences or else a the full-length cost vector expansion, even when only a subset of the sites are being recomputed.

Mary Barker added 3 commits December 19, 2022 11:31
…ase of a tree shaped DAG, and added option to apply Sankoff only to a subset of DAG nodes
@marybarker marybarker marked this pull request as ready for review December 19, 2022 22:29
@marybarker marybarker merged commit 3aa3d22 into main Dec 19, 2022
@marybarker marybarker deleted the partial-sankoff branch December 19, 2022 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants