Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reconsensus #95

Merged
merged 19 commits into from
Nov 4, 2024
Merged

Fix reconsensus #95

merged 19 commits into from
Nov 4, 2024

Conversation

mmolari
Copy link
Collaborator

@mmolari mmolari commented Oct 31, 2024

Fixed a bug in reconsensus: when transferring a substitution to the consensus, sites that are deleted should not receive the substitution.

This caused a failure in node removal downstream, in which empty nodes were not removed because they featured a substitution, which was erroneously located on a deleted site.

I added additional debug sanity checks and fixed the bug. I also added the edge-case to the test set.

@@ -41,12 +41,12 @@ impl PangraphNode {
}
}

pub fn len(&self) -> usize {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this this function had a bug. But due to possibly having circular genomes, the length of a node cannot always be determined by only start-end positions

self.position.1.saturating_sub(self.position.1)
}

pub fn is_empty(&self) -> bool {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore also this function was wrong. I thought about defining is_emtpy as checking if the start position coincides with the end position, but this is also wrong in one edge case: when the path is circular and composed of a single node, then the start/end position of the node can be the same even if the node is not empty.

@@ -87,8 +91,11 @@ fn reconsensus_mutations(block: &mut PangraphBlock) -> Result<(), Report> {
let subs_at_pos: Vec<_> = edit.subs.iter().filter(|s| s.pos == pos).cloned().collect();
match subs_at_pos.len() {
0 => {
edit.subs.push(Sub::new(pos, original));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed the bug: the substitution should be applied only if the position is not deleted.

@@ -203,14 +226,14 @@ mod tests {
// 01234567890123456789012
// node 1) .T...--..........A.....
// node 2) .T...--...C......|.....
// node 3) .T...--...C............
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the test so that it hits this edge-case

@mmolari mmolari marked this pull request as draft October 31, 2024 12:24
let node_idx = path_nodes.iter().position(|&n| n == node_id).unwrap();
path_nodes.remove(node_idx);
// while there is a node with this id, remove it:
while let Some(node_idx) = path_nodes.iter().position(|&n| n == node_id) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second fix: sometimes a path could have more than one empty node with the same id. The same id happens only if the nodes are adjacent. This removes all of them.

A more elegant solution might be to not produce these empty nodes at all (modify the merging procedure?) or to make sure they are generated with different id.

@mmolari mmolari marked this pull request as ready for review November 4, 2024 07:44
@ivan-aksamentov ivan-aksamentov merged commit 9888677 into rust Nov 4, 2024
15 checks passed
@ivan-aksamentov ivan-aksamentov deleted the fix-reconsensus branch November 4, 2024 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants