-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cycle validation #111
add cycle validation #111
Conversation
Just based on looking at the code, this appears correct. But I didn't actually run any tests. One question: does |
Yes
I also did some tests with the following query graphs, where queryGraph1 and queryGraph2 should be considered cycles and queryGraph3 and queryGraph4 should not be considered cycles. Query Graphsconst queryGraph1 =
{
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:222284"],
"categories":["biolink:ChemicalEntity"]
},
"n1": {
"ids":[
"MONDO:0005267",
"MONDO:0005542",
"MONDO:0005311",
"MONDO:0005542",
"MONDO:0004995"
],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n2": {
"ids":["MONDO:0100096"],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n3": {
"categories":["biolink:Gene"]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:related_to"]
},
"e02": {
"subject": "n1",
"object": "n2",
"predicates": ["biolink:related_to"]
},
"e03": {
"subject": "n0",
"object": "n3",
"predicates": ["biolink:related_to"]
},
"e04": {
"subject": "n3",
"object": "n2",
"predicates": ["biolink:related_to"]
}
}
};
const queryGraph2 =
{
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:222284"],
"categories":["biolink:ChemicalEntity"]
},
"n1": {
"ids":[
"MONDO:0005267",
"MONDO:0005542",
"MONDO:0005311",
"MONDO:0005542",
"MONDO:0004995"
],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n2": {
"ids":["MONDO:0100096"],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n3": {
"categories":["biolink:Gene"]
}
},
"edges": {
"e02": {
"subject": "n1",
"object": "n2",
"predicates": ["biolink:related_to"]
},
"e03": {
"subject": "n2",
"object": "n3",
"predicates": ["biolink:related_to"]
},
"e04": {
"subject": "n3",
"object": "n1",
"predicates": ["biolink:related_to"]
}
}
};
const queryGraph3 =
{
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:222284"],
"categories":["biolink:ChemicalEntity"]
},
"n1": {
"ids":[
"MONDO:0005267",
"MONDO:0005542",
"MONDO:0005311",
"MONDO:0005542",
"MONDO:0004995"
],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n2": {
"ids":["MONDO:0100096"],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n3": {
"categories":["biolink:Gene"]
}
},
"edges": {
"e02": {
"subject": "n1",
"object": "n2",
"predicates": ["biolink:related_to"]
},
"e03": {
"subject": "n1",
"object": "n3",
"predicates": ["biolink:related_to"]
}
}
};
const queryGraph4 =
{
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:222284"],
"categories":["biolink:ChemicalEntity"]
},
"n1": {
"ids":[
"MONDO:0005267",
"MONDO:0005542",
"MONDO:0005311",
"MONDO:0005542",
"MONDO:0004995"
],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n2": {
"ids":["MONDO:0100096"],
"categories":["biolink:DiseaseOrPhenotypicFeature"]
},
"n3": {
"categories":["biolink:Gene"]
}
},
"edges": {
"e02": {
"subject": "n1",
"object": "n2",
"predicates": ["biolink:related_to"]
},
"e03": {
"subject": "n1",
"object": "n3",
"predicates": ["biolink:related_to"]
},
"e05": {
"subject": "n2",
"object": "n0"
}
}
}; |
This looks great! If you could turn those test QGraphs into a couple of tests it would help with testing coverage/ongoing verification. |
Tests have been added! |
Tagging @andrewsu @colleenXu for final approval to merge -- everything seems ready and tests pass (on local -- github tests still broken) |
Looks good to me! |
This is not recognized as a cycle by this code; should it be? Or another case that we want to address (stopping execution)? Disease ID <-> DiseaseThis and any larger QGraph that has something like this continues execution (isn't ID'd as a cycle and stopped)
TDLR (for this post and the one below it): it looks pretty good otherwise (I didn't fully execute anything, I only started querying and stopped execution if it looked like it passed the checkpoint for raising the "cycle" error...) |
Notes on other things I tried Correctly identified as cycle (stopped execution): self-edge
self-edge as part of QGraph
a triangle (like rjawesome's QG2)Can switch around subject/object within an edge or swap two edges....and it's still found to be a cycle
a triangle with a line sticking outCan switch around subject/object within an edge or swap two edges....and it's still found to be a cycle
square (like rjawesome's QG1)Can switch around subject/object within an edge or swap two edges....and it's still found to be a cycle
one of the issue's examples: another squareFrom this and this, I looked at the PK in ARAX and wrote up this QGraph
one of andrew's examples (another square)From internal lab slack:
Correctly identified as not a cycle (continued execution): "cross" not-a-triangle (like rjawesome's QG4)look carefully, it's actually n0 -> n3 <- n1 <- n2 (aka linear)
rjawesome's QG4 looks like this. Also linear once you trace the path C2 and C3 from Dec2021 demo aren't cycles (see that they are executing, then stop them since they run for a while) |
For the Disease ID <-> Disease query, my code works on the principle that direction doesn't matter, so n0 -> n1 is considered the same as n1 -> n0. If this is an issue one way to handle it would be to flag an error if there are "duplicate" edges (where n0 -> n1 and n1 -> n0 are considered equivalent). Alternatively, this could be treated as a special case. Side Note. I fully ran that query and it seems to end in an error
From other testing, this seems to be the error that comes up when cycles are passed into the results assembler, so I believe this case does need to be handled. Side Note 2. The actual reason it seems to be giving an error is that it expects there to be a node with one edge. Therefore, I think the best course of action is to throw an error if there are duplicate edges (where direction doesn't matter) |
I agree that this is a "duplicated edge" issue that ends up with QGraph-cycle-like problems! do you think a fix to this can bundled into this PR / issue, or should it be something separate? |
Since the code would basically be in the same place and it causes a similar error, I can probably put it into this PR and add an additional test for that case. Also, Do you think it would be preferred to bundle this in the _validateCycle or create a separate function for validating duplicate edges? |
It's probably worth making it a separate function. |
New commit should fix that query, and includes an additional test |
looks good to me now! @tokebe (again, I didn't fully execute anything, I only started querying and stopped execution if it looked like it passed the checkpoint for raising the "cycle" error...) |
Meant to solve this issue. Uses a DFS to find cycles and assumes the direction of edges doesn't matter.