Correct Denominators #38

cbizon · 2020-12-11T13:26:11Z

When we do the enrichment calculations, we use the type of the coalesced nodes. So if we are merging on chemicals, we assume that any chemical could occupy that spot, and we ask how likely it is that, e.g., X of those chemicals would share a particular property.

That's not wrong, exactly, but it is probably not specific enough. For instance, consider
(asthma)<-[treats]-(chemical)

We'll usually find an enriched property for the chemical like "drug", or features that tend to be more common in druglike space (like heterocyclic organic compounds). That is correct: drugs are more likely than random chemicals to treat a disease. But it's not terribly interesting.

Instead, I think we'd rather use as the denominator the number of chemicals that could have inhabited that spot in the graph. So something like: out of all the chemicals with a 'treats' edge, how likely is it that this many would have property X? Now the chance of having 'drug' is pretty high in that group, so it's not returned, which is what we want.
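
To make the effect of the denominator concrete, here is a minimal sketch. It assumes the enrichment p-value is a hypergeometric upper tail, which this issue does not actually state, and the function name `enrichment_p` and all counts are made up for illustration.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: size of the denominator (all nodes that could fill the spot)
    K: how many of those N have the property
    n: number of coalesced nodes
    k: how many of the coalesced nodes have the property
    """
    total = comb(N, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible terms drop out of the sum on their own.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Toy numbers (hypothetical): 5 merged chemicals, all 5 are drugs.
# Denominator = every chemical: 1000 chemicals, 100 of them drugs.
p_all_chemicals = enrichment_p(k=5, n=5, K=100, N=1000)
# Denominator = chemicals with a 'treats' edge: 120 chemicals, 100 of them drugs.
p_treats_only = enrichment_p(k=5, n=5, K=100, N=120)
```

With these toy counts, "drug" looks strongly enriched against all chemicals but unremarkable against chemicals that have a 'treats' edge, which is the behavior described above.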

That would be doable in this case, and we could precache counts by edge. But in the general case (where an arbitrary number of edges come out of the merging node), we'd need to cache the identities of the nodes with each edge so that we could intersect them to find the appropriate denominator.
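
A rough sketch of that general case, assuming a hypothetical cache `nodes_by_edge` that maps an edge description (here a `(predicate, target type)` tuple) to the set of node identifiers having such an edge. The cache layout, the identifiers, and the function name are illustrative, not anything that exists in the code today.

```python
def denominator_nodes(nodes_by_edge, edges):
    """Return the nodes that have every one of the given edges.

    nodes_by_edge: dict mapping an edge key to a set of node identifiers
    edges: the edge keys coming out of the merging node
    """
    sets = [nodes_by_edge[edge] for edge in edges]
    if not sets:
        return set()
    # Only nodes present in every per-edge set could have filled the spot.
    return set.intersection(*sets)


# Hypothetical cached identities, keyed by (predicate, target type).
nodes_by_edge = {
    ("treats", "disease"): {"chem:A", "chem:B", "chem:C"},
    ("affects", "gene"): {"chem:B", "chem:C", "chem:D"},
    ("causes", "phenotype"): {"chem:C", "chem:E"},
}

candidates = denominator_nodes(
    nodes_by_edge,
    [("treats", "disease"), ("affects", "gene"), ("causes", "phenotype")],
)
denominator = len(candidates)
```

Counts alone are not enough here: the per-edge set sizes (3, 3, and 2) say nothing about the size of the overlap, which is why the identities would have to be cached.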

patrickkwang · 2020-12-14T16:29:10Z

This makes sense. We do something similar with the specificity weighting on edges.

cbizon added Priority: Low Status: Available Type: Enhancement labels Jun 21, 2021
