Provide multiple coreferent mentions to improve grounding #115

JohnGiorgi · 2021-06-24T18:52:01Z

There are some situations where entity mentions have different surface forms, but should ultimately be grounded to the same ID. This includes acronyms as well as coreferent mentions, e.g.:

CCNU (lomustine) toxicity in dogs. To describe the incidence of hematological, renal, hepatic and gastrointestinal toxicities in tumour-bearing dogs receiving 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea (CCNU).

I believe it is standard practice to ground each mention independently (call it mention-level grounding, for lack of a better term). But I can imagine that grounding all three mentions jointly (call it entity-level grounding) would reduce ambiguity and therefore increase grounding accuracy.

I am not sure exactly how it would work in grounding-search. I can continue to think about it a little but I would be interested to hear if you guys think this is feasible. I already have a machine learning-based method that identifies mentions and then groups them into coreferent clusters, so it would be able to take advantage of something like this.


maxkfranz · 2021-06-24T19:53:39Z

It should be possible in one query if you can generate a single main mention (the "best" one or an aggregate) from the cluster. I wonder how well the naive approach of selecting the first mention would work.

An alternative would be to allow you to send a cluster (array) of mentions as input to the service. Even the naive approach for that would at least save network overhead (i.e. do separate queries internally).

JohnGiorgi · 2021-06-24T20:09:23Z

> It should be possible in one query if you can generate a single main mention (the "best" one or an aggregate) from the cluster. I wonder how well the naive approach of selecting the first mention would work.

Yes, I should have clarified that I can make it work just by choosing one of the mentions. For now, I choose the longest, the intuition being that it's likely to be the least ambiguous. This seems to work pretty well.
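For reference, a minimal sketch of that heuristic, assuming a cluster is just a list of mention strings. The function name `pick_representative` is made up for illustration and is not part of grounding-search or my pipeline as such:

```python
def pick_representative(cluster):
    """Return the longest mention in a coreference cluster.

    Ties are broken in favour of the mention that appears first,
    because max() keeps the first maximal element it encounters.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one mention")
    return max(cluster, key=len)


cluster = [
    "CCNU",
    "lomustine",
    "1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea (CCNU)",
]
# The representative is then sent to the service as a single query.
representative = pick_representative(cluster)
```

Length is only a rough proxy for specificity, so this would probably need a smarter rule for clusters where the longest mention is a vague noun phrase.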

> An alternative would be to allow you to send a cluster (array) of mentions as input to the service. Even the naive approach for that would at least save network overhead (i.e. do separate queries internally).

Yeah, exactly what I was thinking!

Using the above example, you could imagine a situation where "CCNU" on its own is ambiguous and brings up multiple hits, but querying with ["CCNU", "lomustine"] or even ["CCNU", "lomustine", "1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea (CCNU)"] reduces the ambiguity and leads to one clear hit. As you said, the naive approach (separate queries internally) might be enough.
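To make the naive version concrete, here is a rough sketch of what "separate queries internally" could look like: query each mention on its own, then sum the scores per candidate ID so that an ID supported by several mentions rises to the top. Everything here is an assumption for illustration. `search` is a hypothetical stand-in for a single-mention query, `TOY_INDEX` replaces the real index, and the IDs and scores are invented. Summing scores is just one possible aggregation, not something grounding-search does today.

```python
from collections import defaultdict

# Toy stand-in for the grounding index: mention -> ranked (id, score) hits.
# IDs and scores are invented purely for illustration.
TOY_INDEX = {
    "ccnu": [("ID:other", 0.6), ("ID:lomustine", 0.5)],
    "lomustine": [("ID:lomustine", 0.9)],
}


def search(mention):
    """Hypothetical single-mention query; unknown mentions return no hits."""
    return TOY_INDEX.get(mention.lower(), [])


def ground_cluster(mentions):
    """Query each mention separately and sum the scores per candidate ID."""
    totals = defaultdict(float)
    for mention in mentions:
        for candidate_id, score in search(mention):
            totals[candidate_id] += score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


mention_level = ground_cluster(["CCNU"])
entity_level = ground_cluster(["CCNU", "lomustine"])
```

With this toy data, "CCNU" alone ranks the wrong ID first, while the two-mention cluster ranks the lomustine ID first. Whether summing, averaging, or voting works best on real data is an open question.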
