Classify sequence when the spans are known #550

dimidd · 2020-03-16T12:53:37Z

Is your feature request related to a problem? Please describe.
In NER (named entity recognition) we sometimes already know the segmentation of the entities, but still need to classify their type. E.g. in the sentence
'Paxar Corp said it has acquired Thermo-Print GmbH'
we might know that 'Paxar Corp' and 'Thermo-Print GmbH' are the relevant entities, but we still want to predict their label as ORG. Quoting from Wikipedia:

Full named-entity recognition is often broken down, conceptually and possibly also in implementations,[6] as two distinct problems: detection of names, and classification of the names by the type of entity they refer to (e.g. person, organization, location and other[7]). The first phase is typically simplified to a segmentation problem: names are defined to be contiguous spans of tokens, with no nesting, so that "Bank of America" is a single name, disregarding the fact that inside this name, the substring "America" is itself a name. This segmentation problem is formally similar to chunking. The second phase requires choosing an ontology by which to organize categories of things.

Describe the solution you'd like
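Perhaps add an optional parameter named spans to SequenceLabeler.predict: a list of dictionaries, each containing the start and end indices of one known entity span. The model would then only predict a label for each given span instead of also predicting the segmentation.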
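
To make the request concrete, here is a rough sketch of the input and output shapes I have in mind. It is not based on the library's internals: `classify_known_spans`, `label_fn`, and `toy_label_fn` are hypothetical names, and I am assuming character offsets with an exclusive end index, which the actual parameter might define differently (e.g. token indices).

```python
from typing import Callable, Dict, List


def classify_known_spans(
    text: str,
    spans: List[Dict[str, int]],
    label_fn: Callable[[str, int, int], str],
) -> List[Dict[str, object]]:
    """Attach a label to each pre-segmented span (end index is exclusive)."""
    results = []
    previous_end = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        start, end = span["start"], span["end"]
        # Reject empty, out-of-range, or overlapping spans up front.
        if not (previous_end <= start < end <= len(text)):
            raise ValueError(f"invalid or overlapping span: {span}")
        results.append({
            "start": start,
            "end": end,
            "text": text[start:end],
            # Hypothetical hook: a real model would score the span in context.
            "label": label_fn(text, start, end),
        })
        previous_end = end
    return results


def toy_label_fn(text: str, start: int, end: int) -> str:
    # Placeholder stand-in for the model, only here to make the sketch runnable.
    suffixes = ("Corp", "GmbH", "Inc")
    return "ORG" if text[start:end].endswith(suffixes) else "OTHER"


text = "Paxar Corp said it has acquired Thermo-Print GmbH"
spans = [{"start": 0, "end": 10}, {"start": 32, "end": 49}]
predictions = classify_known_spans(text, spans, toy_label_fn)
```

With this shape, the caller supplies the segmentation and gets back the same spans with a label attached, so the output stays aligned with the input spans.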
