Classify sequence when the spans are known #550

Open
dimidd opened this issue Mar 16, 2020 · 0 comments
Is your feature request related to a problem? Please describe.
In NER (named entity recognition) we sometimes already know the segmentation of the entities, but still need to classify their type. E.g. in the sentence
'Paxar Corp said it has acquired Thermo-Print GmbH'
we might know that 'Paxar Corp' and 'Thermo-Print GmbH' are the relevant entities, but still want to predict that their label is ORG. Quoting from Wikipedia:

Full named-entity recognition is often broken down, conceptually and possibly also in implementations,[6] as two distinct problems: detection of names, and classification of the names by the type of entity they refer to (e.g. person, organization, location and other[7]). The first phase is typically simplified to a segmentation problem: names are defined to be contiguous spans of tokens, with no nesting, so that "Bank of America" is a single name, disregarding the fact that inside this name, the substring "America" is itself a name. This segmentation problem is formally similar to chunking. The second phase requires choosing an ontology by which to organize categories of things.

Describe the solution you'd like
Perhaps add an optional param named spans to SequenceLabeler.predict: a list of dictionaries, one per known entity, each containing the start and end indices of that entity in the text.
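
To make the proposed input format concrete, here is a rough sketch of what the spans list might look like, together with a stand-in for the behaviour. Everything in it is an assumption for illustration: the helper name label_known_spans is hypothetical, the indices are taken to be character offsets with an exclusive end, and the predicted list is a made-up output format with start, end and label keys. It does not call SequenceLabeler at all, and it only approximates the request by voting over ordinary predictions after the fact, rather than constraining the model to the known spans.

```python
from collections import Counter


def label_known_spans(text, spans, predicted, default_label=None):
    """Give each known span one label by character-overlap voting.

    spans:     [{"start": int, "end": int}, ...]  (end exclusive; assumed)
    predicted: [{"start": int, "end": int, "label": str}, ...]
               (hypothetical output of some sequence labeler)
    """
    results = []
    for span in spans:
        start, end = span["start"], span["end"]
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span: {span}")

        # Each predicted label gets one vote per overlapping character.
        votes = Counter()
        for pred in predicted:
            overlap = min(end, pred["end"]) - max(start, pred["start"])
            if overlap > 0:
                votes[pred["label"]] += overlap

        label = votes.most_common(1)[0][0] if votes else default_label
        results.append(
            {"start": start, "end": end, "text": text[start:end], "label": label}
        )
    return results


text = "Paxar Corp said it has acquired Thermo-Print GmbH"

# The known segmentation, in the proposed list-of-dictionaries form.
spans = [{"start": 0, "end": 10}, {"start": 32, "end": 49}]

# Made-up predictions whose boundaries and labels are partly wrong.
predicted = [
    {"start": 0, "end": 5, "label": "ORG"},
    {"start": 32, "end": 44, "label": "ORG"},
    {"start": 45, "end": 49, "label": "LOC"},
]

labeled = label_known_spans(text, spans, predicted)
for item in labeled:
    print(item["text"], item["label"])
```

A span that overlaps no prediction falls back to default_label, which is one reason a post-hoc vote like this is weaker than having predict accept the spans directly.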
