Issue with NER model with adjacent entities of the same type. #764

ThomasBourgeois · 2024-12-04T21:57:45Z

It's a two-part issue:

1/ In the section "Fast tokenizer special powers" of the chapter on tokenizers, the course says that the loaded model (dbmdz/bert-large-cased-finetuned-conll03-english) was fine-tuned on a dataset in IOB1 format. In that format, when two entities of the same type are adjacent, the second one starts with B- rather than I-.
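To make the difference between the two conventions concrete, here is a minimal sketch that converts IOB1 tags to IOB2, where every entity starts with B-. The function name `iob1_to_iob2` and the tag sequence (two adjacent, hypothetical person names) are illustrative only; they are not output from the model discussed here.

```python
def iob1_to_iob2(tags):
    """Convert IOB1 tags to IOB2, where every entity starts with B-."""
    converted = []
    prev_type = None  # entity type of the previous token, None after "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, ent_type = tag.split("-", 1)
        # In IOB1, an I- tag opens a new entity unless the previous token
        # belongs to an entity of the same type; B- is only used to split
        # two adjacent entities of the same type.
        if prefix == "I" and prev_type != ent_type:
            converted.append(f"B-{ent_type}")
        else:
            converted.append(tag)
        prev_type = ent_type
    return converted


# Hypothetical IOB1 tags: two adjacent two-token person names, then "O"
iob1 = ["I-PER", "I-PER", "B-PER", "I-PER", "O"]
print(iob1_to_iob2(iob1))  # ['B-PER', 'I-PER', 'B-PER', 'I-PER', 'O']
```

In IOB1 the only B- tag sits on the third token, exactly where the second person name begins right after the first one.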

It seems to me that this model does not work that way.

I've tried this many times with several entities, and they always get tagged I-.

E.g.: screenshot below with locations.

Same with persons:

So the example mentioned in the course does not behave as described. Cf. (it should look like the part in blue below):

2/ The piece of code meant to group entities at the end of that section (screenshot below) has an issue too.
Under the supposed behaviour (the second entity starts with B-), an entity starting with B- would be ejected from the inner while loop immediately, so all of its following I- tokens would be lost. There is most probably an issue in how idx is incremented in the while loop.
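The grouping loop only appears as a screenshot in the issue, so the following is not the course's code. It is a hypothetical sketch (`group_entities` and its arguments are illustrative names) of a loop that addresses the concern above: the first token of an entity is consumed whether it is tagged B- or I-, only same-type I- tokens extend it, and idx is not incremented a second time after the inner loop, so a B- token right after an entity is not skipped. It assumes per-token labels, per-token scores, and character offsets (such as a fast tokenizer returns with `return_offsets_mapping=True`) are already available.

```python
from statistics import mean


def group_entities(text, labels, offsets, scores):
    """Group token-level IOB labels into entity spans (IOB1 or IOB2)."""
    results = []
    idx = 0
    while idx < len(labels):
        label = labels[idx]
        if label == "O":
            idx += 1
            continue
        ent_type = label[2:]  # strip the "B-" or "I-" prefix
        start, end = offsets[idx]
        entity_scores = [scores[idx]]
        idx += 1  # the first token is consumed whether it is B- or I-
        # Only I- tags of the same type extend the entity; a B- tag stops it.
        while idx < len(labels) and labels[idx] == f"I-{ent_type}":
            entity_scores.append(scores[idx])
            _, end = offsets[idx]
            idx += 1
        # idx already points at the next unprocessed token: no extra increment.
        results.append(
            {
                "entity_group": ent_type,
                "score": mean(entity_scores),
                "word": text[start:end],
                "start": start,
                "end": end,
            }
        )
    return results


# Hypothetical word-level example: two adjacent person names in IOB1
text = "Ann Lee Bob Ray left"
labels = ["I-PER", "I-PER", "B-PER", "I-PER", "O"]
offsets = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 20)]
scores = [1.0, 0.5, 0.5, 0.25, 1.0]
print([(r["entity_group"], r["word"]) for r in group_entities(text, labels, offsets, scores)])
# [('PER', 'Ann Lee'), ('PER', 'Bob Ray')]
```

With the all-I- tags reported above (`["I-PER"] * 4 + ["O"]`), the same function returns a single "Ann Lee Bob Ray" entity: if the model never emits B-, no grouping loop can split adjacent entities of the same type.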

ThomasBourgeois · 2024-12-04T22:07:44Z

@sgugger might be able to tag the right people?
