This is a two-part issue:
1/ The section "Fast tokenizer special powers" of the chapter on tokenizers states that the loaded model (dbmdz/bert-large-cased-finetuned-conll03-english) was fine-tuned on a dataset in the IOB1 format. In that format, when two entities of the same type are adjacent, the second one starts with B- instead of I-.
However, this model does not seem to behave that way.
I've tried many times with several entities, and they are always tagged I-.
For example, with locations (screenshot below):
Same with persons:
So the example in the course does not work as described. Cf. the screenshot below (the output should look like the part in blue).
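To make the IOB1/IOB2 distinction from part 1 concrete, here is a small sketch that tags the same entity spans under both schemes. It does not involve the model; the helper `tag_entities` and its arguments are hypothetical. For the tokens "Paris", "Berlin", "and", "Sylvain", "Gugger" (two adjacent locations, then one person), IOB1 gives `I-LOC B-LOC O I-PER I-PER`, while IOB2 gives `B-LOC B-LOC O B-PER I-PER`.

```python
def tag_entities(n_tokens, entities, scheme="IOB1"):
    """Build per-token tags from entity spans (hypothetical helper).

    entities: sorted, non-overlapping (start_token, end_token_exclusive, type) tuples.
    scheme: "IOB1" or "IOB2".
    """
    if scheme not in ("IOB1", "IOB2"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens
    prev_end, prev_type = None, None
    for start, end, etype in entities:
        # Every token of the entity starts out as I-<type>
        for i in range(start, end):
            tags[i] = f"I-{etype}"
        adjacent_same_type = start == prev_end and etype == prev_type
        # IOB2: every entity begins with B-.
        # IOB1: B- only marks an entity that directly follows one of the same type.
        if scheme == "IOB2" or adjacent_same_type:
            tags[start] = f"B-{etype}"
        prev_end, prev_type = end, etype
    return tags
```

Under IOB1, a model that never emits B- cannot separate "Paris" from "Berlin" when they are adjacent, which is why the tagging observed above matters for the grouping code in part 2.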
2/ The code at the end of that section that groups entities (screenshot below) also has an issue.
Under the described behaviour (the second entity starts with B-), an entity starting with B- would immediately exit the inner while loop, so all the following tokens tagged I- would be lost from that entity. The way idx is advanced in the while loop is most likely at fault.
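A minimal sketch of a grouping loop that handles both cases, assuming the per-token label strings, scores and character offsets have already been extracted from the model output. The function `group_entities` and its arguments are hypothetical, not the course's code. The key change is that the token opening an entity (B- or I-) is consumed before the inner loop, which then only extends over I- tokens of the same type, so a B- of the same type starts a new, adjacent entity.

```python
from statistics import mean


def group_entities(text, labels, scores, offsets):
    """Group token-level IOB1 labels into entity spans (hypothetical helper).

    labels: one string per token, e.g. "O", "I-LOC", "B-LOC".
    scores: probability of the predicted label for each token.
    offsets: (start, end) character offsets of each token in `text`.
    """
    results = []
    idx = 0
    while idx < len(labels):
        label = labels[idx]
        if label == "O":
            idx += 1
            continue
        entity_type = label[2:]  # strip the "B-" or "I-" prefix
        start, end = offsets[idx]
        token_scores = [scores[idx]]
        idx += 1  # the opening token always belongs to the entity
        # Extend only over I- tokens of the same type
        while idx < len(labels) and labels[idx] == f"I-{entity_type}":
            token_scores.append(scores[idx])
            end = offsets[idx][1]
            idx += 1
        results.append(
            {
                "entity_group": entity_type,
                "score": mean(token_scores),  # mean of the token scores
                "word": text[start:end],
                "start": start,
                "end": end,
            }
        )
    return results
```

With the text "Paris Berlin and Sylvain Gugger" and IOB1 labels `I-LOC B-LOC O I-PER I-PER`, this yields three groups: "Paris", "Berlin" and "Sylvain Gugger". If the model only emits I- (as observed in part 1), the labels become `I-LOC I-LOC O I-PER I-PER` and the two locations are merged into a single "Paris Berlin" group, since nothing marks the boundary between them.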