Add token classification eval with CoNLL 2003 #92

Closed

tylerjthomas9
Changes

This PR adds support for CoNLL 2003 token classification/entity recognition. It should be easier to integrate other token classification datasets now that the classes have been built out.

Using the overall_f1 metric from seqeval, here are the HF and Mosaic BERT ablations:

  • HF BERT: 90.51
  • Mosaic BERT: 60.92

I trained a quick checkpoint of Flex BERT and verified that this also ran without errors, and got a score of 64.28.
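For readers unfamiliar with what `overall_f1` measures, here is a minimal sketch of entity-level micro F1 over IOB2 tags. It is not seqeval's implementation and not the code in this PR: `extract_spans` and `span_f1` are hypothetical helper names, the lenient handling of a stray `I-` tag is an assumption, and seqeval itself supports more tagging schemes and edge cases. The scores above appear to be this kind of value scaled to a percentage.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical alias for illustration
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one IOB2-tagged sentence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" is a sentinel that closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        # Close the open span unless this tag continues it (I- with the same type).
        if start is not None and not (prefix == "I" and label == etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # B- opens a span; a stray I- also opens one (lenient, an assumption here).
        if prefix in ("B", "I") and start is None:
            start, etype = i, label
    return spans


def span_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span and type match."""
    tp = n_pred = n_true = 0
    for gold_tags, pred_tags in zip(y_true, y_pred):
        gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(gold & pred)
        n_pred += len(pred)
        n_true += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One sentence: the PER span is found, the LOC span is missed.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = span_f1(gold, pred)  # precision 1.0, recall 0.5
```

Because a span must match exactly in both boundaries and type, a model that tags most tokens correctly can still score low if it frequently gets entity boundaries wrong.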

Here are the

Discussions
I am not aware of any discussions on the topic, but the BertForTokenClassification class was left as TBD.

class BertForTokenClassification(BertPreTrainedModel):

Tests

  • Is the new feature tested? (Not always necessary for all changes -- just adding to the checklist to keep track)
  • Have you run all the tests?
  • Do the tests all pass?
  • If not, have you included an explanation of which tests this PR breaks and/or why (below this checklist)?

@bclavie
Contributor

bclavie commented Jul 19, 2024

Hey! Thanks for adding this. As discussed, we won't be merging outside evals right now (as we finalise the training runs), but we'll be revisiting this shortly after.

Thanks again!
