[Question]: I got different results when I loaded a trained model to do the inference task on the same test data #3478
Comments
Hello @xiulinyang, it is hard to tell without a runnable example to reproduce the error, but I would suspect the error lies somewhere in the way you read the sentences. Some ideas:
Hi @alanakbik, thanks a lot for your prompt reply! :) I tried to use ColumnCorpus in Flair (as seen below), but the issue remains. Is it possible for me to send you the code, the training log, and the data to reproduce the error? Or is there anything I misunderstood in the code below? Thanks!
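A minimal sketch of a standard `ColumnCorpus` setup, assuming the usual Flair API; the folder, file names, and column layout here are placeholders, not necessarily the ones used in the actual experiment:

```python
from flair.datasets import ColumnCorpus

# placeholder column layout: token in column 0, tag in column 1
columns = {0: "text", 1: "ner"}

# placeholder folder and file names; adjust to the actual data layout
corpus = ColumnCorpus(
    "data/tagger",
    columns,
    train_file="train.txt",
    dev_file="dev.txt",
    test_file="test.txt",
)

# prints the number of train/dev/test sentences as a sanity check
print(corpus)
```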
Hi, just a quick update. I tried another task (UPOS tagging), but the result is still weird. I created a Google Colab script to replicate the experiment; it only takes 5 minutes to run. I attached the data (it contains only 100 sentences). It would be very much appreciated if you could offer some insight into potential problems. Thank you! :)
Hi @xiulinyang
Hi @helpmefindaname, thank you very much for your reply! Sorry, I had been debugging for a while and the code was a mess. I have now cleaned up the code, and you can run the experiment again with the data from the tagger folder. It contains 500 sentences for training and 100 sentences, drawn from those 500, for testing. (I tried to downsample the data, but with a smaller size the model only gives an accuracy score of 0.) The main problem I have right now is that the model won't give consistent predictions after training. Thanks!
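To pin down what "inconsistent" means here, a minimal determinism check under standard Flair calls (the model path is a placeholder): loading the model once and predicting twice on identical input should yield identical tags, since `predict()` runs the model in eval mode.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# placeholder path; point this at the final-model.pt from the training run
tagger = SequenceTagger.load("resources/taggers/upos/final-model.pt")

text = "The quick brown fox jumps over the lazy dog"
first, second = Sentence(text), Sentence(text)

tagger.predict(first)
tagger.predict(second)

# predict() runs under eval mode, so both runs should agree exactly
print(first.to_tagged_string())
print(second.to_tagged_string())
```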
Hi, could you please offer some help? Thanks!
hi @xiulinyang
@helpmefindaname Hi, sorry, I have removed the unrelated code and kept only what is relevant to Flair. I hope it works this time. Thanks!
Question
Hi,
I'm training a SequenceTagger to do an NER task using my custom dataset with custom features. After training finished, I got a file named test.tsv, which contains the predictions for the test split. However, when I loaded the trained model (final-model.pt) and ran inference on the same test data, I got much lower results (0.86 vs. 0.64 in accuracy).
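For reference, a minimal sketch of the reload-and-evaluate path being compared here, assuming standard Flair APIs; the folder, paths, and label type are placeholders, not the actual setup:

```python
from flair.datasets import ColumnCorpus
from flair.models import SequenceTagger

# placeholders: the real column layout, paths, and label type may differ
corpus = ColumnCorpus("data/tagger", {0: "text", 1: "ner"})
tagger = SequenceTagger.load("resources/taggers/ner/final-model.pt")

# evaluate() is the same routine the trainer runs to produce test.tsv,
# so this score should match the one reported at the end of training
result = tagger.evaluate(
    corpus.test,
    gold_label_type="ner",
    mini_batch_size=32,
    out_path="reloaded_test.tsv",  # writes predictions in the same format
)
print(result.detailed_results)
```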
Here is the prediction function I'm using. I checked the sents list, and all the labels are correctly added to each token. During training I stacked all the features; should I do the same during prediction? The main issue seems to be that the model does not understand the added labels: I have a specific label named `__TARGET__` that signals the model to make predictions on specific tokens, but the model appears to ignore this tag. Any suggestions would be very much appreciated. Thanks!
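Since the function itself is not shown above, the following is a hypothetical reconstruction of a load-and-predict routine using standard Flair calls. The model path, the `"marker"` field name, and the way `__TARGET__` is attached are all assumptions; whether the tagger actually consumes such a per-token label depends on the embeddings it was trained with.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# placeholder path; adjust to the actual training output directory
tagger = SequenceTagger.load("final-model.pt")

def predict_tokens(tokens, target_index):
    """Tag one pre-tokenized sentence, marking one token as the target."""
    sentence = Sentence(tokens)
    # if the model was trained with a per-token feature such as __TARGET__,
    # the same label must be attached before prediction; "marker" is a
    # placeholder field name here
    sentence[target_index].add_label("marker", "__TARGET__")
    tagger.predict(sentence)
    return sentence.to_tagged_string()

# illustrative input only
print(predict_tokens(["John", "lives", "in", "Berlin"], 3))
```

If the training pipeline inserted the `__TARGET__` signal differently (for example as a separate column consumed by a stacked embedding), the same preprocessing would need to be reproduced exactly at inference time.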