Hi @kwinkunks, I know it has been a year already, but I happened to take another look at this repo and noticed in utils.py that the score used is "accuracy", not the actual "F1 score". Is that right?
Hi... Indeed, I used the accuracy function in utils.py. This is the same as sklearn.metrics.f1_score with average='micro'.
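A quick sketch (with made-up labels, not the contest data) of why these agree: for single-label multiclass predictions, micro-averaged precision, recall, and F1 all reduce to the fraction of correct predictions, i.e. accuracy.

```python
# Minimal check that accuracy and micro-averaged F1 coincide for
# single-label multiclass predictions (toy labels, purely illustrative).
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 0, 2, 3]

print(accuracy_score(y_true, y_pred))             # 0.75
print(f1_score(y_true, y_pred, average='micro'))  # 0.75
```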
This was discussed in another issue. It seems that using average='weighted' may have been more sensible for a dataset with such imbalanced labels, but 'micro' is what we went with.
It would be interesting to test them against each other. One question I have is whether weighting the small populations would make the results rather unstable, since single instances of small populations could substantially change the score. Since I was using 100 realizations of the scores for the final ranking, maybe this would not have been a big concern, but I think it's worth investigating.
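A rough way to compare the two averaging schemes (hypothetical imbalanced labels, not the contest data): 'micro' pools all predictions and equals accuracy here, while 'weighted' averages the per-class F1 scores weighted by each class's support in y_true.

```python
# Compare 'micro' and 'weighted' F1 on an imbalanced toy label set
# (illustrative only; swap in real predictions to test the ranking effect).
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0] * 90 + [1] * 8 + [2] * 2)  # heavily imbalanced classes
y_pred = y_true.copy()
y_pred[:5] = 1    # a few mistakes on the majority class
y_pred[98] = 0    # one mistake on the rarest class

print(f1_score(y_true, y_pred, average=None))        # per-class F1
print(f1_score(y_true, y_pred, average='micro'))     # equals accuracy
print(f1_score(y_true, y_pred, average='weighted'))  # support-weighted mean of per-class F1
```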
I see. I do remember seeing this discussion somewhere, but I couldn't find it last night. I agree that average='weighted' would be a better choice for such an imbalanced dataset, since 'micro' basically gives accuracy in the usual sense.
If a population is very small, I guess its precision and recall would not contribute much weight to the total.