Reconfirming the evaluation metric #49

Open · thanish opened this issue Dec 16, 2016 · 6 comments

thanish (Contributor) commented Dec 16, 2016

I think it might have already been discussed in #4, but just to reconfirm: what is the evaluation metric for this contest? It's F1, which is 2 * (precision * recall) / (precision + recall), right? And not accuracy, which would be (sum of the diagonal of the confusion matrix) / (total number of test samples, i.e. the blind data)?
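
A minimal sketch with made-up labels, just to make the two candidate metrics concrete (this is not the contest data or the official scoring code, and macro-averaging is only one possible way to extend the per-class F1 formula to multiclass):

```python
# Toy labels, purely illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 0, 2, 2, 2, 2])

# Accuracy: sum of the diagonal of the confusion matrix / total number of samples.
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()

# F1 = 2 * (precision * recall) / (precision + recall), computed per class,
# then averaged across classes (macro-averaging here).
f1_macro = f1_score(y_true, y_pred, average='macro')

print(accuracy)   # 0.7
print(f1_macro)   # ~0.656 -- not the same number as accuracy
```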

kwinkunks (Member) commented:

It's this:

f1_score(y_blind, y_pred, average='micro')

Read about it here.

This is the same as the metric provided by Brendon's accuracy() function.
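
For what it's worth, when every sample has exactly one true label and one predicted label, micro-averaged F1 works out to plain accuracy, which is why it matches an accuracy-style helper. A quick sketch with random stand-in labels (not the blind data):

```python
# Random stand-in labels: micro-averaged F1 equals plain accuracy for
# single-label multiclass predictions.
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

rng = np.random.default_rng(0)
y_blind = rng.integers(0, 9, size=1000)                    # pretend 9 facies classes
noise = rng.integers(0, 9, size=1000)
y_pred = np.where(rng.random(1000) < 0.7, y_blind, noise)  # roughly 70% correct

print(f1_score(y_blind, y_pred, average='micro'))
print(accuracy_score(y_blind, y_pred))                     # identical value
```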

thanish (Contributor, Author) commented Dec 18, 2016

Thanks for the info @kwinkunks. I was assuming accuracy for so long. Would anybody like to help me find an R package to calculate the F1 score for multiclass problems? I'd highly appreciate it.

mycarta (Contributor) commented Dec 20, 2016

thanish (Contributor, Author) commented Dec 20, 2016

Thanks @mycarta that really helped :)

dalide (Contributor) commented Dec 24, 2016

@kwinkunks I thought the F1 score was computed with average='weighted'.

AdmcCarthy (Contributor) commented Jan 24, 2017

@kwinkunks just following up on this before I submit and see my score (so it won't seem so biased!). Hopefully it's 'weighted' instead of 'micro'. Micro-averaging biases the score towards the most populated labels. In this case all facies are equally important, and some of the classes have heavily skewed distributions.

Extracted some text below from here:

If you think there are labels with more instances than others and if you want to bias your metric towards the most populated ones, use micro-averaging.

If you think there are labels with more instances than others and if you want to bias your metric toward the least populated ones (or at least you don't want to bias toward the most populated ones), use macro-averaging.
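
A toy comparison (made-up labels, not the contest facies) of how much the averaging choice moves the score when the class distribution is skewed:

```python
# Made-up, heavily skewed labels: the minority class 1 is mostly misclassified,
# so 'macro' punishes the mistakes hardest, 'micro' the least, 'weighted' in between.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0] * 80 + [1] * 15 + [2] * 5)
y_pred = np.array([0] * 78 + [1] * 2 + [1] * 5 + [0] * 10 + [2] * 5)

for avg in ('micro', 'macro', 'weighted'):
    print(avg, round(f1_score(y_true, y_pred, average=avg), 3))
```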
