The compound score has a serious flaw: it drifts toward the extremes (close to ±1) as the text gets longer. Example:
```python
>>> polarity_scores('bad good')
{'neg': 0.547, 'neu': 0.0, 'pos': 0.453, 'compound': -0.1531}
>>> polarity_scores('bad good bad good bad good bad good bad good bad good bad good bad good bad good bad')
{'neg': 0.547, 'neu': 0.0, 'pos': 0.453, 'compound': -0.8979}
```
It seems that the 'neg' and 'pos' scores are averages (proportions), whereas the 'compound' score is some sort of sum. As a result, the compound score is pushed toward extreme values for long texts such as Reddit posts or news articles.
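To make the suspected mechanism concrete, here is a simplified re-implementation. It is a sketch, not the library's actual code: it assumes the valences `bad = -2.5` and `good = 1.9` and a normalization `x / sqrt(x² + 15)` applied to the summed valences, which is what the vaderSentiment source appears to do. These assumptions reproduce both compound values above. Negation, boosters, "but" handling and punctuation rules are ignored, and `toy_scores` is a hypothetical helper.

```python
import math

# Hypothetical lexicon: valences assumed for illustration; they reproduce
# the compound scores reported above but are not read from the real lexicon.
VALENCE = {"bad": -2.5, "good": 1.9}
ALPHA = 15.0  # assumed normalization constant


def normalize(total: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded sum of valences into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def toy_scores(text: str) -> dict:
    """Simplified VADER-like scoring (no negation, boosters or punctuation rules)."""
    valences = [VALENCE.get(word, 0.0) for word in text.lower().split()]
    if not valences:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}

    # compound: the *sum* of valences, normalized, so it grows with text length
    compound = normalize(sum(valences))

    # neg/pos/neu: *proportions*; each sentiment word weighs |valence| + 1,
    # each neutral word weighs 1, so repeating a pattern leaves them unchanged
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    return {
        "neg": neg_sum / total,
        "neu": neu_count / total,
        "pos": pos_sum / total,
        "compound": compound,
    }


for pairs in (1, 5, 50):
    scores = toy_scores(" ".join(["bad good"] * pairs))
    print(pairs, round(scores["neg"], 4), round(scores["pos"], 4), round(scores["compound"], 4))
```

In this toy model, repeating the `bad good` pair leaves `neg` and `pos` unchanged, while `compound` keeps moving toward -1 (about -0.99 for 50 pairs).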
This is particularly unfortunate because many beginners will use the compound score blindly, without noticing this, and be discouraged by the poor results. I suggest replacing the current implementation of the compound score with `compound = pos - neg`, or removing it entirely.
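A minimal sketch of the proposed `pos - neg` score follows, assuming VADER-style weights (each sentiment word counts as |valence| + 1 and each neutral word as 1, as the library appears to do) and the hypothetical valences `bad = -2.5` and `good = 1.9`. `proposed_compound` is an illustrative helper, not part of any library.

```python
from typing import Sequence


def proposed_compound(valences: Sequence[float]) -> float:
    """Length-independent score: positive share minus negative share.

    Uses assumed VADER-style weights: |valence| + 1 per sentiment word,
    1 per neutral word. The result always lies in [-1, 1].
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    return 0.0 if total == 0 else (pos_sum - neg_sum) / total


# Hypothetical valences for 'bad' and 'good'
BAD, GOOD = -2.5, 1.9

short_score = proposed_compound([BAD, GOOD])
long_score = proposed_compound([BAD, GOOD] * 9 + [BAD])
print(round(short_score, 3), round(long_score, 3))
```

Under these assumptions the score stays at about -0.09 however many `bad good` pairs are repeated, and the 19-word example gives about -0.15 instead of -0.8979.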