Download additional DATASETS AND TESTING RESOURCES mentioned in README #139

Open
Deepankar-98 opened this issue Aug 19, 2022 · 3 comments

Deepankar-98 commented Aug 19, 2022

Where can I download the additional DATASETS AND TESTING RESOURCES (items 4-12) mentioned in the README file?
https://github.com/cjhutto/vaderSentiment#resources-and-dataset-descriptions
I tried to download the resources using nltk.download('name'), but it didn't work: the file names mentioned are not listed in the NLTK corpora (https://www.nltk.org/nltk_data/).

I am trying to download:

  1. tweets_anonDataRatings.txt,
  2. amazonReviewSnippets_anonDataRatings.txt, etc

Can someone help me with this?

cjhutto (Owner) commented Aug 19, 2022

Check out the "additional_resources" directory in this repo. The complete set of resources is compressed into the .tar.gz file for your convenience.
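
For anyone following along, unpacking the archive could look roughly like the sketch below. It uses only Python's standard `tarfile` module. The function name `extract_resources` and the destination directory are illustrative choices, and the archive path is whatever the .tar.gz file in "additional_resources" is called in your checkout (not spelled out here).

```python
import tarfile
from pathlib import Path


def extract_resources(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return the member names.

    Hypothetical helper for illustration; only extract archives you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()  # list members before extracting
        archive.extractall(dest)
    return names


# Example call (paths are placeholders, adjust to your checkout):
# extract_resources("additional_resources/<archive>.tar.gz", "vader_resources")
```

The returned list makes it easy to check that files such as tweets_anonDataRatings.txt are actually present after extraction.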

Deepankar-98 (Author) commented

Thanks a lot for the info and the wonderful package.

Deepankar-98 (Author) commented Sep 1, 2022

Hi @cjhutto,

I downloaded the additional datasets, but I am unable to figure out how to use them.
I figured out that I can select which file to load using this code:

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid_mod = SentimentIntensityAnalyzer(lexicon_file="vader_lexicon download path")
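
As background on what `lexicon_file` is expected to contain: as far as I can tell, NLTK's VADER loader splits each line on tabs and keeps only the first two fields, the token and its mean valence. The sketch below mimics that with the standard library only. `parse_lexicon` is a hypothetical helper, and the sample lines are invented for illustration, not real lexicon entries.

```python
def parse_lexicon(text):
    """Build a token -> mean valence mapping from tab-delimited lexicon text.

    Assumes each non-empty line starts with: TOKEN <tab> MEAN-VALENCE.
    Any further fields on the line are ignored.
    """
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t")
        token, mean_valence = fields[0], fields[1]
        lexicon[token] = float(mean_valence)
    return lexicon


# Invented sample in the assumed layout: token, mean, std dev, raw ratings.
sample = (
    "goodword\t1.5\t0.5\t[1, 2, 1, 2]\n"
    "badword\t-2.0\t1.0\t[-1, -3, -1, -3]\n"
)
print(parse_lexicon(sample))
```

If that reading is right, a dataset file whose first column is an ID rather than a token would load without error but produce a dictionary keyed by IDs, which is probably not useful as a lexicon.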

The content inside vader_lexicon.txt is of the form:
[image: file contents]

Whereas tweets_anonDataRatings.txt is:
[image: file contents]

And tweets_GroundTruth.txt is:
[image: file contents]

These two files appear to be just datasets with the ratings of 20 people. I have two questions:

  1. The mean valence differs between the two files. Can you please clarify why?
  2. Is there any way I can use these files for sentiment analysis? If yes, how?

Your help is much appreciated.
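
For readers with the same question, one possible use of a ground-truth file is as an evaluation set rather than as a lexicon: score each text with an analyzer and compare the result with the human mean rating. The sketch below assumes the file is tab-delimited with ID, mean rating, and text, which is my reading of the README and should be checked against the actual file. `load_ground_truth`, `pearson`, and `evaluate` are hypothetical helper names, and the scorer is any function mapping text to a number.

```python
import math


def load_ground_truth(text):
    """Parse rows assumed to be: ID <tab> MEAN-RATING <tab> TEXT."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t", 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], float(fields[1]), fields[2]))
    return rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(rows, scorer):
    """Correlate human mean ratings with scorer(text) over all rows."""
    human = [rating for _, rating, _ in rows]
    predicted = [scorer(text) for _, _, text in rows]
    return pearson(human, predicted)


# With NLTK installed, a scorer could be built like this (not run here):
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# sia = SentimentIntensityAnalyzer()
# scorer = lambda text: sia.polarity_scores(text)["compound"]
```

Correlation is used here because it does not depend on the two scores sharing a scale, so the human ratings and the analyzer's compound score can be compared directly.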
