Sentiment analysis laser #274
Conversation
tasks/SentimentAnalysis/README.md (Outdated)

> To run the notebook in Google Colab, simply click the "Open in Colab" button below:
>
> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12gQUG7rPJvOVeWQkpMFzMiixqwDIdv4W?usp=sharing)
It seems that you now have two independent notebooks: one in Google Drive (tied to Colab), and another here in GitHub.
I suggest that instead, we keep only one copy of the notebook, the one in GitHub, and modify the Colab URL so that it always loads the version from GitHub. The URL will look like https://colab.research.google.com/github/NIXBLACK11/LASER-fork/blob/Sentiment-analysis-laser/tasks/SentimentAnalysis/SentimentAnalysis.ipynb, only you'll need to update the path so that it refers to the final destination (the main branch of the LASER repository).
(I found this trick here.)
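For reference, the README badge could then be sketched like this; the repository and path below are illustrative assumptions for where the notebook might land after merging, so adjust them to the actual final location:

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/facebookresearch/LASER/blob/main/tasks/SentimentAnalysis/SentimentAnalysis.ipynb)
```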
> "metadata": {},
> "outputs": [],
> "source": [
>     "with open('/content/drive/MyDrive/dataset/train.csv', 'rb') as f:\n",
You seem to be using Google Drive here, but there is no code above that mounts it. This is confusing.
Yes, I have to add that.
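A minimal sketch of such a mounting cell, guarded so it degrades gracefully outside Colab (the `google.colab` module only exists inside a Colab runtime, so this is a no-op when the notebook runs elsewhere):

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; skip elsewhere."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    drive.mount(mount_point)
    return True

mounted = mount_drive()
```

After this cell runs in Colab, paths like `/content/drive/MyDrive/dataset/train.csv` become accessible.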
> "metadata": {},
> "outputs": [],
> "source": [
>     "with open('/content/drive/MyDrive/dataset/train.csv', 'rb') as f:\n",
Can we maybe add some text above about where and how to download this dataset?
Currently, those who open the notebook directly have no idea where to get it.
> "source": [
>     "## Step 3: Download the Dataset\n",
>     "\n",
>     "Next, let's acquire a sentiment analysis dataset to train our model. We'll download a dataset from Kaggle and unzip it into a directory named ./dataset. Execute the following commands:\n",
What dataset are you using? Can you put a short description and a link to the Kaggle page presenting the dataset?
Also, I see that some credentials are included in the URL; did you use your own credentials for this?
Yes, I used my own credentials for this.
I think I can just add steps on how to download the dataset from Kaggle.
Yes, maybe that would be better. It's a bit annoying to have to download the dataset, but I suppose your credentials might expire at some point and break the notebook. Isn't there another source from which to download the dataset without credentials?
@avidale @heffernankevin might have ideas about this.
Let's maybe use one which we can download from HuggingFace. For this we can use the `datasets` library:

python -m pip install datasets

An example could be this dataset: https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis
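A short sketch of what the download cell could look like with the `datasets` library. The column names `"text"` and `"feeling"` are assumptions based on this dataset's card on Hugging Face, so verify them before relying on this; the loading is wrapped in a function so no network access happens at import time:

```python
def load_splits(name="carblacac/twitter-sentiment-analysis"):
    """Download the dataset from the Hugging Face Hub (requires `pip install datasets`)."""
    from datasets import load_dataset
    return load_dataset(name)  # returns a DatasetDict with train/validation/test splits


def to_xy(rows, text_key="text", label_key="feeling"):
    """Split an iterable of row dicts into parallel text and label lists.

    Key names are assumptions taken from the dataset card; adjust if they differ.
    """
    texts = [row[text_key] for row in rows]
    labels = [row[label_key] for row in rows]
    return texts, labels
```

Typical usage would be `ds = load_splits()` followed by `texts, labels = to_xy(ds["train"])`, which removes both the Kaggle credentials and the manual download step from the notebook.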
Also, I see you're reporting "accuracy" at the end for evaluating the trained model. However, earlier you show that the labels in the tweet dataset are not balanced. If you move to another dataset and the labels are balanced, then you can stick with accuracy. Otherwise, ideally we should show precision and recall per label (your confusion matrix sheds light on this).
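A minimal sketch of how per-label precision and recall can be read directly off a confusion matrix, assuming the usual convention `cm[i][j]` = count of examples with true label `i` predicted as label `j` (function and variable names here are illustrative, not from the notebook):

```python
def per_label_scores(cm, labels):
    """Compute (precision, recall) per label from a confusion matrix.

    cm[i][j] counts examples whose true label is labels[i] and whose
    predicted label is labels[j].
    """
    scores = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]                                   # true positives
        fn = sum(cm[i]) - tp                            # rest of the row: missed
        fp = sum(cm[r][i] for r in range(len(cm))) - tp # rest of the column: false alarms
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores
```

For an imbalanced dataset, these per-label numbers expose failure modes that a single accuracy figure hides.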
> }
> ],
> "source": [
>     "# Sentiment Prediction with RNN Neural Network and Confusion Matrix\n",
Maybe it would be better to normalize the confusion matrix?
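Row normalization is one way to do this: dividing each row by its total turns raw counts into per-true-label fractions (the diagonal then reads as recall per label), which makes rows comparable even when the classes are imbalanced. A small sketch:

```python
def normalize_rows(cm):
    """Normalize each confusion-matrix row so it sums to 1.

    cm[i][j] counts examples with true label i predicted as j; after
    normalization, each entry is the fraction of true-label-i examples
    that were predicted as j.
    """
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized
```

Plotting the normalized matrix (e.g. with the same heatmap code already in the notebook) makes the per-label error rates directly readable from the cells.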
I believe you're training the sentiment model on "eng_Latn". When you then try sentiments on other languages, perhaps just mention in the title that this is technically "zero-shot sentiment prediction" for languages other than English, e.g. "Step 14: Zero-shot Sentiment Prediction for Multilingual Texts". It's one of the benefits of LASER that such a sentiment model trained only on English should hopefully do well in other languages (even though not explicitly trained on them). You can then also remove your first example in "English".
No description provided.