This project performs sentiment analysis of tweets using Natural Language Processing (NLP) techniques implemented in PyTorch. The model classifies each tweet as positive or negative. Tweets are preprocessed with tokenization and stemming from the NLTK library.
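The preprocessing step might look roughly like the sketch below. The README does not say which NLTK tokenizer or stemmer is used, so `TweetTokenizer`, `PorterStemmer`, and the helper name `preprocess` are assumptions chosen for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# Assumption: a tweet-aware tokenizer that lowercases text, drops @handles,
# and shortens elongated words ("soooooo" -> "sooo").
_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
# Assumption: the Porter stemmer; the project may use a different one.
_stemmer = PorterStemmer()


def preprocess(tweet: str) -> list[str]:
    """Tokenize a raw tweet and reduce each token to its stem."""
    tokens = _tokenizer.tokenize(tweet)
    return [_stemmer.stem(token) for token in tokens]


tokens = preprocess("@user I am loving this!")
print(tokens)
```

Neither class needs extra NLTK data downloads, which keeps the sketch easy to try out.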
The current model architecture consists of the following layers:
- Embedding Bag Layer: Pools the token embeddings of each tweet into a single 120-dimensional vector.
- ReLU Activation Function: Non-linear activation function.
- Fully Connected (FC) Layer 1: Converts embedding output into a 30-dimensional vector.
- ReLU Activation Function: Non-linear activation function.
- Fully Connected (FC) Layer 2: Maps the 30-dimensional vector to two outputs, one per class (positive and negative sentiment).
- Sigmoid Activation Function: Outputs probability scores for each sentiment class.
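The layer stack above can be sketched as a small PyTorch module. This is a minimal illustration, not the notebook's actual code: the class name `TweetClassifier`, the vocabulary size of 1000, and the default `mean` pooling mode of `nn.EmbeddingBag` are assumptions.

```python
import torch
from torch import nn


class TweetClassifier(nn.Module):
    """EmbeddingBag -> ReLU -> FC(120, 30) -> ReLU -> FC(30, 2) -> Sigmoid."""

    def __init__(self, vocab_size: int, embed_dim: int = 120,
                 hidden_dim: int = 30, num_classes: int = 2):
        super().__init__()
        # Assumption: default pooling mode ("mean") over each tweet's tokens.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        # token_ids: all tweets concatenated into one 1-D tensor of token indices.
        # offsets: start position of each tweet inside token_ids.
        x = torch.relu(self.embedding(token_ids, offsets))
        x = torch.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))


model = TweetClassifier(vocab_size=1000)  # hypothetical vocabulary size
token_ids = torch.tensor([1, 5, 9, 2, 7])  # two tweets: [1, 5, 9] and [2, 7]
offsets = torch.tensor([0, 3])
scores = model(token_ids, offsets)
print(scores.shape)  # torch.Size([2, 2])
```

Because the final activation is a sigmoid rather than a softmax, the two scores for a tweet are independent values in [0, 1] and do not necessarily sum to 1.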
The current architecture reaches an initial accuracy of 76% on the Sentiment140 dataset [1].
We are working to improve this accuracy by integrating more advanced architectures such as LSTM and attention mechanisms.
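As a rough idea of what an LSTM-based variant could look like, the sketch below swaps the embedding bag for a token-level embedding followed by an LSTM. It is purely illustrative: the class name `LSTMTweetClassifier`, the layer sizes, the use of padding index 0, and the absence of attention are all assumptions, not the project's planned design.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence


class LSTMTweetClassifier(nn.Module):
    """Embedding -> LSTM -> FC -> Sigmoid, using the final hidden state."""

    def __init__(self, vocab_size: int, embed_dim: int = 120,
                 hidden_dim: int = 30, num_classes: int = 2):
        super().__init__()
        # Assumption: index 0 is reserved for padding.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)
        # Packing lets the LSTM skip padded positions, so the final hidden
        # state corresponds to the last real token of each tweet.
        packed = pack_padded_sequence(embedded, lengths, batch_first=True,
                                      enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        return torch.sigmoid(self.fc(h_n[-1]))


lstm_model = LSTMTweetClassifier(vocab_size=1000)  # hypothetical vocabulary size
padded = torch.tensor([[4, 8, 15, 0], [16, 23, 42, 7]])  # first tweet is padded
lengths = torch.tensor([3, 4])
lstm_scores = lstm_model(padded, lengths)
print(lstm_scores.shape)  # torch.Size([2, 2])
```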
- Python 3.x
- PyTorch
- torchtext
- NLTK
- NumPy
- Pandas
- Seaborn
- scikit-learn
- Clone the repository.
- Install the required dependencies using `pip install -r requirements.txt`.
- Preprocess your dataset using NLTK for tokenization and stemming.
- Train the model using the Train section of the notebook.
- Evaluate the model using the Evaluate section of the notebook.
- Experiment with different architectures and hyperparameters to improve accuracy.
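For orientation, the train and evaluate steps above could be sketched as follows. The notebook's actual loss, optimizer, and batching are not described here, so binary cross-entropy on one-hot targets, Adam, and the helper names `train_step` and `accuracy` are assumptions; `StandInModel` is a hypothetical placeholder for the real classifier.

```python
import torch
from torch import nn
import torch.nn.functional as F


class StandInModel(nn.Module):
    """Hypothetical placeholder with the same call signature as the classifier."""

    def __init__(self, vocab_size: int = 50, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, 8)
        self.out = nn.Linear(8, num_classes)

    def forward(self, token_ids, offsets):
        return torch.sigmoid(self.out(self.embedding(token_ids, offsets)))


def train_step(model, optimizer, token_ids, offsets, labels):
    """Run one optimization step and return the batch loss."""
    model.train()
    optimizer.zero_grad()
    scores = model(token_ids, offsets)
    # Assumption: sigmoid outputs are trained against one-hot targets with BCE.
    targets = F.one_hot(labels, num_classes=scores.shape[1]).float()
    loss = F.binary_cross_entropy(scores, targets)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def accuracy(model, token_ids, offsets, labels):
    """Fraction of tweets whose highest-scoring class matches the label."""
    model.eval()
    preds = model(token_ids, offsets).argmax(dim=1)
    return (preds == labels).float().mean().item()


torch.manual_seed(0)
model = StandInModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
token_ids = torch.tensor([1, 2, 3, 4, 5, 6])  # three toy tweets of two tokens each
offsets = torch.tensor([0, 2, 4])
labels = torch.tensor([0, 1, 0])  # 0 = negative, 1 = positive (assumed encoding)

loss = train_step(model, optimizer, token_ids, offsets, labels)
acc = accuracy(model, token_ids, offsets, labels)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

When experimenting with architectures, keeping the `(token_ids, offsets)` call signature lets the same two helpers be reused unchanged.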
[1] Sentiment140 dataset with 1.6 million tweets (https://www.kaggle.com/datasets/kazanova/sentiment140)