Book Review Sentiment Analysis Project
Developed a Python-based machine learning model to predict the sentiment of book reviews, classifying each review as positive or negative. The best-performing model, logistic regression, achieved an ROC AUC of 0.83.
The project involved the following steps:
Data Preparation: Verified that the dataset contained no null values. Balanced the positive and negative classes by undersampling negative instances. Split the data into 80% training and 20% testing sets.
Model Selection: Explored multiple machine learning algorithms, including K-Nearest Neighbors (KNN), Decision Trees, and Logistic Regression, to determine the best model for the sentiment analysis task.
Text Transformation: Transformed the raw text of book reviews into word embeddings, which served as feature representations for the machine learning models.
Model Evaluation: Assessed model performance using ROC AUC (area under the receiver operating characteristic curve), a threshold-independent metric for binary classifiers.
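
The data preparation and evaluation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (undersample, split_train_test, roc_auc), the toy one-dimensional features, and the 30/70 class mix are all hypothetical. The undersampling helper trims whichever class is larger, which matches the project's approach when negative reviews are the majority.

```python
import random


def undersample(samples, seed=0):
    """Balance a binary dataset by randomly dropping majority-class samples.

    `samples` is a list of (features, label) pairs, label 1 = positive, 0 = negative.
    """
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)  # mix the classes before splitting
    return balanced


def split_train_test(samples, test_fraction=0.2):
    """Split already-shuffled samples into (train, test) portions."""
    n_test = int(round(len(samples) * test_fraction))
    return samples[n_test:], samples[:n_test]


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 30 positive and 70 negative samples, one feature each.
rng = random.Random(42)
data = [([rng.gauss(1.0, 1.0)], 1) for _ in range(30)]
data += [([rng.gauss(-1.0, 1.0)], 0) for _ in range(70)]

balanced = undersample(data)
train, test = split_train_test(balanced)
print(len(balanced), len(train), len(test))  # 60 48 12

# Hand-checkable example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise definition of ROC AUC is quadratic in the number of samples, so it is only suitable for small illustrations; a library implementation would be the practical choice on a real review dataset.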