Skip to content

[Classifier] Classifying Reddit comments into 20 different subreddits by applying different classifiers.

Notifications You must be signed in to change notification settings

hli988/Reddit-Comment-Classification

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

90 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Reddit Text Classification

In this project, we considered the problem of classifying the comments into 20 different subreddits by applying classifiers, including Logistic Regression(LR), Decision Trees, Support Vector Machines, Bernoulli Naive Bayes’, and Multinomial Naive Bayes’. We also experimented with Recurrent Neural Network with the Keras library run on the TensorFlow backend. We investigated how changing the parameters and preprocessing texts can affect the performances of the classifiers. In the end, with the help of Gridsearch for hyperparameter tuning, and model validation pipeline, we found that among all of the classifiers, Support Vector Machines performs the best, while the Decision trees method performs the worst in our environment.

  • Training and testing data are stored in the folder: data.
  • Several models are stored in the folder: models
    • Bernouli Naive Bayes
    • Recurrent Neural Network
    • Support Vector Machine
    • Decision Tree
    • Logistic Regression

About

[Classifier] Classifying Reddit comments into 20 different subreddits by applying different classifiers.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%