Dataset

Depression: Twitter Dataset + Feature Extraction
20000 Labelled English Tweets of Depressed and Non-Depressed Users
Link: https://www.kaggle.com/datasets/infamouscoder/mental-health-social-media

Description

The data is uncleaned and was collected using the Twitter API. The tweets have been filtered to keep only English-language content. The dataset targets mental health classification of the user at the tweet level.
Data structure:

Post Date: The time at which the tweet was published.
Text: The content of the tweet.
Followers: The number of followers of the account.
Friends: The number of friends (followed by and following) of the account.
Favorites: The number of likes.
Statuses: The number of activities of the account owner.
Retweet: The number of retweets.
Label: The mental health status, i.e. whether the user is depressed or not.
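
As a rough illustration of the structure above, the following sketch parses rows into typed records using only the Python standard library. The header names (`post_date`, `text`, ...), the 0/1 label encoding, the `Tweet` class, and the `load_tweets` helper are all assumptions made for this example, and the two rows are made-up placeholders; the actual column names in the Kaggle file may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Tweet:
    post_date: str
    text: str
    followers: int
    friends: int
    favorites: int
    statuses: int
    retweets: int
    label: int  # assumed encoding: 1 = depressed, 0 = not depressed


# Placeholder rows for illustration only; header names are assumptions.
SAMPLE_CSV = """post_date,text,followers,friends,favorites,statuses,retweets,label
2015-08-02 19:10:00,"example tweet one",84,211,251,837,0,0
2016-01-15 07:45:12,"example tweet, with a comma",120,98,1043,5210,2,1
"""

COUNT_FIELDS = ("followers", "friends", "favorites", "statuses", "retweets", "label")


def load_tweets(handle):
    """Parse CSV rows into Tweet records, converting count columns to int."""
    tweets = []
    for row in csv.DictReader(handle):
        counts = {name: int(row[name]) for name in COUNT_FIELDS}
        tweets.append(Tweet(post_date=row["post_date"], text=row["text"], **counts))
    return tweets


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
print(len(tweets), tweets[1].text)
```

To read a real file instead of the inline sample, pass an open file handle, for example `load_tweets(open(path, newline="", encoding="utf-8"))`, after adjusting the header names to match the downloaded CSV.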

Goal

  1. EDA: label distribution, word frequency, text length...
  2. Statistical Modelling: prediction, clustering...
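
The three EDA quantities named in the first goal can be sketched with the standard library alone. This is a minimal illustration, not the analysis used in the report: the function names, the simple regex tokenizer, the 0/1 label encoding, and the placeholder rows are all assumptions.

```python
import re
from collections import Counter
from statistics import mean

# Placeholder (text, label) pairs; label 1 = depressed, 0 = not (assumed encoding).
SAMPLE = [
    ("example tweet about the weather", 0),
    ("another example tweet", 0),
    ("a third example", 1),
]


def label_distribution(rows):
    """Count how many tweets carry each label."""
    return Counter(label for _, label in rows)


def word_frequency(rows):
    """Count lowercase word tokens across all tweets."""
    counts = Counter()
    for text, _ in rows:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def mean_text_length(rows):
    """Mean tweet length in characters, grouped by label."""
    lengths = {}
    for text, label in rows:
        lengths.setdefault(label, []).append(len(text))
    return {label: mean(values) for label, values in lengths.items()}


print(label_distribution(SAMPLE))
print(word_frequency(SAMPLE).most_common(3))
print(mean_text_length(SAMPLE))
```

On the uncleaned tweets, a real word-frequency pass would likely also need to handle URLs, mentions, and stop words, which this tokenizer ignores.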

Manual

Build docker image and run the container

docker build . -t project
docker run -v "$(pwd)":/home/rstudio -e PASSWORD=yfd -p 8787:8787 -t project

Generate files, results and the report

make clean
make .create-dirs
make data/processed_tweets.csv
make figure/density.png
make figure/negative.html figure/positive.html figure/negative.png figure/positive.png
make figure/follower_month.png figure/follower_year.png figure/freq_month.png figure/freq_year.png
make model/model_lstm.pt figure/loss.png
make result/cm_lstm.png
make result/cm_bnb.png
make report.html
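
The make commands above can also be run in order from a small script that stops at the first failure. The sketch below is a hypothetical convenience wrapper, not part of the repository: `TARGET_GROUPS`, `build_commands`, and `run_pipeline` are names introduced here, and it assumes `make` is on the PATH and the script is started from the project root inside the container. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Targets copied from the make commands above, in the same order.
TARGET_GROUPS = [
    ["clean"],
    [".create-dirs"],
    ["data/processed_tweets.csv"],
    ["figure/density.png"],
    ["figure/negative.html", "figure/positive.html",
     "figure/negative.png", "figure/positive.png"],
    ["figure/follower_month.png", "figure/follower_year.png",
     "figure/freq_month.png", "figure/freq_year.png"],
    ["model/model_lstm.pt", "figure/loss.png"],
    ["result/cm_lstm.png"],
    ["result/cm_bnb.png"],
    ["report.html"],
]


def build_commands():
    """Return one argv list per make invocation."""
    return [["make", *targets] for targets in TARGET_GROUPS]


def run_pipeline(dry_run=True):
    """Print each command (dry run) or execute it, stopping on the first error."""
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if make exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


run_pipeline()  # dry run: prints the commands without invoking make
```

Calling `run_pipeline(dry_run=False)` executes the targets for real; because `make clean` comes first, this rebuilds everything from scratch.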

Final Report

The final report is available in report.html.