Depression: Twitter Dataset + Feature Extraction
20000 Labelled English Tweets of Depressed and Non-Depressed Users
Link: https://www.kaggle.com/datasets/infamouscoder/mental-health-social-media
The data is in uncleaned format and is collected using Twitter API. The Tweets has been filtered to keep only the English context. It targets mental health classification of the user at Tweet-level.
Data structure:
Post Date | Time of the tweet being published. |
---|---|
Text | The content of this tweet. |
Followers | The number of followers of this account. |
Friends | The number of friends (followed by and following) of this account. |
Favorites | The number of "like"s. |
Statuses | The number of activities of the account owner. |
Retweet | The number of retweets. |
Label | The mental status, whether depression or not. |
- EDA: label distribution, word frequency, text length...
- Statistical Modelling: prediction, clustering...
docker build . -t project
docker run -v $(pwd):/home/rstudio -e PASSWORD=yfd -p 8787:8787 -t project
make clean
make .create-dirs
make data/processed_tweets.csv
make figure/density.png
make figure/negative.html figure/positive.html figure/negative.png figure/positive.png
make figure/follower_month.png figure/follower_year.png figure/freq_month.png figure/freq_year.png
make model/model_lstm.pt figure/loss.png
make result/cm_lstm.png
make result/cm_bnb.png
make report.html
Please check the final report report.html
.