Reference Paper : https://arxiv.org/pdf/1910.12574.pdf
This project's aim is to train a general social-media hate-speech classifier and compare multiple models for hate-speech detection. We train these models on raw social media text to classify it into two classes: hate-speech and not-hate-speech.
We use multiple flavours of BERT, such as BERTweet, DistilBERT, BERT Base (Uncased) and RoBERTa, and adapt them to our problem of hate-speech classification.
As a part of our baseline experiments, we use FastText and compare it with other models to see which one performs the best for this particular domain of NLP.
Finally, we use the above-mentioned models to classify text from an unseen social media dataset, i.e. a dataset different from the one the models saw during training, to check how well these models generalize over different variations of social-media text.
The interaction among users on different social media platforms generates a vast amount of data. Users of these platforms often indulge in detrimental, offensive and hateful behavior toward numerous segments of society. While hate speech is not a recent phenomenon, the recent surge in online forums allows perpetrators to more directly target their victims. In addition, hate speech may polarize public opinion and hurt political discourse, with detrimental consequences for democracies. Therefore, the primary motivation for this project is to build an automated mechanism to detect and filter such speech and create a safer, more user-friendly environment for social media users. In an attempt to do this, we use multiple pre-trained models and train them to classify text from any social media platform.
- Detecting hate speech is a complicated task from a semantic point of view. Moreover, for middle- and low-resource domains, research in hate-speech detection is scarce due to the lack of labeled data, which has resulted in the emergence of bias in technology.
- Further, models trained on text from one social media platform, such as Twitter, tend not to work well on text from other platforms such as Facebook and YouTube.
- To address these issues, we fine-tune pre-trained BERT models on our downstream task (hate-speech detection) for online social media data.
We use two datasets for our tasks; both are readily available:
- The first one is available on GitHub. It's in raw form, so it needs pre-processing.
- The second one is publicly available on HuggingFace and can be acquired using the `datasets` library (a loading sketch follows the note below). It's slightly processed but still needs more pre-processing.
Have a peek at the data description here.
An Exploratory Data Analysis of the datasets can be found here
Note: To replicate the results in data_description.ipynb:
- You need to download this dataset and place the file `labeled_data.csv` in the `data/github` folder.
- You need to install the `datasets` package, as this is the source of one of our datasets.
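As a rough illustration of the second acquisition path, the sketch below loads the Berkeley dataset through the `datasets` library. The dataset id comes from the HuggingFace Hub, while the 0.5 threshold on `hate_speech_score` used to derive a binary label is only an illustrative assumption, not necessarily the exact rule used in our notebooks.

```python
# Sketch: acquire the second dataset via the `datasets` library.
# The 0.5 threshold on `hate_speech_score` is only an illustrative choice
# for turning the continuous score into a binary label.
from datasets import load_dataset

ds = load_dataset("ucberkeley-dlab/measuring-hate-speech", split="train")
df = ds.to_pandas()[["text", "hate_speech_score"]]
df["label"] = (df["hate_speech_score"] > 0.5).astype(int)  # 1 = hate speech, 0 = not
print(df["label"].value_counts())
```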
From the EDA results shown above, we can clearly see that text tagged as hate speech uses many abhorrent terms that text without hate speech does not contain. This gives an indication of the kind of data the models are trained on. It also shows that the BERT models' general vocabulary is not enough to detect hate speech in the social media domain, so we need to fine-tune those models with respect to domain-specific data.
1. Computing infrastructure
Computing infrastructure includes personal computers and Google Colab.
2. DL-NLP methods
We use transfer learning by fine-tuning pre-trained models like BERT, RoBERTa and BERTweet on the dataset for hate-speech classification. To compare how well these models perform, FastText was set as the baseline.
2.1 FastText
FastText was chosen as the baseline because it is a linear model that is easy and quick to train and gives good results almost instantly. Since it is only a linear model, the more advanced models that are going to be tried should perform better thanks to their more sophisticated understanding of language. It will be difficult for FastText to perform well since comments share a lot of vocabulary regardless of their category.
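A minimal sketch of how such a baseline can be trained with the `fasttext` library; the file paths are placeholders, and fastText expects one `__label__<class> text` line per example:

```python
# Minimal fastText baseline sketch; the train/test file paths are assumptions.
# Each line of the input files must look like: "__label__yes some comment text"
import fasttext

model = fasttext.train_supervised(
    input="data/train.txt",   # assumed path to a file in fastText label format
    epoch=50,
    lr=0.01,
    wordNgrams=2,
)
n_examples, precision_at_1, recall_at_1 = model.test("data/test.txt")  # assumed path
print(n_examples, precision_at_1, recall_at_1)
print(model.predict("have a nice day"))   # -> (labels, probabilities)
```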
2.2 BERTweet and BERTweet large
In the first variant, we decided to use transfer learning by training the entire pre-trained BERTweet model on the dataset. We started with the smaller `bertweet-base` model (135M parameters), which was pre-trained on 850M tweets, to see the baseline we can get with our data. Then we proceeded with the larger `bertweet-large` model, which was pre-trained on 873M English Tweets and has 355M parameters.
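A condensed sketch of this transfer-learning setup with the HuggingFace `transformers` API (the training loop is omitted, and the `max_length` of 128 is an illustrative assumption, not the value used in our runs):

```python
# Sketch: load pre-trained BERTweet with a two-class classification head.
# normalization=True applies BERTweet's own tweet normalizer to raw text.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base",
                                          normalization=True, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base",
                                                           num_labels=2)

batch = tokenizer(["@user this is a sample tweet :)"], padding=True,
                  truncation=True, max_length=128, return_tensors="pt")
outputs = model(**batch)   # outputs.logits has shape (batch_size, 2)
```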
2.3 DistilBERT
We included DistilBERT as one of the models in our bucket because it is a smaller, faster and computationally cheaper version of vanilla BERT, while still retaining over 95% of its performance. Since it takes less time to run, we could train it for a higher number of epochs (for now, we trained our model for 10 epochs, but we will increase the number once we tune the parameters). Furthermore, DistilBERT might potentially perform even better than the other variants of BERT (it already gave a weighted F1-score of 0.76, which is decent), so it is definitely worth comparing it with the other models. We referred to this tutorial for implementing this model.
2.4 BERT Base
In this method, we use `bert-base-uncased` as the pre-trained BERT model and then fine-tune it with a hate-speech social media dataset. Embeddings extracted from BERT-base have 768 hidden dimensions. Since BERT is pre-trained on general corpora while our hate-speech detection task deals with social media content, a crucial step is to analyze the contextual information extracted from BERT's pre-trained layers and then fine-tune the model using annotated datasets. By fine-tuning we update the weights of an already trained model using a labeled dataset that is new to it. As input, BERT takes a sequence of at most 512 tokens and produces a representation of the sequence as a 768-dimensional vector.
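To make the 768-dimensional representation concrete, here is a small inference-only sketch (the fine-tuning loop is not shown):

```python
# Sketch: the 768-dimensional sequence representation produced by BERT-base
# for an input of at most 512 tokens (inference only; no fine-tuning shown).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("an example social media comment", truncation=True,
                max_length=512, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)
cls_vector = out.last_hidden_state[:, 0, :]   # shape (1, 768): the [CLS] representation
```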
2.5 RoBERTa
RoBERTa differs from base BERT in how it is trained and has been shown to outperform base BERT on benchmarks. It is also the model used by the authors of the UC Berkeley dataset, so we'd like to see how it performs on our data.
Four different implementations of RoBERTa will be tried.
3. Framework
We use PyTorch
as our primary framework. Our models include pre-trained FastText
and different variations of pretrained BERT
from the HuggingFace
library.
4. Grid search
For all models, grid search will be conducted in a random fashion to save time.
If there are many sub-types of a model to try (e.g. large, base), the hyperparameters will first be kept fixed across sub-types, and the apparent winner will be kept for a more extensive grid search.
4.1 FastText
Since FastText is fast to train, a very extensive range of parameters was tried (a random-search sketch follows the list):
- `epoch`: [10 - 200]
- `lr` (learning rate): [0.00001 - 2]
- `wordNgrams`: [1 - 5]
- `dim` (embedding dimensions): [25 - 300]
- `ws` (context window): [1 - 20]
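A sketch of how a random search over these ranges can be run; the file paths, the discretization of `dim` and the number of trials are assumptions:

```python
# Sketch of a random search over the fastText ranges listed above;
# file paths and the number of trials are placeholders.
import random
import fasttext

best_f1, best_params = 0.0, None
for _ in range(20):                                    # assumed number of trials
    params = dict(
        epoch=random.randint(10, 200),
        lr=random.uniform(1e-5, 2.0),
        wordNgrams=random.randint(1, 5),
        dim=random.choice([25, 50, 100, 200, 300]),
        ws=random.randint(1, 20),
    )
    model = fasttext.train_supervised(input="data/train.txt", **params)
    _, precision, recall = model.test("data/dev.txt")  # assumed dev file
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    if f1 > best_f1:
        best_f1, best_params = f1, params
print(best_f1, best_params)
```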
4.2 BERT models
BERT models take longer to train, so the range of parameters is smaller (a single trial is sketched below):
- `LEARNING_RATE`: [3e-5 - 7e-5]
- `BATCH_SIZE`: [8 - 64]
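A hedged sketch of a single trial from this range using the HuggingFace `Trainer`; `train_ds`, `dev_ds` and the output directory are placeholders standing in for our tokenized datasets:

```python
# Sketch: one trial of the BERT hyperparameter search.
# `train_ds` and `dev_ds` are assumed to be already tokenized HuggingFace datasets.
import random
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

learning_rate = random.uniform(3e-5, 7e-5)
batch_size = random.choice([8, 16, 32, 64])

args = TrainingArguments(
    output_dir="checkpoints",                # assumed output directory
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    num_train_epochs=3,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=dev_ds)
trainer.train()
```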
As research in hate-speech detection on social media platforms grows (e.g., in SemEval-2019, one of the major tasks was classifying Twitter data as either hateful or not hateful), many researchers have increasingly shifted focus toward applying deep learning models to this task. As a basis for our project, we referred to the following papers:
- Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. This paper describes in detail the dataset we decided to use and shows the methods the authors used to train on that data. We decided to make this a classification problem, whereas the authors put hate speech on an intensity scale and made it a regression problem. They also added intermediate outputs to their architecture that they used to predict the final results.
- A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media. This paper presents a transfer-learning approach that uses the pre-trained language model BERT, learned on a general English corpus (no specific domain), to enhance hate-speech detection on publicly available online social media datasets. The authors also introduce new fine-tuning strategies to examine the effect of different embedding layers of BERT on hate-speech detection. Different layers of a neural network can capture different levels of syntactic and semantic information: the lower layers of the BERT model may contain more general information, whereas the higher layers contain task-specific information. In this paper, they try multiple architectures by adding non-linear layers, Bi-LSTM layers and CNN layers, and compare these results to a baseline score.
- Hate speech detection on Twitter using transfer learning. This paper shows that multi-lingual models such as `XLM-RoBERTa` and `DistilBERT` are largely able to learn the contextual information in tweets and accurately classify hate and offensive speech.
- BERTweet: A pre-trained language model for English Tweets. BERTweet is the first public large-scale pre-trained language model for English Tweets. This paper shows that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base, producing better results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition and text classification. The model uses the BERT-base configuration, trained following the RoBERTa pre-training procedure. The authors used an 80GB pre-training dataset of uncompressed texts, containing 850M Tweets (16B word tokens), where each Tweet consists of at least 10 and at most 64 word tokens.
In this section, we discuss the results obtained from the different models. To acquire baseline benchmark results on the dataset, we used the following models:
This is the baseline we decided to use to compare the other models against. We chose the FastText classifier because it is a simple, fast-to-train linear model. The data used to train the model (not hate speech = 0):
Train data size: 31652
Test data size: 7913
Training with 50 epochs and a learning rate of 0.01 gave:
precision recall f1-score support
__label__no 0.78 0.88 0.83 5270
__label__yes 0.68 0.50 0.58 2643
accuracy 0.75 7913
macro avg 0.73 0.69 0.70 7913
weighted avg 0.75 0.75 0.74 7913
We use `ucberkeley-dlab_measuring-hate-speech` as our dataset. The data was normalized with BERTweet's internal normalizer (translating emoticons into text strings and converting user mentions and web/URL links into the special tokens @USER and HTTPURL). We kept only two categories: hate speech (1), 46021 tweets, and not hate speech (0), 80624 tweets. Then we split the data into train, dev and test sets with the following sizes (a split sketch follows the list):
Train data size: 31652
Test data size: 3957
Dev data size: 3956
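A sketch of how such an 80/10/10 split (inferred from the sizes above) can be produced; `df`, the stratification and the random seed are illustrative assumptions rather than our exact code:

```python
# Sketch of the train/dev/test split described above; `df` is assumed to hold
# one row per unique text with a binary "label" column.
from sklearn.model_selection import train_test_split

train_df, rest_df = train_test_split(df, test_size=0.2,
                                     stratify=df["label"], random_state=42)
test_df, dev_df = train_test_split(rest_df, test_size=0.5,
                                   stratify=rest_df["label"], random_state=42)
print(len(train_df), len(dev_df), len(test_df))
```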
The `bertweet-base` model was run for 5 epochs. With this model we managed to get:
Precision Score: 65.778
Recall Score: 62.195
F1 Score: 63.936
The `bertweet-large` model was run for 5 epochs, but the best scores were achieved after epoch 1:
precision recall f1-score support
0 0.82 0.91 0.86 2621
1 0.77 0.62 0.69 1335
accuracy 0.81 3956
macro avg 0.80 0.76 0.77 3956
weighted avg 0.81 0.81 0.80 3956
The DistilBERT model was trained for 10 epochs. These are the initial results we obtained:
precision recall f1-score support
0 0.81 0.85 0.83 2665
1 0.66 0.60 0.62 1292
accuracy 0.77 3957
macro avg 0.73 0.72 0.73 3957
weighted avg 0.76 0.77 0.76 3957
The BERT Base model was trained for 3 epochs. These are the initial results we obtained:
precision recall f1-score support
0 0.86 0.78 0.82 2665
1 0.62 0.74 0.68 1292
accuracy 0.77 3957
macro avg 0.74 0.76 0.75 3957
weighted avg 0.78 0.77 0.77 3957
Four different implementations of RoBERTa were tried with the following parameters:
- `BATCH_SIZE`: 8
- `EPOCHS`: 1
- `LEARNING_RATE`: 5e-5
Training time: 12:32 minutes
Avg validation loss: 0.45553952923627816
Precision Score: 71.80616740088107
Recall Score: 63.57254290171607
F1 Score: 67.43897393462971
Training time : 03:59 minutes
Avg validation loss: 0.42029443046739023
Precision Score: 74.81060606060606
Recall Score: 61.62246489859594
F1 Score: 67.57912745936697
Training time: 36:23 minutes
Avg validation loss: 0.641249378763064
Precision Score: 0.0
Recall Score: 0.0
F1 Score: 0.0
Training time: 03:59 minutes
Avg validation loss: 0.42139427556145576
Precision Score: 72.97297297297297
Recall Score: 63.18252730109204
F1 Score: 67.7257525083612
DistilRoBERTa gives the best results; here is the classification report:
precision recall f1-score support
0 0.83 0.90 0.86 2674
1 0.75 0.62 0.68 1282
accuracy 0.81 3956
macro avg 0.79 0.76 0.77 3956
weighted avg 0.80 0.81 0.80 3956
Optimized with hyperparameters:
- `BEST EPOCH`: 1 (out of 3)
- `BATCH_SIZE`: 64
- `LEARNING_RATE`: 5e-5
For all the models below, we used the same dataset with the same data split:
Train Data Size : 31652
Test Data Size : 3957
Validation Data Size : 3956
| Model Name | Recall | Precision | F1 Score |
|---|---|---|---|
| FastText (Baseline) | 50 | 68 | 58 |
| BERTweet | 62.195 | 65.778 | 63.936 |
| BERTweet_large | 62 | 77 | 69 |
| DistilBERT | 66 | 60 | 62 |
| BERT Base | 62 | 74 | 68 |
| DistilRoBERTa | 62 | 75 | 68 |
Training the pre-trained `vinai/bertweet-base` on a training set of 31,652 tweets was compute-intensive. As a result, we trained for only 5 epochs to get the baseline results.
Training this model was mostly straightforward, though there were a few points where we got stuck. We initially didn't consider the fact that our dataset contained around 40k unique texts, most of which were annotated by several annotators and were therefore present in the dataset multiple times. Thus, our dataset consisted of many non-unique texts on which we trained all the models, only to get inflated F1-scores and realise the issue later. We fixed this blunder, retrained all the models and updated our results. Apart from that, adapting the existing code from different sources and tutorials to our task proved somewhat challenging, since this was the first time we were training models on pre-trained embeddings. However, we successfully finished our checkpoints before the submission of this milestone.
In this method, we use `bert-base-uncased` as the pre-trained BERT model and then add a CNN layer to the architecture as part of the fine-tuning technique. The outputs of all transformer encoders are concatenated to produce a matrix. A convolution operation is performed, and for each transformer encoder a maximum value is generated by applying max pooling to the convolution output. By concatenating these values, a vector is generated that is fed into a fully connected network. We then apply softmax to obtain the final classification output.
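A rough PyTorch sketch of this architecture as we understand it; the number of filters, the kernel size and the per-layer pooling are illustrative interpretations, not the exact configuration of our runs:

```python
# Sketch of the BERT + CNN fine-tuning head described above (dimensions are
# illustrative). For training with nn.CrossEntropyLoss one would normally
# return the logits rather than the softmax output.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertCNNClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased",
                                              output_hidden_states=True)
        # one convolution applied over each encoder layer's token outputs
        self.conv = nn.Conv1d(in_channels=768, out_channels=128,
                              kernel_size=3, padding=1)
        self.fc = nn.Linear(128 * 12, num_labels)   # 12 encoder layers

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        layer_feats = []
        for h in out.hidden_states[1:]:                     # skip the embedding layer
            c = torch.relu(self.conv(h.transpose(1, 2)))    # (batch, 128, seq_len)
            layer_feats.append(torch.max(c, dim=2).values)  # max-pool over tokens
        logits = self.fc(torch.cat(layer_feats, dim=1))     # (batch, num_labels)
        return torch.softmax(logits, dim=-1)
```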
We use the F1-score as the main metric to evaluate our models. We compare the F1-scores of the different models on cross-platform unseen data. The one that gives the best score is the best suited for classification of generalized social-media text.
Models are ranked by their ability to predict hate speech (i.e., by the F1-score for the hate-speech class).
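A small sketch of this evaluation, assuming scikit-learn (the labels below are placeholders):

```python
# Sketch of the evaluation: the full classification report plus the F1 of the
# hate-speech class (label 1), which the ranking is based on.
from sklearn.metrics import classification_report, f1_score

y_true = [0, 1, 1, 0, 1]   # placeholder gold labels
y_pred = [0, 1, 0, 0, 1]   # placeholder model predictions

print(classification_report(y_true, y_pred, digits=2))
print("hate-speech F1:", f1_score(y_true, y_pred, pos_label=1))
```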
Our goal materializes from the fact that social media, being a widely used mode of socializing, has become unsafe for people looking for a secure environment to communicate. We come up with an efficient deep learning model to detect hate speech in online social media data by fine-tuning different variations of the pre-trained BERT model. This can become a useful tool to filter out offensive and detrimental content across social media platforms, even ones our models have never seen, and safeguard people from hate speech.