This project was created for a machine learning course. Our task was to detect texts containing fake information (fake news).
Karolina Mączka
Tymoteusz Urban
Fake News Dataset Combined Different Sources
- NaNs and outliers
- Language detection
- Stopwords removal
- Words lemmatizer
- CountVectorizer
- TfidfTransformer
- XGBoost
- Hyperparameter optimization with Random Search CV
- Independent validation
- 0.99 AUC on test set
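
The modelling steps listed above (CountVectorizer, TfidfTransformer, XGBoost, random search, evaluation on held-out data) could be wired together roughly as in the sketch below. This is a minimal illustration and not the project's actual code. The toy corpus, the helper names `make_toy_corpus` and `build_search`, the parameter ranges, and the split sizes are all assumptions made for the example. Language detection, lemmatization, and NaN/outlier handling are left out, and stopword removal is approximated with scikit-learn's built-in English list. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different implementation of gradient boosting.

```python
import random

from scipy.stats import randint, uniform
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Fallback only so the sketch runs without xgboost; not the same model.
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def make_toy_corpus(n_per_class=40, seed=0):
    """Build a tiny synthetic corpus (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    real_words = ["report", "official", "study", "minister", "data", "court"]
    fake_words = ["shocking", "secret", "miracle", "exposed", "hoax", "unbelievable"]
    common = ["the", "today", "people", "said", "news"]
    texts, labels = [], []
    for label, vocab in ((0, real_words), (1, fake_words)):
        for _ in range(n_per_class):
            words = rng.choices(vocab, k=4) + rng.choices(common, k=4)
            rng.shuffle(words)
            texts.append(" ".join(words))
            labels.append(label)
    return texts, labels


def build_search(n_iter=5, cv=3, seed=0):
    """Counts -> TF-IDF -> boosted trees, tuned with randomized search."""
    pipeline = Pipeline([
        ("counts", CountVectorizer(stop_words="english")),
        ("tfidf", TfidfTransformer()),
        ("clf", Booster(random_state=seed)),
    ])
    # Assumed search space; the ranges used in the project are not stated.
    param_distributions = {
        "clf__n_estimators": randint(20, 60),
        "clf__max_depth": randint(2, 6),
        "clf__learning_rate": uniform(0.05, 0.3),
    }
    return RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=n_iter,
        scoring="roc_auc",
        cv=cv,
        random_state=seed,
    )


texts, labels = make_toy_corpus()
# Hold out a test split that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

search = build_search()
search.fit(X_train, y_train)

# AUC is computed from the predicted probability of the positive (fake) class.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("best params:", search.best_params_)
print("test AUC:", round(test_auc, 3))
```

Keeping the vectorizer and the TF-IDF transform inside the `Pipeline` means they are refit on each cross-validation fold, so no vocabulary or document-frequency statistics leak from validation folds into training. The score printed for the toy corpus says nothing about the 0.99 AUC reported for the real dataset.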