Download the dataset from http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz

Extract the archive and place the dataset in the repo's folder.

Run:

    python stem_calculation.py
    python feature_model_creation.py
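
The download and extraction steps above can also be scripted. The following is a minimal sketch, not part of the repo: the helper names (download, extract, fetch_dataset) and the choice of the current directory as the destination are assumptions, and the two scripts may expect a specific folder layout that this sketch does not verify.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the instructions above.
DATASET_URL = "http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz"


def download(url, archive_path):
    """Download url to archive_path unless the file already exists."""
    archive_path = Path(archive_path)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir.

    Returns the sorted top-level names contained in the archive.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        top_level = {
            Path(member.name).parts[0]
            for member in tar.getmembers()
            if Path(member.name).parts
        }
        tar.extractall(dest_dir)
    return sorted(top_level)


def fetch_dataset(dest_dir="."):
    """Hypothetical convenience wrapper: download, then extract.

    Assumption: dest_dir is the repo's folder.
    """
    archive = download(DATASET_URL, Path(dest_dir) / "20news-bydate.tar.gz")
    return extract(archive, dest_dir)
```

Calling fetch_dataset() from the repo's folder returns the names of the extracted top-level directories, which can be checked against what stem_calculation.py and feature_model_creation.py expect before running them.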