The goal of this Lab is to explore some of the main scikit-learn tools on a single practical task: analyzing a collection of text documents (newsgroups posts) on twenty different topics. In this Lab we will see how to:
- Load the file contents and the categories
- Extract feature vectors suitable for machine learning
- Train a classifier
- Build a pipeline
- Tune parameters using grid search
- Evaluate performance on the test set
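
As a rough preview, the steps above can be sketched end to end in a few lines. This is only a minimal sketch under stated assumptions: the tiny in-memory corpus, its two labels, the choice of `TfidfVectorizer` with `MultinomialNB`, and the parameter grid are all illustrative and not prescribed by this Lab. With the real data, the documents would typically be loaded with `sklearn.datasets.fetch_20newsgroups` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Step 1: load the documents and their categories.
# Hypothetical toy corpus standing in for the newsgroups posts.
train_docs = [
    "the rocket launch reached orbit",
    "nasa plans a new orbit mission",
    "the shuttle crew observed the moon",
    "a satellite launch was delayed",
    "the car engine needs new oil",
    "my car brakes and tires are worn",
    "the dealer sold a used engine",
    "changing the oil and the tires",
]
train_labels = ["space"] * 4 + ["autos"] * 4
test_docs = ["a new satellite reached orbit", "the car needs new brakes"]
test_labels = ["space", "autos"]

# Steps 2-4: feature extraction and classifier chained in one pipeline.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),   # text -> TF-IDF feature vectors
    ("clf", MultinomialNB()),       # assumed classifier for this sketch
])

# Step 5: grid search over pipeline parameters (step name + "__" + parameter).
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(train_docs, train_labels)

# Step 6: evaluate the best model found on the held-out test set.
predicted = search.predict(test_docs)
accuracy = accuracy_score(test_labels, predicted)
print("best parameters:", search.best_params_)
print("test accuracy:", accuracy)
```

Wrapping the vectorizer and the classifier in a single `Pipeline` lets the grid search tune both stages together and refit the vectorizer on each cross-validation fold's training portion only.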