This repository contains a Jupyter notebook that implements text classification with the Naive Bayes algorithm. The notebook is written in Python and uses popular libraries such as Pandas, NumPy, and Scikit-Learn.
The dataset used in this project is the 20 Newsgroups dataset, which contains approximately 20,000 newsgroup posts, partitioned into 20 different categories. The goal of this project is to build a machine learning model that can accurately classify these newsgroup posts into their respective categories.
To run the notebook, you will need Python 3 installed on your machine. You will also need to install the following libraries:
- NumPy
- Pandas
- Matplotlib
- Scikit-Learn
- NLTK
You can install these libraries using pip. Note that Scikit-Learn is published on PyPI under the name scikit-learn. For example, to install NLTK, run the following command:
pip install nltk
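Before opening the notebook, it can help to confirm that every library is importable. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` and the `REQUIRED` list are hypothetical, and the list assumes the usual import names for the libraries above.

```python
import importlib.util

# Import names for the libraries listed above (an assumption for this sketch).
# Scikit-Learn is imported as "sklearn" even though pip installs "scikit-learn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "nltk"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with pip as described above.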
Once you have installed the required libraries, you can clone this repository to your local machine using Git. To do this, run the following command:
git clone https://github.com/reeba212/Text-Classification-Naive-Bayes
To run the notebook, navigate to the project directory in your terminal and run the following command:
jupyter notebook
This will open the Jupyter Notebook interface in your web browser. From here, you can open the notebook and run the cells to train and test the model.
After training the model on the 20 Newsgroups dataset, I achieved an accuracy of over 87% on the test set, which suggests that the model classifies the newsgroup posts effectively.
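To give a feel for what a Naive Bayes text classifier computes, here is a small from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the notebook's code, which presumably relies on Scikit-Learn. The function names, the whitespace tokenizer, and the four-sentence toy corpus are all assumptions made for illustration, not the 20 Newsgroups data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word frequencies per class."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_doc_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (class, word) pairs above zero.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, used only to exercise the functions above.
train_docs = [
    "the team won the hockey game",
    "the pitcher threw a fast ball",
    "the rocket reached orbit",
    "nasa launched a new satellite",
]
train_labels = ["sports", "sports", "space", "space"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "the satellite reached orbit")
print(prediction)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters for longer documents such as newsgroup posts.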
This project provides a practical example of text classification using the Naive Bayes algorithm. By studying the notebook, you can gain a deeper understanding of how the algorithm works and how it can be applied to real-world problems. With this knowledge, you can extend the implementation or use it as a starting point for your own projects.