Customer Churn Prediction

Introduction

The repository contains the code for the Customer Churn Prediction project. The project is divided into two parts:

Machine Learning Pipeline
Web Application

The Machine Learning Pipeline is used to train the model and save it in a pickle file. The Web Application is used to deploy the model and make predictions.

Setup

Clone the repo

git clone https://github.com/suryanshgupta9933/Customer-Churn-Prediction.git

Create a virtual environment
```
python -m venv env
```
Activate the virtual environment
```
env\Scripts\activate
```
Install the dependencies
```
pip install -r requirements.txt
```

Machine Learning Pipeline

1. Data Collection and Optimization.

The dataset is provided in the repository as an excel file under data folder.
The data is first optimized by changing the data types of the columns to reduce the memory usage.
The excel file is converted to a parquet file to reduce the file size.

2. Data Preprocessing.

There were no missing values in the dataset.
There were no outliers in the dataset.
The categorical columns were encoded using Label Encoder.
The dataset was split into train and test sets in the ratio 80:20.

Note: There was very low correlation between the features and the target variable. This is the reason why the models are not performing well.

3. Feature Engineering and Scaling.

Three new features were created:
- Avg_Usage_per_Month = Total_Usage_GB / Subscription_Length_Months
- Bill_to_Usage_Ratio = Monthly_Bill / Total_Usage_GB
- Age_to_Usage_Ratio = Age / Total_Usage_GB
The distribution of target variable was checked and it was already balanced.
The features were scaled using Standard Scaler.

4. Model Training.

The following models were trained:
- Logistic Regression
- Random Forest Classifier
- CatBoost Classifier
- LightGBM Classifier
Each model showed a very similar performance with an accuracy of around 50%.
I went ahead with the Logistic Regression model in the pipeline as it was the simplest model and was also performing well.

Note: The dataset is probably synthetic and not the actual representation of the real world data. This is the reason why the models are not performing well.

5. Model Evaluation.

The model was evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC AUC Score

5. Hyperparameter Tuning.

The hyperparameters were tuned using Grid Search CV.
The best parameters were used to train the model again.

6. Model Logging.

MLFlow was used to log the model parameters and metrics.
The model was saved in a pickle file.
The final model was saved in the saved_model folder.

Web Application

Streamlit was used to create the web application. The web application is used to deploy the model and make predictions through an intiutive UI.

Running the Web Application

Activate the virtual environment
```
env\Scripts\activate
```
Run the following command
```
streamlit run app.py
```
The web application will open in the browser.

MLFlow UI

MLFlow UI was used to track the model parameters and metrics. The MLFlow UI can be accessed by running the following command:

mlflow ui

The MLFlow UI will open in the browser.
All the experiments, model files, parameters and metrics can be viewed in the UI.

Screenshots

Pipeline Output
Streamlit Web Application

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
assets		assets
data		data
mlruns/0		mlruns/0
pipeline		pipeline
saved_model		saved_model
src		src
utils		utils
LICENSE		LICENSE
README.md		README.md
app.py		app.py
logs.txt		logs.txt
optimize_dataset.ipynb		optimize_dataset.ipynb
playground.ipynb		playground.ipynb
requirements.txt		requirements.txt
run_pipeline.py		run_pipeline.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Customer Churn Prediction

Introduction

Setup

Machine Learning Pipeline

1. Data Collection and Optimization.

2. Data Preprocessing.

3. Feature Engineering and Scaling.

4. Model Training.

5. Model Evaluation.

5. Hyperparameter Tuning.

6. Model Logging.

Web Application

Running the Web Application

MLFlow UI

Screenshots

About

Releases

Packages

Languages

License

suryanshgupta9933/Customer-Churn-Prediction

Folders and files

Latest commit

History

Repository files navigation

Customer Churn Prediction

Introduction

Setup

Machine Learning Pipeline

1. Data Collection and Optimization.

2. Data Preprocessing.

3. Feature Engineering and Scaling.

4. Model Training.

5. Model Evaluation.

5. Hyperparameter Tuning.

6. Model Logging.

Web Application

Running the Web Application

MLFlow UI

Screenshots

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages