Credit risk is an inherently unbalanced classification problem, because good loans far outnumber risky loans. We therefore needed to employ different techniques to train and evaluate models with unbalanced classes. Jill asked us to use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.
- Use Resampling Models to Predict Credit Risk
- Use the SMOTEENN algorithm to Predict Credit Risk
- Use Ensemble Classifiers to Predict Credit Risk
- A Written Report on the Credit Risk Analysis (README.md)
LoanStats_2019Q1.csv, credit_risk_resampling_starter_code.ipynb, and credit_risk_ensemble_starter_code.ipynb.
Jupyter Notebook
- Resampling Models
- Ensemble Classifiers
- SMOTEENN Algorithm
- Random Forest Classifier
- SMOTE Algorithm
We evaluated three machine learning models that use resampling to determine which is best at predicting credit risk. We used the oversampling algorithms RandomOverSampler and SMOTE, and then the undersampling algorithm ClusterCentroids. With each algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.
- Balanced accuracy score: 65%
- Precision high risk: 0.01%
- Precision low risk: 1%
- Recall high risk: 63%
- Recall low risk: 67%
We used a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine whether the combined approach is better at predicting credit risk than the individual resampling algorithms. Using the SMOTEENN algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.
- Balanced accuracy score: 79%
- Precision high risk: 0.1%
- Precision low risk: 1%
- Recall high risk: 64%
- Recall low risk: 66%

- Balanced accuracy score: 79%
- Precision high risk: 0.4%
- Precision low risk: 1%
- Recall high risk: 67%
- Recall low risk: 91%
Use the SMOTEENN Algorithm to Predict Credit Risk: We used a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine whether the combined approach is better at predicting credit risk than the individual resampling algorithms. Using the SMOTEENN algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.
- Balanced accuracy score: 61%
- Precision high risk: 0.1%
- Precision low risk: 1%
- Recall high risk: 69%
- Recall low risk: 55%
Using the imblearn.ensemble library, we trained and compared two different ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk, and evaluated each model. With both algorithms, we resampled the dataset, viewed the count of the target classes, trained the ensemble classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.
- Balanced accuracy score: 92%
- Precision high risk: 7%
- Precision low risk: 1%
- Recall high risk: 91%
- Recall low risk: 94%
Algorithms used were:
- Resampling Models
- Ensemble Classifiers
- SMOTEENN Algorithm
- Random Forest Classifier
- SMOTE Algorithm
Among all the algorithms evaluated, the ensemble classifiers from the imblearn.ensemble library performed best, with a balanced accuracy score of 92% and a high-risk recall of 91%. All of the algorithms show much higher recall than precision for the high-risk class. Balanced accuracy ranges from 0 to 1, and the model whose score is closest to 1 is the best machine learning model. Hence, the ensemble classifier is recommended.
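For a two-class problem, balanced accuracy is the average of the recall on each class, which is why it ranges from 0 to 1. The short sketch below applies that definition to the recall values reported for the ensemble classifier; the helper name `balanced_accuracy` is made up for this illustration.

```python
def balanced_accuracy(recalls):
    """Return the mean of per-class recall values (each between 0 and 1)."""
    return sum(recalls) / len(recalls)


# Recall values reported above for the ensemble classifier:
# high risk 91%, low risk 94%.
ensemble_balanced_accuracy = balanced_accuracy([0.91, 0.94])
print(f"{ensemble_balanced_accuracy:.3f}")
```

The result is about 0.925, in line with the reported balanced accuracy of 92% given that the recall values themselves are rounded.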