Credit_Risk_Analysis

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. We therefore needed techniques for training and evaluating models with unbalanced classes. Jill asked us to use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.

The purpose:

- Use Resampling Models to Predict Credit Risk
- Use the SMOTEENN Algorithm to Predict Credit Risk
- Use Ensemble Classifiers to Predict Credit Risk
- A Written Report on the Credit Risk Analysis (README.md)

Resources used:

LoanStats_2019Q1.csv, credit_risk_resampling_starter_code.ipynb, and credit_risk_ensemble_starter_code.ipynb.

Applications used:

Jupyter Notebook

Algorithms used:

- Resampling Models
- Ensemble Classifiers
- SMOTEENN Algorithm
- Random Forest Classifier
- SMOTE Algorithm

Use Resampling Models to Predict Credit Risk:

We evaluated three machine learning models by using resampling to determine which is better at predicting credit risk. We used the oversampling algorithms RandomOverSampler and SMOTE, and then the undersampling algorithm ClusterCentroids. With each algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

- Balanced accuracy score: 65%
- Precision high risk: 0.01%
- Precision low risk: 1%
- Recall high risk: 63%
- Recall low risk: 67%

Use the SMOTE Algorithm to Predict Credit Risk:

We used SMOTE (synthetic minority oversampling technique) to oversample the high-risk class. Using the SMOTE algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

- Balanced accuracy score: 79%
- Precision high risk: 0.1%
- Precision low risk: 1%
- Recall high risk: 64%
- Recall low risk: 66%

Random Forest Classifier:

- Balanced accuracy score: 79%
- Precision high risk: 0.4%
- Precision low risk: 1%
- Recall high risk: 67%
- Recall low risk: 91%

Use the SMOTEENN Algorithm to Predict Credit Risk:

We used a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine whether its results are better at predicting credit risk than those of the resampling algorithms. Using the SMOTEENN algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

- Balanced accuracy score: 61%
- Precision high risk: 0.1%
- Precision low risk: 1%
- Recall high risk: 69%
- Recall low risk: 55%

Use Ensemble Classifiers to Predict Credit Risk:

Using the imblearn.ensemble library, we trained and compared two different ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk, and evaluated each model. With each algorithm, we resampled the dataset, viewed the count of the target classes, trained the ensemble classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

- Balanced accuracy score: 92%
- Precision high risk: 7%
- Precision low risk: 1%
- Recall high risk: 91%
- Recall low risk: 94%

Summary on the Credit Risk Analysis

Algorithms used were:

- Resampling Models
- Ensemble Classifiers
- SMOTEENN Algorithm
- Random Forest Classifier
- SMOTE Algorithm

Among all the algorithms evaluated, the ensemble classifiers from the imblearn.ensemble library performed best, with a balanced accuracy score of 92% and a high-risk recall of 91%. All of the algorithms show higher recall than precision for the high-risk class. Balanced accuracy ranges from 0 to 1, and the model whose score is closest to 1 is the best by this measure. Hence, this algorithm is recommended.
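For a two-class problem, balanced accuracy is the unweighted mean of the recall of each class, which is why it is more informative than plain accuracy on unbalanced data. A small worked example, using the recall values reported above for the ensemble classifiers (the helper name `balanced_accuracy` is hypothetical):

```python
def balanced_accuracy(recalls):
    """Return the unweighted mean of the per-class recall values."""
    return sum(recalls) / len(recalls)


# Recall reported for the ensemble classifiers: high risk 91%, low risk 94%.
ensemble_score = balanced_accuracy([0.91, 0.94])
print(f"{ensemble_score:.3f}")
```

The result agrees with the balanced accuracy of about 92% reported for the ensemble classifiers.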
