Develop a model to predict Uber customer churn from customers' behavioral data.
Dataset: Uber customer behavioral data.
1. Data cleaning
1.1 Remove invalid and duplicate records
1.2 Deal with missing data
a) Fill missing categorical entries with a new category, 'Missing value'
b) Impute missing customer ratings with the average rating of the customer's subgroup in the training dataset
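The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's code: the column names (`city`, `phone`, `avg_dist`, `rating_of_customer`), the validity rule (a negative ride distance is invalid), and the choice of `city` as the customer "subgroup" are all hypothetical assumptions. Subgroup means are learned on the training set only and then reused on test data, so no test information leaks into the imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: the real column names in the Uber data may differ.
CATEGORICAL_COLS = ["city", "phone"]
RATING_COL = "rating_of_customer"  # assumed name for the customer rating
GROUP_COL = "city"                 # assumed definition of a customer subgroup


def remove_invalid_and_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """1.1: drop exact duplicate rows and rows that fail a validity check."""
    df = df.drop_duplicates()
    # Assumed validity rule: a negative ride distance is invalid.
    # Rows with a missing distance are kept (NaN < 0 evaluates to False).
    return df[~(df["avg_dist"] < 0)].copy()


def fill_missing_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """1.2 a): encode missing categories as their own level."""
    df = df.copy()
    df[CATEGORICAL_COLS] = df[CATEGORICAL_COLS].fillna("Missing value")
    return df


def fit_rating_imputer(train: pd.DataFrame) -> tuple[pd.Series, float]:
    """1.2 b): learn subgroup mean ratings from the training set only."""
    group_means = train.groupby(GROUP_COL)[RATING_COL].mean()
    return group_means, train[RATING_COL].mean()


def impute_rating(df: pd.DataFrame, group_means: pd.Series,
                  global_mean: float) -> pd.DataFrame:
    """Fill missing ratings with the subgroup mean learned on training data."""
    df = df.copy()
    # Subgroups unseen in training fall back to the overall training mean.
    fill_values = df[GROUP_COL].map(group_means).fillna(global_mean)
    df[RATING_COL] = df[RATING_COL].fillna(fill_values)
    return df


# Toy data: rows 0 and 4 are duplicates, row 2 has an invalid distance.
train = pd.DataFrame({
    "city": ["A", "A", "B", None, "A"],
    "phone": ["iPhone", None, "Android", "iPhone", "iPhone"],
    "avg_dist": [3.0, 5.0, -1.0, 2.0, 3.0],
    "rating_of_customer": [4.0, np.nan, 5.0, 4.5, 4.0],
})
train = fill_missing_categoricals(remove_invalid_and_duplicates(train))
group_means, global_mean = fit_rating_imputer(train)
train = impute_rating(train, group_means, global_mean)

# A test row from an unseen city gets the overall training mean.
test = pd.DataFrame({"city": ["C"], "phone": [None], "avg_dist": [1.0],
                     "rating_of_customer": [np.nan]})
test = impute_rating(fill_missing_categoricals(test), group_means, global_mean)
print(train["rating_of_customer"].tolist())  # [4.0, 4.0, 4.5]
print(test["rating_of_customer"].tolist())   # [4.25]
```

Filling categoricals before imputing ratings matters here: customers with a missing `city` form their own "Missing value" subgroup instead of being silently dropped by `groupby`.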
2. Feature engineering
2.1 Generate the features 'weekend ride', 'weekday ride', and 'average spending per ride'
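One possible way to derive these features is to aggregate ride-level records per customer. The sketch below assumes a hypothetical per-ride table with `customer_id`, `pickup_time`, and `fare` columns, and interprets 'weekend ride' and 'weekday ride' as ride counts; the project may instead have used percentages or different source columns.

```python
import pandas as pd


def build_ride_features(rides: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hypothetical per-ride records into per-customer features.

    Assumed input columns: customer_id, pickup_time (datetime64), fare.
    """
    rides = rides.copy()
    # pandas dayofweek: Monday=0 ... Sunday=6, so Saturday/Sunday are >= 5.
    is_weekend = rides["pickup_time"].dt.dayofweek >= 5
    rides["weekend_ride"] = is_weekend.astype(int)
    rides["weekday_ride"] = (~is_weekend).astype(int)
    # Named aggregation: one row per customer, one column per feature.
    return rides.groupby("customer_id").agg(
        weekend_ride=("weekend_ride", "sum"),
        weekday_ride=("weekday_ride", "sum"),
        avg_spending_per_ride=("fare", "mean"),
    )


# 2014-01-04 is a Saturday, 2014-01-05 a Sunday, 01-06/01-07 Mon/Tue.
rides = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "pickup_time": pd.to_datetime(
        ["2014-01-04", "2014-01-06", "2014-01-07", "2014-01-05"]),
    "fare": [10.0, 20.0, 30.0, 15.0],
})
features = build_ride_features(rides)
print(features)
```

Customer 1 ends up with one weekend ride, two weekday rides, and an average spend of 20.0 per ride.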
3. Model development
3.1 Logistic regression model as the baseline
3.2 Random forest classifier
3.3 Gradient boosting classifier
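Because churn is a binary outcome, the three models can be compared as classifiers. The scikit-learn sketch below is illustrative only: synthetic data from `make_classification` stands in for the cleaned, engineered features, logistic regression is used as the linear baseline, and the hyperparameters are arbitrary rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (binary target, 1 = churned).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    # Scaling helps the linear baseline converge; tree models do not need it.
    "baseline (logistic regression)": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Compare models with 5-fold cross-validation on the training set only.
cv_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_accuracy[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")

# Refit the best-scoring model on the full training set; X_test stays held out.
best_name = max(cv_accuracy, key=cv_accuracy.get)
final_model = models[best_name].fit(X_train, y_train)
```

Keeping `X_test` out of model selection means the final scores reported for the chosen model are not biased by the comparison itself.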
The final gradient boosting model achieves an accuracy of 0.79, a precision of 0.81, and a recall of 0.86.
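For reference, the three metrics follow directly from the confusion matrix. The toy labels below are made up for illustration and assume churn (1) is the positive class; under that assumption, a precision of 0.81 means about 81% of customers flagged as churners actually churned, and a recall of 0.86 means about 86% of actual churners were flagged.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Toy labels, 1 = churned (assumed positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

# For binary labels [0, 1], ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all customers classified correctly
precision = tp / (tp + fp)                  # of predicted churners, share that churned
recall = tp / (tp + fn)                     # of actual churners, share that were caught

print(tp, fp, fn, tn)  # 5 2 1 2
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.7 0.714 0.833
# The hand-computed values match scikit-learn's metric functions.
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))
```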
Figure: ROC curve.
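An ROC curve is built from predicted churn probabilities rather than hard labels: each probability threshold gives one (false positive rate, true positive rate) point. The sketch below shows how such a curve could be computed with scikit-learn; the synthetic data and default `GradientBoostingClassifier` settings are stand-ins, not the project's data or tuned model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data; 1 = churned (assumed positive class).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
# Column 1 of predict_proba is the probability of the positive class (churn).
churn_prob = model.predict_proba(X_test)[:, 1]

# One (fpr, tpr) point per distinct threshold, from strictest to loosest.
fpr, tpr, thresholds = roc_curve(y_test, churn_prob)
auc = roc_auc_score(y_test, churn_prob)
print(f"ROC AUC = {auc:.3f}")
```

The `fpr` and `tpr` arrays can be plotted directly, for example with matplotlib's `plt.plot(fpr, tpr)`, or scikit-learn's `RocCurveDisplay.from_predictions(y_test, churn_prob)` can draw the curve in one call.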
According to the model, the features with the greatest influence on predicted churn are the customer and driver ratings, ride distance, the percentage of surge-priced rides, and the promotion period.
Figure: Feature importance.
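A feature-importance ranking like this can be read from a fitted gradient boosting model's `feature_importances_` attribute. In the sketch below, the feature names are hypothetical labels that mirror the drivers listed above, and both the data and the churn rule are synthetic and arbitrary; they do not reproduce the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000
# Hypothetical feature names; all values are randomly generated.
X = pd.DataFrame({
    "rating_of_customer": rng.uniform(1, 5, n),
    "rating_by_driver": rng.uniform(1, 5, n),
    "avg_dist": rng.exponential(5.0, n),
    "surge_pct": rng.uniform(0, 100, n),
    "promotion_period": rng.integers(0, 2, n),
    "random_noise": rng.normal(size=n),
})
# Arbitrary synthetic churn rule; it does NOT reflect real Uber behavior.
logit = (0.05 * (X["surge_pct"] - 50)
         - 1.5 * (X["rating_by_driver"] - 3)
         + 0.1 * (X["avg_dist"] - 5))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
# Impurity-based importances, normalized by scikit-learn to sum to 1.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance.round(3))
```

Impurity-based importances indicate which features the model relies on, not causal effects, and they can favor features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.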