This dataset was scraped from the Google Play Store and is hosted on Kaggle:
https://www.kaggle.com/lava18/google-play-store-apps
The dataset includes 9360 apps and 14 features after final cleaning:
- App name
- Category
- Rating
- Reviews
- Size
- Installs
- Type
- Price
- Content Rating
- Current Ver
- Android Ver
- Year
- Month
- Day
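The Year, Month, and Day features look like the parts of a single date. As a rough sketch only: assuming the raw data carries a date string such as `January 7, 2018` (the format and the helper name `split_last_updated` are assumptions, not taken from the cleaning code), the split could be done like this:

```python
from datetime import datetime


def split_last_updated(date_text):
    """Split a date string like 'January 7, 2018' into (year, month, day).

    The '%B %d, %Y' format is an assumption about the raw data and relies
    on English month names.
    """
    parsed = datetime.strptime(date_text.strip(), "%B %d, %Y")
    return parsed.year, parsed.month, parsed.day


year, month, day = split_last_updated("January 7, 2018")
```

Storing the three parts as separate numeric columns lets a regression model use them directly, without parsing dates at training time.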
I came up with five business questions, and my goal is to answer them. Please find the questions below:
The prediction target is the Rating column. Because Rating is a continuous variable, this is a regression problem.
To predict the rating, I tried several models: Linear Regression, Random Forest Regressor, and XGBoost Regressor. After fitting these models to my dataset, the XGBoost Regressor produced the best MSE and RMSE.
The best XGBoost Regressor iteration achieved an MSE of 0.1984 and an RMSE of 0.4455.
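To make the comparison metric concrete, here is a minimal sketch of how MSE and RMSE are computed and how the model with the lowest RMSE could be selected. It is not the project's actual evaluation code: the function names, the model labels, and the toy ratings are all made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def best_model(y_true, predictions):
    """Return the name of the model whose predictions have the lowest RMSE."""
    return min(predictions, key=lambda name: rmse(y_true, predictions[name]))


# Toy ratings and predictions, invented purely for illustration.
y_true = [4.0, 4.5, 3.0, 4.0]
predictions = {
    "model_a": [4.0, 4.0, 3.5, 4.0],
    "model_b": [3.0, 4.5, 3.0, 5.0],
}

winner = best_model(y_true, predictions)
```

Because RMSE is the square root of MSE, it is expressed in the same units as the rating itself, which makes it easier to interpret as a typical prediction error.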