The goal of this project is to implement Linear Regression and predict how a restaurant's rating would be affected by adding a few more features to the restaurant.
The Yelp data set can be downloaded from https://www.yelp.com/dataset_challenge/dataset.
The data set size is 3.2 GB.
1. Linear Regression
We started with Linear Regression, but the results were not impressive (a variance score of only 0.06), so we decided to move on to other algorithms. The Linear Regression results are shown below:
('Coefficients: \n', array([-0.25635592, 0.05793933, 0.16467707, 0.18756422, 0.22711118,
0.39648812, 0.27228609, 0.20640004, -0.16721703, 0.20535891,
0.30502822, 0.0651468 , 0.00639236, 0.0436703 , 0.06147295,
-0.03483443, -0.05196656, 0.04772341, -0.12082692, 0.09069207,
0.221569 , 0.04441228, 0.06232851, 0.03228548, -0.05750673,
0.13131649, -0.02851442]))
Residual sum of squares: 0.53
Variance score: 0.06
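
For reference, output of this shape can be produced with scikit-learn roughly as follows. This is a minimal sketch, not the project's actual code: the feature matrix is synthetic (27 binary columns, chosen only to match the 27 coefficients above), the train/test split is an assumption, and it is also an assumption that the reported figures correspond to a mean squared error and an R^2 score on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Yelp features: 27 binary restaurant
# attributes and a noisy star rating. The real project data is not used here.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 27)).astype(float)
true_coef = rng.normal(0.0, 0.2, size=27)
y = 3.5 + X @ true_coef + rng.normal(0.0, 0.7, size=1000)

# Hold out 20% of the rows for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit ordinary least squares and predict ratings for the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Coefficients:\n", model.coef_)
print("Mean squared error: %.2f" % mse)
print("Variance score (R^2): %.2f" % r2)
```

If the variance score is indeed R^2, a value of 1.0 is a perfect fit and a value near 0 means the model does little better than always predicting the mean rating.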
2. Support Vector Regression
We implemented Support Vector Regression with polynomial and RBF kernel functions to explore how they work. For this application, we found that SVR with the RBF kernel performed better than the other regression algorithms we tried.
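
A kernel comparison of this kind might be set up along the following lines. This is a sketch under stated assumptions: the data is synthetic, the hyperparameters (`C`, `epsilon`, default polynomial degree) are illustrative rather than the values used in the project, and the feature scaling step is an addition that is generally advisable for SVR but not confirmed by the text above.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical non-linear data standing in for the restaurant features.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + rng.normal(0.0, 0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one SVR per kernel and record its R^2 on the held-out rows.
scores = {}
for kernel in ("poly", "rbf"):
    # Standardize features first; SVR is sensitive to feature scale.
    svr = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=1.0, epsilon=0.1))
    svr.fit(X_train, y_train)
    scores[kernel] = r2_score(y_test, svr.predict(X_test))
    print("%s kernel R^2: %.2f" % (kernel, scores[kernel]))
```

Scoring both kernels on the same held-out split keeps the comparison like for like; the same split could also be used to score the Linear Regression model.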