We predict whether a Kickstarter project will succeed or fail to meet its fundraising goal using only information available at launch, based on roughly 220,000 project proposals scraped from Kickstarter. We evaluate several machine learning models on this prediction task using the project category, the fundraising goal, and the short product description.
Sample Data: Download from here.
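As a rough illustration of how these launch-time features can be combined, here is a minimal, hypothetical preprocessing sketch using scikit-learn. The file name and column names (`category`, `goal`, `blurb`, `state`) are assumptions and may differ from the actual scraped dataset.

```python
# Hypothetical preprocessing sketch: combine the project category,
# fundraising goal, and short text description into one feature matrix.
# File name and column names are assumptions, not the real dataset schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("kickstarter_projects.csv")           # assumed file name
df["blurb"] = df["blurb"].fillna("")                   # guard against missing descriptions
y = (df["state"] == "successful").astype(int)          # 1 = met the goal, 0 = failed

preprocess = ColumnTransformer([
    ("category", OneHotEncoder(handle_unknown="ignore"), ["category"]),
    ("goal", StandardScaler(), ["goal"]),
    ("blurb", TfidfVectorizer(max_features=5000, stop_words="english"), "blurb"),
])

X = preprocess.fit_transform(df)
```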
The baseline model is Logistic Regression, which achieved a precision of 0.9853, a recall of 0.966, an F1 score of 0.9759, and an AUC of 0.9828. We also plot the ROC curve (FPR vs. TPR), giving a train AUC of 0.9792 and a test AUC of 0.9796, and check the confusion matrix for both the train and test sets. The two performance models selected are a Decision Tree and a Gradient Boosting Decision Tree (GBDT); their tuning is described in the list that follows the baseline sketch below.
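A minimal sketch of how such a baseline could be fit and scored with scikit-learn is shown below. It reuses the hypothetical `X` and `y` from the preprocessing sketch above, and the split parameters are assumptions.

```python
# Baseline sketch: Logistic Regression scored with precision/recall/F1/AUC,
# an ROC curve (FPR vs. TPR) and a confusion matrix on the held-out test set.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, roc_curve, confusion_matrix)

# Assumed split; X and y come from the preprocessing sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, y_prob))
print("Confusion matrix (test):\n", confusion_matrix(y_test, y_pred))

# ROC curve: false positive rate vs. true positive rate on the test set
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label="test")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```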
- Decision Tree with hyperparameter tuning (GridSearchCV):
  - Using GridSearchCV with a Decision Tree, the best parameters found are max_depth = 10 and min_samples_split = 100 (see the grid-search sketch after this list).
  - We then plot a heatmap of the grid-search results, grouped by max_depth and min_samples_split.
  - With these parameters the model reaches a root mean squared error of 0.07 and an accuracy of 0.9779.
  - We plot the ROC curve (FPR vs. TPR), giving a train AUC of 0.998 and a test AUC of 0.997, and check the confusion matrix for both the train and test sets.
- Gradient Boosting with hyperparameter tuning (GridSearchCV):
  - Using GridSearchCV with a GBDT, the best parameters found are learning_rate = 1 and max_depth = 3.
  - We then plot a heatmap of the grid-search results, grouped by max_depth and learning_rate.
  - With these parameters the model reaches a root mean squared error of 0.07 and an accuracy of 0.9771.
  - We plot the ROC curve (FPR vs. TPR), giving a train AUC of 0.998 and a test AUC of 0.998, and check the confusion matrix for both the train and test sets.
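The grid searches and heatmaps above could look roughly like the following sketch. The parameter grids are assumptions built around the reported best values, and the scoring, CV settings, and plotting details are likewise assumptions; `X_train` and `y_train` come from the baseline sketch above.

```python
# Hyperparameter-tuning sketch for the two performance models.
# Grids, scoring and cv are assumptions; only the best values are reported above.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Decision Tree: tune max_depth and min_samples_split
dt_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [5, 10, 20], "min_samples_split": [10, 100, 500]},
    scoring="roc_auc", cv=5)
dt_grid.fit(X_train, y_train)
print("Best Decision Tree params:", dt_grid.best_params_)

# GBDT: tune learning_rate and max_depth
gbdt_grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"learning_rate": [0.01, 0.1, 1], "max_depth": [2, 3, 5]},
    scoring="roc_auc", cv=5)
gbdt_grid.fit(X_train, y_train)
print("Best GBDT params:", gbdt_grid.best_params_)

# Heatmap of mean CV score over the two tuned parameters (shown for the DT grid;
# the GBDT heatmap is built the same way from gbdt_grid.cv_results_).
results = pd.DataFrame(dt_grid.cv_results_)
pivot = results.pivot_table(index="param_max_depth",
                            columns="param_min_samples_split",
                            values="mean_test_score")
sns.heatmap(pivot, annot=True, fmt=".4f")
plt.title("Decision Tree grid search (mean CV AUC)")
plt.show()
```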
Tools:
- Python
- Jupyter Notebook
- Google Colab
- Streamlit
- Flask
- GitHub
- Git Bash

Libraries:
- pandas
- numpy
- sklearn
- matplotlib
- seaborn
- streamlit
- nltk