Please note: this is a student project. Many thanks to everyone who shares their knowledge.
The dataset
The goal is to predict the price of a given diamond (regression analysis).
There are 10 independent variables (including id):
- id : Unique identifier of each diamond.
- carat : Carat (ct.) is the unit of weight used exclusively to weigh gemstones and diamonds.
- cut : Quality of the diamond's cut.
- color : Color of the diamond.
- clarity : A measure of the purity and rarity of the stone, graded by the visibility of its inclusions and blemishes under 10x magnification.
- depth : The height of the diamond (in millimeters), measured from the culet (bottom tip) to the table (flat top surface).
- table : The facet that can be seen when the stone is viewed face up.
- x : Diamond X dimension.
- y : Diamond Y dimension.
- z : Diamond Z dimension.
Target variable:
- price : Price of the given diamond.
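The column list above can be expressed as feature groups plus a target. This is a minimal pandas sketch; the constant names and `split_features_target` are hypothetical, and leaving `id` out of the feature matrix is an assumption (the text counts it among the independent variables but does not say how it is used in training).

```python
from __future__ import annotations

import pandas as pd

# Column groups taken from the feature list above (hypothetical constant names).
NUMERIC_FEATURES = ["carat", "depth", "table", "x", "y", "z"]
CATEGORICAL_FEATURES = ["cut", "color", "clarity"]
TARGET = "price"


def split_features_target(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return (X, y); id is left out of X (an assumption, not confirmed by the project)."""
    expected = NUMERIC_FEATURES + CATEGORICAL_FEATURES + [TARGET]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Selecting columns explicitly also fixes their order for later steps.
    X = df[NUMERIC_FEATURES + CATEGORICAL_FEATURES].copy()
    y = df[TARGET].copy()
    return X, y
```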
Dataset Source Link : https://www.kaggle.com/competitions/playground-series-s3e8/data?select=train.csv
Check this link for details : American Gem Society
AWS Elastic Beanstalk link : http://gemstonepriceutkarshgaikwad-env.eba-7zp3wapg.ap-south-1.elasticbeanstalk.com/
YouTube video : click the thumbnail below to open it.
API Link : http://gemstonepriceutkarshgaikwad-env.eba-7zp3wapg.ap-south-1.elasticbeanstalk.com/predictAPI
Data Ingestion :
- In the Data Ingestion phase, the data is first read from a CSV file.
- The data is then split into training and testing sets, and each set is saved as a CSV file.
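The ingestion step can be sketched with pandas and scikit-learn's `train_test_split`. The function name, the `artifacts` output folder, the 70/30 split and the fixed `random_state` are assumptions for illustration, not values taken from the project.

```python
from __future__ import annotations

import os

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(raw_csv_path: str, artifacts_dir: str = "artifacts",
           test_size: float = 0.3, random_state: int = 42) -> tuple[str, str]:
    """Read the raw CSV, split it into train/test sets and save both as CSV files."""
    df = pd.read_csv(raw_csv_path)
    os.makedirs(artifacts_dir, exist_ok=True)

    # A fixed random_state makes the split reproducible between runs.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=random_state)

    train_path = os.path.join(artifacts_dir, "train.csv")
    test_path = os.path.join(artifacts_dir, "test.csv")
    # index=False keeps the saved files free of an extra unnamed index column.
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```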
Data Transformation :
- In this phase, a ColumnTransformer preprocessing pipeline is created.
- For numeric variables, a SimpleImputer with the median strategy is applied first, followed by standard scaling.
- For categorical variables, a SimpleImputer with the most-frequent strategy is applied, followed by ordinal encoding; the encoded values are then scaled with StandardScaler.
- This preprocessor is saved as a pickle file.
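The preprocessing described above maps directly onto scikit-learn's `ColumnTransformer`. In this sketch the grade orderings passed to `OrdinalEncoder` are an assumption (worst to best, as in the classic diamonds dataset); without explicit `categories`, `OrdinalEncoder` would sort the labels alphabetically. `build_preprocessor`, `save_preprocessor` and the constant names are hypothetical.

```python
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

NUMERIC_FEATURES = ["carat", "depth", "table", "x", "y", "z"]
CATEGORICAL_FEATURES = ["cut", "color", "clarity"]

# Assumed grade orderings, worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def build_preprocessor() -> ColumnTransformer:
    numeric_pipeline = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical_pipeline = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        # Explicit categories make the codes follow grade order, one list per column.
        ("encoder", OrdinalEncoder(categories=[CUT_ORDER, COLOR_ORDER, CLARITY_ORDER])),
        ("scaler", StandardScaler()),
    ])
    # Output columns: the 6 numeric features first, then cut, color, clarity.
    return ColumnTransformer(transformers=[
        ("num", numeric_pipeline, NUMERIC_FEATURES),
        ("cat", categorical_pipeline, CATEGORICAL_FEATURES),
    ])


def save_preprocessor(preprocessor: ColumnTransformer, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(preprocessor, f)
```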
Model Training :
- In this phase, several base models are evaluated. The best-performing model was the CatBoost regressor.
- Hyperparameter tuning is then performed on the CatBoost and KNN models.
- Finally, a VotingRegressor is created that combines the predictions of the CatBoost, XGBoost, and KNN models.
- This model is saved as a pickle file.
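A sketch of the training step, assuming scikit-learn's `GridSearchCV` for tuning (only KNN is shown, and the grid is illustrative) and `VotingRegressor`, which averages its members' predictions with equal weights unless `weights` is given. `catboost` and `xgboost` are imported lazily inside `default_boosters` so the module still imports when those packages are absent; all function names are hypothetical.

```python
import pickle

from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor


def tune_knn(X_train, y_train, cv: int = 5) -> KNeighborsRegressor:
    """Grid-search a few KNN settings; the grid is an illustrative assumption."""
    param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}
    search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=cv,
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    # refit=True (the default) retrains the best setting on all of X_train.
    return search.best_estimator_


def default_boosters():
    """Create the boosting models; imports are lazy so they are only needed here."""
    from catboost import CatBoostRegressor
    from xgboost import XGBRegressor
    return CatBoostRegressor(verbose=0), XGBRegressor()


def build_ensemble(catboost_model, xgboost_model, knn_model) -> VotingRegressor:
    """Average the predictions of the three regressors (equal weights)."""
    return VotingRegressor(estimators=[
        ("catboost", catboost_model),
        ("xgboost", xgboost_model),
        ("knn", knn_model),
    ])


def save_model(model, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

For example, `cat, xgb = default_boosters()` followed by `build_ensemble(cat, xgb, tune_knn(X_train, y_train)).fit(X_train, y_train)` would train the ensemble on the preprocessed training data.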
Prediction Pipeline :
- This pipeline converts the given input into a DataFrame and provides functions to load the pickle files and predict the final result in Python.
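A minimal prediction-pipeline sketch, assuming the preprocessor and model were saved with Python's `pickle` module. `DiamondInput`, `load_pickle` and `predict_price` are hypothetical names; the dataclass turns one set of user inputs into a one-row DataFrame with the dataset's column names, so the fitted preprocessor can select columns by name.

```python
import pickle
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class DiamondInput:
    """Hypothetical container for one diamond's features (names follow the dataset columns)."""
    carat: float
    cut: str
    color: str
    clarity: str
    depth: float
    table: float
    x: float
    y: float
    z: float

    def to_dataframe(self) -> pd.DataFrame:
        # One row, one column per field.
        return pd.DataFrame([asdict(self)])


def load_pickle(path: str):
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_price(features: pd.DataFrame, preprocessor_path: str, model_path: str):
    """Load both artifacts, transform the raw features and return the model's predictions."""
    preprocessor = load_pickle(preprocessor_path)
    model = load_pickle(model_path)
    return model.predict(preprocessor.transform(features))
```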
Flask App creation :
- A Flask app with a user interface is created to predict gemstone prices in a web application.
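A sketch of a JSON endpoint for the Flask app, assuming Flask is installed. The route path matches the API link above, but the request and response format (feature fields in, `{"price": ...}` out) is an assumption, and the HTML form for the user interface is omitted. The prediction function is injected so the route can be exercised without the pickled model; `create_app` and `REQUIRED_FIELDS` are hypothetical.

```python
from flask import Flask, jsonify, request

REQUIRED_FIELDS = ["carat", "cut", "color", "clarity", "depth", "table", "x", "y", "z"]


def create_app(predict_fn) -> Flask:
    """Build the app; predict_fn maps a dict of features to a price."""
    app = Flask(__name__)

    @app.route("/predictAPI", methods=["POST"])
    def predict_api():
        # silent=True returns None instead of raising on a missing or invalid JSON body.
        payload = request.get_json(silent=True) or {}
        missing = [name for name in REQUIRED_FIELDS if name not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        price = predict_fn({name: payload[name] for name in REQUIRED_FIELDS})
        return jsonify({"price": float(price)})

    return app
```

In the app itself, `predict_fn` could wrap the prediction pipeline, and `create_app(predict_fn).run()` starts Flask's development server.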
Link : EDA Notebook
Link : Model Training Notebook
Link : LIME Interpretation