For this dataset:
Main task: ML
- Do a preliminary exploratory data analysis (EDA)
- Create visualizations using seaborn and matplotlib (dependency plots, histograms, box plots). Find the parameter with the highest correlation
- Based on the analysis, draw some conclusions from the data
- Pre-process the data if necessary
- Create a model for this dataset (you can use xgboost, lightgbm, or catboost)
- Use hyperparameter tuning and justify your choice
- Display the metrics for the resulting model (the main metric is the confusion matrix)
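Since the confusion matrix is the main metric, the sketch below shows what it contains, using only the standard library. The function name, labels, and predictions are hypothetical and made up for illustration; they are not results from this dataset. Rows are true labels and columns are predicted labels.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def confusion_matrix(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> List[List[int]]:
    """Return counts[i][j]: samples with true label labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count every (true, predicted) pair once, then lay the counts out as a grid.
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a binary problem the four cells are TN, FP (first row) and FN, TP (second row).
(tn, fp), (fn, tp) = matrix
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

In a real project, `sklearn.metrics.confusion_matrix` gives the same table with the same row and column convention.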
Optional: Deploy
- Create a separate python project
- Write an API that will accept client data and return model prediction (use fastapi)
- The tutorial on the FastAPI site can help
- Set up a database (SQLite or MySQL)
- Follow PEP 8
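One way the database step could look with SQLite is sketched below, using the standard library `sqlite3` module. The table name, columns, and helper functions are assumptions chosen for illustration; the task does not prescribe a schema. An in-memory database keeps the example self-contained.

```python
import json
import sqlite3
from typing import List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) predictions table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS predictions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            features TEXT NOT NULL,
            prediction INTEGER NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()


def save_prediction(
    conn: sqlite3.Connection, features: List[float], prediction: int
) -> int:
    """Store one request/response pair and return the new row id."""
    cur = conn.execute(
        "INSERT INTO predictions (features, prediction) VALUES (?, ?)",
        (json.dumps(features), prediction),  # features are stored as a JSON string
    )
    conn.commit()
    return cur.lastrowid


def list_predictions(conn: sqlite3.Connection) -> List[Tuple[int, List[float], int]]:
    """Return all stored predictions, oldest first."""
    rows = conn.execute(
        "SELECT id, features, prediction FROM predictions ORDER BY id"
    ).fetchall()
    return [(row_id, json.loads(feats), pred) for row_id, feats, pred in rows]


# Usage with an in-memory database; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
init_db(conn)
first_id = save_prediction(conn, [5.1, 3.5], 1)
stored = list_predictions(conn)
```

The API route would call `save_prediction` after the model returns its result, so every client request is kept alongside the prediction it received.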
Estimated time to complete the task:
- model creation - 1 day
- creating a web application without prior knowledge of the framework - 2 days
If possible, feedback on the work will be given, which will need to be addressed. Additional time: 1 day
Requirements:
- Python 3.8.2
- pip
- Poetry (Python package manager)
MODEL_PATH=./ml/model/
MODEL_NAME=model.pkl
To update your machine learning model, add your load and method changes in predictor.py
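The contents of predictor.py are not shown here, so the following is only a sketch of what a load step and a prediction method might look like. It reads the `MODEL_PATH` and `MODEL_NAME` settings listed above; the function names `load_model` and `predict`, and the assumption that the pickled object exposes a scikit-learn style `predict()`, are hypothetical.

```python
import os
import pickle
from pathlib import Path
from typing import Any, Optional, Sequence


def load_model(
    model_path: Optional[str] = None, model_name: Optional[str] = None
) -> Any:
    """Load a pickled model from MODEL_PATH/MODEL_NAME (arguments override the env)."""
    model_path = model_path or os.environ.get("MODEL_PATH", "./ml/model/")
    model_name = model_name or os.environ.get("MODEL_NAME", "model.pkl")
    full_path = Path(model_path) / model_name
    with full_path.open("rb") as fh:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(fh)


def predict(model: Any, features: Sequence[float]) -> Any:
    """Run one prediction.

    Assumes the model follows the scikit-learn convention of taking a
    2D batch of rows, so the single sample is wrapped in a list.
    """
    return model.predict([list(features)])[0]
```

Loading the model once at application startup, rather than on every request, avoids re-reading the file for each prediction.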
python -m venv venv
source venv/bin/activate
make install
make run
make deploy
make test
Files related to the application are in the app or tests directories.
Application parts are:
app
├── api - web related stuff.
│ └── routes - web routes.
├── core - application configuration, startup events, logging.
├── models - pydantic models for this application.
├── services - logic that is not just crud related.
├── datasets - datasets for training and testing.
├── visualizations - visualizations for dataset preview.
├── notebooks - notebooks for data analysis and visualization.
└── main.py - FastAPI application creation and configuration.
│
tests - pytest