EDA-predictor

For this dataset:

Main task: ML

Do a preliminary exploratory data analysis (EDA).
Create visualizations using seaborn and matplotlib (dependency plots, histograms, box plots). Find the parameter with the highest correlation.
Based on the analysis, draw conclusions from the data.
Pre-process the data if necessary.
Create a model for this dataset (you can use XGBoost, LightGBM, or CatBoost).
Tune the hyperparameters and justify your choices.
Display the metrics for the resulting model (the main metric is the confusion matrix).
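
As a rough illustration of two of the steps above (finding the most correlated parameter and building a confusion matrix), here is a minimal standard-library sketch. The column names and values are made up, and the helper names `pearson`, `highest_correlation`, and `confusion_matrix` are hypothetical. In a real notebook, `pandas.DataFrame.corr` and `sklearn.metrics.confusion_matrix` would normally do this work.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def highest_correlation(columns, target):
    """Return (name, r) for the column with the largest |r| against target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up toy data, for illustration only.
columns = {"age": [20, 30, 40, 50], "noise": [3, 1, 4, 1]}
target = [0, 0, 1, 1]
best_feature, r = highest_correlation(columns, target)

matrix = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], labels=[0, 1])
```

With this toy data, `best_feature` is `"age"` and `matrix` is `[[1, 1], [0, 2]]`: one true negative, one false positive, no false negatives, and two true positives.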

Optional: Deploy

Create a separate Python project

Write an API that accepts client data and returns the model's prediction (use FastAPI)

The tutorial on the FastAPI site can help

Set up a database (SQLite or MySQL)
PEP 8 compliance is welcome
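
One possible way to cover the database step is to log each prediction to SQLite with the standard-library `sqlite3` module. The `predictions` table layout and the `init_db` / `save_prediction` helpers below are assumptions made for illustration, not the project's actual schema.

```python
import json
import sqlite3


def init_db(conn):
    """Create the (hypothetical) predictions table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS predictions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            features TEXT NOT NULL,
            label INTEGER NOT NULL
        )
        """
    )
    conn.commit()


def save_prediction(conn, features, label):
    """Store one request/response pair and return the new row id."""
    cur = conn.execute(
        "INSERT INTO predictions (features, label) VALUES (?, ?)",
        (json.dumps(features), label),  # parameterized to avoid SQL injection
    )
    conn.commit()
    return cur.lastrowid


# In-memory database for the demo; a file path would persist the data.
conn = sqlite3.connect(":memory:")
init_db(conn)
row_id = save_prediction(conn, [1.0, 2.0], 1)
rows = conn.execute("SELECT features, label FROM predictions").fetchall()
```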

Estimated time to complete the task:

model creation - 1 day
creating a web application without knowledge of the framework - 2 days

If possible, feedback on the work will be given, and it will need to be addressed. Additional time - 1 day

Development Requirements

Python 3.8.2
pip
Poetry (Python Package Manager)

ML Model Environment

MODEL_PATH=./ml/model/
MODEL_NAME=model.pkl
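
For illustration, the two variables could be combined into a model file path as sketched below. The `model_file` helper is hypothetical, and reading straight from `os.environ` is an assumption; the application may load its settings differently.

```python
import os
from pathlib import Path


def model_file(env=None):
    """Build the model file path from MODEL_PATH and MODEL_NAME."""
    env = os.environ if env is None else env
    # Fall back to the values documented above when a variable is unset.
    model_dir = env.get("MODEL_PATH", "./ml/model/")
    model_name = env.get("MODEL_NAME", "model.pkl")
    return Path(model_dir) / model_name


path = model_file({"MODEL_PATH": "./ml/model/", "MODEL_NAME": "model.pkl"})
```

Here `path.as_posix()` is `"ml/model/model.pkl"`, relative to the directory the application is started from.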

Update `/predict`

To update your machine learning model, change the model loading and prediction methods in predictor.py
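
The actual contents of predictor.py are not shown here, so the following is only a sketch of the kind of load and predict methods you would adapt. The `Predictor` and `ThresholdModel` classes are hypothetical, and the model is assumed to expose a scikit-learn-style `predict()`.

```python
import pickle
from pathlib import Path


class Predictor:
    """Hypothetical wrapper: swap load() or predict() when the model changes."""

    def __init__(self, model=None):
        self.model = model

    def load(self, path):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with Path(path).open("rb") as fh:
            self.model = pickle.load(fh)
        return self

    def predict(self, features):
        if self.model is None:
            raise RuntimeError("model is not loaded; call load() first")
        # Models with a scikit-learn-style API expect a 2D batch of rows.
        return self.model.predict([features])[0]


class ThresholdModel:
    """Stand-in with a scikit-learn-style predict(), for illustration only."""

    def predict(self, rows):
        return [int(sum(row) > 0) for row in rows]


predictor = Predictor(model=ThresholdModel())
label = predictor.predict([0.5, 1.5])
```

Keeping loading and inference behind one small class means a new model format only requires changes in this one place, not in the route handlers.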

Installation

python -m venv venv
source venv/bin/activate
make install

Running on Localhost

make run

Deploy app

make deploy

Running Tests

make test

Access Swagger Documentation

http://localhost:8080/docs

Access ReDoc Documentation

http://localhost:8080/redoc

Project structure

Files related to the application are in the app or tests directories. The application parts are:

app
├── api              - web related stuff.
│   └── routes       - web routes.
├── core             - application configuration, startup events, logging.
├── models           - pydantic models for this application.
├── services         - logic that is not just CRUD related.
├── datasets         - datasets for training and testing.
├── visualizations   - visualizations for dataset preview.
├── notebooks        - notebooks for data analysis and visualization.
└── main.py          - FastAPI application creation and configuration.

tests                - pytest

bnutfilloyev/Classificator
