Heart disease clusterization

This project was created for the Machine Learning course at the Warsaw University of Technology. Our task was to cluster anonymized data of patients diagnosed with heart disease. The goal is to help doctors understand which treatments might work for their patients, since patients with similar characteristics may respond to the same treatments.

Authors

Tymoteusz Urban
Karolina Mączka

Data

Heart Disease patients

Data exploration

We studied the data thoroughly to fully understand the dataset we were working with. We also asked a medical expert for medical insights and for help with a more specific interpretation of every feature. The data was well prepared: all columns were numerical and there were no null values.
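The checks described above (numeric columns, no null values) can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the DataFrame is synthetic and the column names `age`, `resting_bp` and `cholesterol` are hypothetical placeholders, not the real dataset's columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the patient table; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "resting_bp": rng.normal(130, 15, size=100),
    "cholesterol": rng.normal(240, 40, size=100),
})

# Count missing values per column.
null_counts = df.isnull().sum()

# List columns whose dtype is not numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

print(null_counts)
print("non-numeric columns:", non_numeric)

# Summary statistics give a first look at ranges and possible outliers.
print(df.describe())
```

On the real data, `df` would instead be loaded from the file in the data folder.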

Preprocessing

At first we wanted to remove outliers, but after consulting our validation team we abandoned this idea, because outliers are important in medical data analysis. After many tests we decided to apply a MinMax scaler and reduce dimensionality with PCA.
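A minimal sketch of this scaling and dimensionality-reduction step, assuming scikit-learn's `MinMaxScaler` and `PCA`. The input matrix is random placeholder data, and keeping 3 components is an assumption made for illustration; the number actually retained in the project is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Placeholder feature matrix (200 patients, 10 features); not the real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Rescale every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X)

# Project onto the first principal components (3 is an assumed value).
pca = PCA(n_components=3, random_state=0)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)
print("explained variance kept:", pca.explained_variance_ratio_.sum())
```

Because MinMax scaling uses the observed minimum and maximum of each feature, extreme values compress the remaining points into a narrower range, which is worth keeping in mind when outliers are retained.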

Model

We obtained the optimal number of clusters using the Silhouette method. After testing multiple clustering algorithms we chose KMeans, which is one of the most common algorithms and was also the most effective in our tests. It achieved the best results on almost all metrics. We also checked the clustering on 3D visualizations of the principal components.
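Choosing the number of clusters by silhouette score could look roughly like the sketch below. It assumes scikit-learn's `KMeans` and `silhouette_score`; the data comes from `make_blobs` and the candidate range of 2 to 8 clusters is an arbitrary choice for illustration, not the range used in the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic placeholder data; the real input would be the PCA-reduced features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Fit KMeans for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest mean silhouette score is selected.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("selected k:", best_k)
```

The silhouette score ranges from -1 to 1, and higher values indicate clusters that are compact and well separated from each other.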

Interpretation

We trained a random forest model to extract feature importance for each cluster. By analyzing means, medians, boxplots, and feature importance, we could describe each cluster. To learn more about the results and the whole clusterization process, check out our presentation. For more technical insights, open the Jupyter notebook (beware that some visualizations may not load automatically).
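One possible way to get per-cluster feature importance from a random forest is a one-vs-rest setup: for each cluster, train a classifier to separate that cluster from all others and read its `feature_importances_`. The sketch below follows that assumption and may differ from what the notebook does; the data and feature names are synthetic placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data with hypothetical feature names.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Cluster labels to be explained.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One-vs-rest: one forest per cluster, target is "belongs to this cluster".
importances = {}
for cluster in np.unique(labels):
    target = (labels == cluster).astype(int)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, target)
    importances[f"cluster_{cluster}"] = forest.feature_importances_

# Rows are features, columns are clusters.
importance_table = pd.DataFrame(importances, index=feature_names)
print(importance_table.round(3))

# Per-cluster means complement the importances when describing clusters.
print(pd.DataFrame(X, columns=feature_names).groupby(labels).mean().round(2))
```

Features with high importance for a given cluster are the ones that best distinguish it from the rest, which makes them natural starting points for a cluster description.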

