Machine learning project for clustering patients with heart diseases

tymsoncyferki/heart-disease-clusterization

Heart disease clusterization

This project was created for the Machine Learning course at the Warsaw University of Technology. Our task was to cluster anonymized data of patients diagnosed with heart disease, to help doctors understand which treatments might work for their patients, since patients with similar characteristics may respond to the same treatments.

Authors

Tymoteusz Urban
Karolina Mączka

Data

Heart Disease patients

Data exploration

We studied the data thoroughly in order to fully understand the dataset we were working with. We also asked a medical expert for medical insights and for help with interpreting each feature more precisely. The data was well prepared: all columns were numerical and there were no null values.

Preprocessing

At first we wanted to remove outliers, but after consulting our validation team we abandoned this idea, as outliers are important in medical data analysis. After many tests we decided to apply a MinMax scaler and reduce dimensionality with PCA.
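The scaling and dimensionality reduction described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn on synthetic data, not the notebook's actual code: the array shape and `n_components=3` are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the patient table (hypothetical: 200 rows, 8 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Scale every feature to [0, 1], then project onto principal components.
# n_components=3 is an assumption for this sketch, not the project's setting.
preprocess = make_pipeline(MinMaxScaler(), PCA(n_components=3))
X_reduced = preprocess.fit_transform(X)

# Share of total variance kept by each retained component.
explained = preprocess.named_steps["pca"].explained_variance_ratio_
print(X_reduced.shape, explained.sum())
```

Keeping both steps in one pipeline means the same fitted scaler and PCA can later be applied to new rows with `preprocess.transform`.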

Model

We obtained the optimal number of clusters using the Silhouette method. After testing multiple clustering algorithms we chose KMeans, the most common algorithm and also the most effective one in our tests: it achieved the best results on almost all metrics. We also checked the clustering on 3D visualizations of the principal components.
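One common way to pick the number of clusters with the Silhouette method is to fit KMeans for a range of candidate values and keep the one with the highest mean silhouette score. The sketch below shows that idea on synthetic blobs; the data, the candidate range 2 to 8, and the KMeans settings are assumptions for illustration, not values taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for the preprocessed patient features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Mean silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is taken as the number of clusters.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The silhouette score ranges from -1 to 1; higher values mean points sit closer to their own cluster than to the nearest other one.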

Interpretation

We created a random forest model in order to extract feature importance for each cluster. By analyzing means, medians, boxplots and feature importance, we were able to create a description of each cluster. To learn more about the results and the whole clusterization process, check out our presentation. For more technical insights, open the Jupyter notebook (beware that some visualizations may not load automatically).
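One possible reading of per-cluster feature importance is a one-vs-rest setup: for each cluster, train a random forest to separate its members from everyone else and read off the impurity-based importances. The sketch below follows that assumption on synthetic data. The feature names, cluster count, and forest settings are hypothetical, and the notebook may do this differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and hypothetical feature names standing in for the patient table.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Cluster labels to be explained.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# For each cluster, fit a "member vs. rest" forest and store its feature importances.
importances = {}
for cluster in np.unique(labels):
    is_member = (labels == cluster).astype(int)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, is_member)
    importances[int(cluster)] = dict(zip(feature_names, forest.feature_importances_))

for cluster, scores in importances.items():
    top = max(scores, key=scores.get)
    print(f"cluster {cluster}: most important feature is {top}")
```

Features that rank high for a cluster are natural starting points when comparing its means, medians and boxplots against the rest of the patients.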
