Intro

This is a course project for the Data Mining course (Fall 2024) at THU.

Objective

Apply clustering algorithms to the human activity dataset and evaluate the results.

Data

The dataset covers 6 kinds of activities, with 561 features per sample: https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones

Content

1. data analysis and preprocessing

2. clustering analysis

2.1 algorithms

2.1.1 Partitioning Methods

  • K-means
  • CLARANS
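
As a rough illustration of the partitioning idea behind K-means, the following is a minimal sketch of Lloyd's algorithm using only the Python standard library. It is not the project's implementation; the function name `kmeans`, its parameters, and the random initialization are assumptions made for this example.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: initialize centers by sampling k distinct data points.
    centers = rng.sample(points, k)

    def nearest(p, centers):
        # Index of the center with the smallest squared Euclidean distance.
        return min(
            range(len(centers)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )

    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers

    # Final labels are recomputed so they always match the returned centers.
    labels = [nearest(p, centers) for p in points]
    return labels, centers
```

On real data with 561 features, a library implementation with a better initialization (for example k-means++) would normally be preferable to this sketch.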

2.1.2 Hierarchical Clustering Methods

  • Agglomerative Hierarchical Clustering
  • Divisive Hierarchical Clustering

2.1.3 Density-Based Clustering Methods

  • DBSCAN
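
To make the density-based idea concrete, here is a minimal DBSCAN sketch in plain Python. It is illustrative only and not taken from the project; the function name `dbscan`, the brute-force neighborhood search, and the convention that a point counts as its own neighbor are assumptions of this example.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)

    def region(i):
        # Brute-force eps-neighborhood (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            neighbors_j = region(j)
            if len(neighbors_j) >= min_pts:
                queue.extend(neighbors_j)  # j is a core point, keep expanding
    return labels
```

The brute-force search is quadratic in the number of samples, so this form is only suitable for small examples.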

2.1.4 Grid-Based Clustering Methods

2.2 evaluation

2.2.1 clustering tendency

  • Hopkins statistic
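
One common form of the Hopkins statistic can be sketched as follows, using only the standard library. This is an assumption-laden illustration rather than the project's code: the function name `hopkins`, the bounding-box sampling, and the default sample size are choices made here. Under this convention, values near 0.5 suggest uniformly random data and values near 1 suggest cluster structure; some references define the statistic the other way round.

```python
import math
import random


def hopkins(points, m=None, seed=0):
    """Illustrative Hopkins statistic H = sum(u) / (sum(u) + sum(w))."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    m = m or max(1, n // 10)  # assumption: sample about 10% of the data
    lo = [min(p[j] for p in points) for j in range(dim)]
    hi = [max(p[j] for p in points) for j in range(dim)]

    def nn_dist(q, skip=None):
        # Distance from q to its nearest data point, optionally excluding one index.
        return min(math.dist(q, p) for i, p in enumerate(points) if i != skip)

    # w: nearest-neighbor distances of m sampled real points.
    w = sum(nn_dist(points[i], skip=i) for i in rng.sample(range(n), m))
    # u: nearest-neighbor distances of m uniform points in the bounding box.
    u = sum(
        nn_dist([rng.uniform(lo[j], hi[j]) for j in range(dim)])
        for _ in range(m)
    )
    return u / (u + w)
```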

2.2.2 clustering quality

  • Internal metrics
    • Compactness
    • Separation
    • Silhouette score
    • Calinski-Harabasz index (CH index)
    • Davies-Bouldin index (DB index)
  • External metrics
    • Adjusted Rand Index (ARI)
    • Normalized Mutual Information (NMI)
    • V-Measure
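
As an example of an external metric, the Adjusted Rand Index can be computed from the contingency counts between the true activity labels and the predicted clusters. The sketch below is illustrative and not the project's code; the function name `adjusted_rand_index` is an assumption, and it expects at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Illustrative ARI from pair counts; requires len(labels_true) >= 2."""
    n = len(labels_true)
    # Contingency counts n_ij and the marginals a_i, b_j.
    pairs = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)

    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())

    expected = sum_a * sum_b / comb(n, 2)  # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_ij - expected) / (max_index - expected)
```

Because the index only depends on which samples are grouped together, renaming the cluster ids does not change the score.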