Anomaly-Detection-Using-K-means-Clustering

Anomaly detection with K-means clustering identifies outliers: data points in the dataset that do not fit well into any cluster.

K-means clustering partitions a dataset into k clusters, each represented by its centroid (the mean of the points assigned to it). Each data point is assigned to the cluster whose centroid is nearest, and the centroids are recomputed until the assignments stop changing. The method is best suited to datasets whose clusters are roughly Gaussian, that is, compact and approximately spherical. The optimal number of clusters is chosen with the silhouette coefficient method: the k that gives the highest average silhouette score is selected.
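
The clustering and model-selection steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. The function names (`init_centroids`, `kmeans`, `silhouette`, `best_k`) are hypothetical, and the farthest-point initialization is an assumption made here so that the result is deterministic; the source does not say how centroids are initialized. For real workloads, scikit-learn's `KMeans` and `silhouette_score` cover the same ground.

```python
import math


def init_centroids(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (centroids, labels) for a list of tuples."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (0.0 for singletons)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # the coefficient is undefined for a single cluster
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Return the k in k_values with the highest mean silhouette score."""
    scores = {k: silhouette(points, kmeans(points, k)[1]) for k in k_values}
    return max(scores, key=scores.get)
```

Calling `best_k(points, range(2, 7))` on a list of coordinate tuples returns the candidate k with the best score, and `kmeans(points, k)` then gives the centroids and per-point labels for that k. The silhouette computation here is quadratic in the number of points, which is fine for small files but slow for large ones.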

After clustering, a data point is flagged as an anomaly if any of the following holds:

It belongs to a small cluster, one containing fewer points than a threshold (1% of the total number of data points).
It is an isolated point that does not belong to any cluster.
It lies more than 2 standard deviations from the centroid of its cluster (roughly the 95% interval under a Gaussian assumption).
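
The small-cluster and standard-deviation criteria above might be applied as in the sketch below. It is an illustration under stated assumptions, not the notebook's code: the function `find_anomalies` and its parameters `min_fraction` and `n_std` are hypothetical names, and "more than 2 standard deviations" is read here as a point's distance to its centroid exceeding the cluster's mean distance plus 2 standard deviations of those distances, which is only one possible interpretation. The isolated-point criterion is left out, because plain K-means assigns every point to some cluster and the source does not say how isolated points are determined.

```python
import math
import statistics


def find_anomalies(points, labels, centroids, min_fraction=0.01, n_std=2.0):
    """Return sorted indices of points flagged as anomalies.

    points:    list of coordinate tuples
    labels:    cluster index of each point
    centroids: centroids[label] is the centroid of that cluster
    """
    members_of = {}
    for i, label in enumerate(labels):
        members_of.setdefault(label, []).append(i)

    min_size = min_fraction * len(points)
    flagged = set()
    for label, members in members_of.items():
        # Small-cluster rule: every point of a tiny cluster is an anomaly.
        if len(members) < min_size:
            flagged.update(members)
            continue
        # Distance rule (assumed interpretation): flag points whose distance
        # to the centroid exceeds mean + n_std * std of the cluster's distances.
        dists = [math.dist(points[i], centroids[label]) for i in members]
        cutoff = statistics.mean(dists) + n_std * statistics.pstdev(dists)
        flagged.update(i for i, d in zip(members, dists) if d > cutoff)
    return sorted(flagged)
```

With the defaults, a cluster is treated as small only when it holds fewer than 1% of all points, so this rule can fire only on datasets with more than 100 points. Both thresholds are exposed as parameters because suitable values depend on the data.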

Repository contents:

.ipynb_checkpoints
.DS_Store
README.md
anomaly_detection_k_means_clustering.ipynb
data1.dat
data2.dat
data3.dat

gprashmi/Anomaly-Detection-Using-K-means-Clustering
