Skip to content

tangid/clustering-benchmark

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Clustering benchmarks

Datasets

This project contains collection of labeled clustering problems that can be found in the literature. Most of datasets were artificially created.

The benchmark includes:

Artificial data

2d-10c 2d-20c-no0 2d-3c-no123 2d-4c-no4 2d-4c-no9 2d-4c 2sp2glob 3-spiral 3MC D31 DS577 DS850 R15 aggregation atom banana birch-rg1 birch-rg2 birch-rg3 chainlink cluto-t4.8k cluto-t5.8k cluto-t7.10k cluto-t8.8k complex8 complex9 compound cure-t0-2000n-2D cure-t1-2000n-2D cure-t2-4k curves1 curves2 dartboard1 dartboard2 dense-disk-3000 dense-disk-5000 diamond9 disk-1000n disk-3000n disk-4000n disk-4500n disk-4600n disk-5000n disk-6000n donut1 donut2 donut3 donutcurves ds2c2sc13 ds3c3sc6 ds4c2sc8 elliptical_10_2 elly-2d10c13s engytime flame fourty golfball hepta insect jain long1 long2 long3 longsquare lsun mopsi-finland mopsi-joensuu pathbased rings s-set1 s-set2 s-set3 s-set4 sizes1 sizes2 sizes3 sizes4 sizes5 smile1 smile2 smile3 spherical_4_3 spherical_5_2 spherical_6_2 spiral spiralsquare square1 square2 square3 square4 square5 st900 target tetra triangle1 triangle2 twenty twodiamonds wingnut xclara zelnik1 zelnik2 zelnik3 zelnik4 zelnik5 zelnik6

Experiments

This project contains set of clustering methods benchmarks on various dataset. The project is dependent on Clueminer project.

in order to run benchmark compile dependencies into a single JAR file:

mvn assembly:assembly

Consensus experiment

allows running repeated runs of the same algorithm:

./run consensus --dataset "triangle1" --repeat 10

by default k-means algorithm is used.

For available datasets see resources folder.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 99.2%
  • Other 0.8%