Implementation of the following algorithms:
- K-Means initialization (9 different methods) [1]
- Random Swap algorithm [2]
- Genetic Algorithm [3]
This package is also needed as a dependency to be able to build some of the projects in the UEF organization (like Random Swap)
Compile: make cbkm
Example with 15 clusters:
./cbkm -O -S15 datasets/s1.txt output.txt ./cbkm -O -S15 datasets/s1.ts output.ts Steinley initialization method, two repeats:
./cbkm -I5 -O -R2 -S15 datasets/s1.txt output.txt K-means++ initialization method, two repeats: ./cbkm -I2 -O -R2 -S15 datasets/s1.txt output.txt
Run ./cbkm for help:
CBKM Version 0.65 4.4.2017 Repeated K-means algorithm. Usage: CBKM [-option] <training set> [initial cb/pa] <result codebook> For example: CBKM bridge initial tmp Options: -Bn = Sample percentage 10000/(1..10000) (0..10000, default=1000) -Hn1,n2 = n1: Hybrid methods combining K-means 0 = None (default) 1 = Kmeans+DensityPeaks n2: Hybrid method variant (0..10000, default=1) -I[n1,n2,..n4] = n1: Initialization method 0 = DensityPeaks 1 = Random (default) 2 = K-means++ 3 = Bradley 4 = Projection 5 = Luxburg 6 = MaxMin 7 = Random partition (Forgy) 8 = Splitting 9 = Sorting heuristic n2: Initialization method variant (Maxmin-variant: 1=rand point first, 2=mean first) (0..10000, default=1) n3: Initialization method option (0..10000, default=0) n4: Initialization method (third) option (0..10000, default=0) -O = Overwrite existing file (default=NO) -P = Save partition file (default=NO) -Qn = Quiet level (0..99, default=2) -Rn = Number of repeats (0..1000000, default=1) -Sn = Number of clusters (1..5000, default=256) -Tn = Iterations, 0=INF (0..50000000, default=0) -Zn = Random seed (0=clock) (0..2147483647, default=0) Compile: make cbrs
Example with 15 clusters: ./cbrs -O -S15 datasets/s1.txt output.txt
Run ./cbrs for help
Compile: make cbga
Example with 15 clusters: ./cbga -O -S15 datasets/s1.txt output.txt
Run ./cbga for help
Debug Compile with debug flags: make cbkm DEBUG=-g
[1] P Fränti, S Sieranoja, "How much can k-means be improved by using better initialization and repeats?", Pattern Recognition, 93, 95-112, 2019. https://doi.org/10.1016/j.patcog.2019.04.014
[2] P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018. (pdf) JF=1 https://doi.org/10.1186/s40537-018-0122-y
[3] P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, 2000. (pdf) IF=1.03 https://doi.org/10.1016/S0167-8655(99)00133-6