Numerical Cruncher

Clustering

K-MEANS

The K-Means algorithm (J.B. MacQueen, 1967) is probably the most popular clustering algorithm. It is a heuristic in which the number of clusters K must be known a priori, or at least roughly estimated. The algorithm minimizes the sum of squared distances from every pattern in a cluster to that cluster's center.
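The criterion described above can be written out explicitly. The notation here is an assumption introduced for illustration and does not come from the original text: C_k denotes the set of patterns assigned to cluster k, m_k its center (taken as the mean of its patterns), and the distance is assumed to be Euclidean.

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2,
\qquad
m_k = \frac{1}{\lvert C_k \rvert} \sum_{x \in C_k} x
```

For a fixed assignment of patterns to clusters, choosing each m_k as the cluster mean is what minimizes J, which is why the centers are recomputed as means.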

The algorithm is simple and efficient. Pattern samples are processed sequentially and need not be stored, so storage requirements are minimal. However, the result depends on the number of cluster centers chosen initially (K) and on the order in which pattern samples are presented to the system: the first patterns determine the initial cluster configuration and, therefore, the local optimum that the algorithm finds.
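The sequential processing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sequential_kmeans` is hypothetical, the first K patterns are assumed to seed the clusters, the distance is assumed to be Euclidean, and each center is kept as a running mean so that no pattern has to be stored.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def sequential_kmeans(patterns, k):
    """Single pass over `patterns`; returns (centers, counts).

    Hypothetical helper for illustration: seeds the clusters with the
    first k patterns, then updates centers as running means.
    """
    centers = []  # current cluster centers
    counts = []   # number of patterns absorbed by each center
    for x in patterns:
        x = [float(v) for v in x]
        if len(centers) < k:
            # Assumption: the first k patterns become the initial centers.
            centers.append(x)
            counts.append(1)
            continue
        # Assign the pattern to the nearest center.
        j = min(range(k), key=lambda i: dist(x, centers[i]))
        counts[j] += 1
        # Incremental mean update: the pattern itself is not stored.
        centers[j] = [c + (v - c) / counts[j] for c, v in zip(centers[j], x)]
    return centers, counts


# The same five patterns presented in two different orders.
order_a = [(0, 0), (10, 10), (0, 2), (10, 12), (2, 0)]
order_b = [(0, 0), (0, 2), (10, 10), (10, 12), (2, 0)]

centers_a, counts_a = sequential_kmeans(order_a, k=2)
centers_b, counts_b = sequential_kmeans(order_b, k=2)
print(centers_a, counts_a)
print(centers_b, counts_b)
```

With the first ordering the two seeds lie in different groups and the centers settle near (0.67, 0.67) and (10, 11). With the second ordering both seeds come from the same group, and the centers end up near (1, 0) and (6.67, 8), which shows how the presentation order can steer the algorithm toward a different local optimum.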