Numerical Cruncher



Clustering


Sequential Clustering



Using the sequential clustering algorithm is a little more involved than using other algorithms (such as K-MEANS or Batchelor & Wilkins') because of the extra effort needed to adjust all of its parameters. It is hard to establish suitable values for them a priori.

This clustering algorithm performs simple computations and processes the pattern samples sequentially, without special storage requirements. Its main drawback is that the clustering results obtained with it depend heavily on the order in which the samples are presented (in fact, its behaviour is biased by the first patterns passed to the algorithm).

Parameters

The algorithm depends on five parameters, all of which appear in the description below:

R: maximum distance allowed between a pattern and its nearest cluster center for the pattern to be assigned to that cluster.
M: number of patterns processed between successive cluster-merging stages.
C: maximum distance between two cluster centroids for the clusters to be merged.
T: minimum cluster size, expressed as a percentage of M; smaller clusters are merged with their nearest cluster.
K: desired number of clusters at the end of the process.

Sequential clustering algorithm

The algorithm begins by choosing the first cluster center arbitrarily from among the pattern samples. It then processes the remaining patterns sequentially, computing the distance from the current pattern to its nearest cluster center. If that distance is smaller than or equal to R, the pattern is assigned to its nearest cluster. If not, a new cluster is formed with the current pattern as its center.
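
The assignment step can be sketched in Python as follows. The names (Cluster, assign_pattern) and the use of Euclidean distance are assumptions made for illustration, not part of the tool itself.

    # A minimal sketch of the sequential assignment step, assuming NumPy
    # vectors and Euclidean distance. Names are hypothetical.
    import numpy as np

    class Cluster:
        def __init__(self, pattern):
            self.members = [np.asarray(pattern, dtype=float)]
            self.center = self.members[0].copy()

        def add(self, pattern):
            self.members.append(np.asarray(pattern, dtype=float))
            # The cluster center is kept as the centroid of its members.
            self.center = np.mean(self.members, axis=0)

    def assign_pattern(pattern, clusters, R):
        """Assign the current pattern to its nearest cluster if that cluster's
        center lies within distance R; otherwise start a new cluster."""
        pattern = np.asarray(pattern, dtype=float)
        if not clusters:                  # the first pattern seeds the first cluster
            clusters.append(Cluster(pattern))
            return
        distances = [np.linalg.norm(pattern - c.center) for c in clusters]
        nearest = int(np.argmin(distances))
        if distances[nearest] <= R:
            clusters[nearest].add(pattern)
        else:
            clusters.append(Cluster(pattern))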

Every M patterns, clusters are merged using a distance criterion: two clusters are combined into one if the distance between their centroids is below C. If there are still more than K clusters, clusters are lumped together using a size criterion: clusters with fewer than T% of the M patterns are merged with their nearest clusters. If there are still too many clusters, the nearest pairs of clusters are merged until exactly K clusters remain.
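
This periodic reduction stage could look roughly like the sketch below, which reuses the hypothetical Cluster class from the previous example; it is an assumed illustration of the three criteria, not the tool's actual implementation. In the full algorithm it would be invoked after every M assigned patterns.

    # Assumed sketch of the reduction stage: distance criterion, size
    # criterion, then forced merging down to K clusters.
    def merge(clusters, i, j):
        """Fold cluster j into cluster i and remove it from the list."""
        for p in clusters[j].members:
            clusters[i].add(p)
        del clusters[j]

    def nearest_pair(clusters):
        """Return (distance, i, j) for the two clusters with the closest centroids."""
        return min((np.linalg.norm(a.center - b.center), i, j)
                   for i, a in enumerate(clusters)
                   for j, b in enumerate(clusters) if i < j)

    def reduce_clusters(clusters, C, K, T, M):
        # 1. Distance criterion: merge pairs whose centroids are closer than C.
        while len(clusters) > 1:
            d, i, j = nearest_pair(clusters)
            if d >= C:
                break
            merge(clusters, i, j)
        # 2. Size criterion: merge clusters holding fewer than T% of the M
        #    patterns with their nearest cluster, while more than K clusters remain.
        min_size = T / 100.0 * M
        j = 0
        while len(clusters) > K and j < len(clusters):
            if len(clusters[j].members) < min_size:
                others = [i for i in range(len(clusters)) if i != j]
                i = min(others, key=lambda k: np.linalg.norm(
                    clusters[k].center - clusters[j].center))
                merge(clusters, i, j)
            else:
                j += 1
        # 3. Force exactly K clusters by repeatedly merging the nearest pair.
        while len(clusters) > K:
            _, i, j = nearest_pair(clusters)
            merge(clusters, i, j)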

The lumping stage of the algorithm guarantees that the desired number of clusters is obtained at the end. Remember that other algorithms (such as the adaptive method or Batchelor & Wilkins') may produce too many clusters.