Numerical Cruncher

Clustering

ISODATA

ISODATA, an acronym for Iterative Self-Organizing Data Analysis Techniques (the A being added to make the word pronounceable), is an iterative clustering method. Like the sequential clustering algorithm, it requires substantial effort to find a proper setting for all of its parameters. Moreover, these parameters can be changed in each iteration of the algorithm.

Parameters

K: Desired number of clusters.
A: Initial number of clusters.
n: Minimum number of patterns per cluster.
s: Maximum standard deviation allowed in a cluster (used in cluster splitting).
c: Minimum distance required between clusters (used in cluster merging).
L: Maximum number of cluster pairs that can be lumped (merged) per iteration.
I: Maximum number of iterations allowed.
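
As an illustration only, the seven parameters above could be grouped in a small Python record. The field names follow the letters used in the list; the class name IsodataParams, the check method and the example values are assumptions made for this sketch and are not part of the original description.

```python
from dataclasses import dataclass


@dataclass
class IsodataParams:
    """Hypothetical container for the ISODATA parameters listed above."""
    K: int    # desired number of clusters
    A: int    # initial number of clusters
    n: int    # minimum number of patterns per cluster
    s: float  # maximum standard deviation allowed in a cluster (splitting)
    c: float  # minimum distance required between clusters (merging)
    L: int    # pairs of clusters that can be lumped per iteration
    I: int    # maximum number of iterations allowed

    def check(self) -> None:
        """Reject obviously unusable settings (assumed sanity rules)."""
        if min(self.K, self.A, self.n, self.I) < 1:
            raise ValueError("K, A, n and I must be at least 1")
        if self.L < 0 or self.c < 0 or self.s <= 0:
            raise ValueError("L and c must be non-negative, s must be positive")


# Example values chosen arbitrarily for illustration.
params = IsodataParams(K=4, A=2, n=3, s=1.0, c=2.0, L=1, I=20)
params.check()
```

Since the parameters may be changed in each iteration, a mutable record like this one can simply be updated between iterations.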

Algorithm

Choose A cluster centers.
For each of the I iterations of the algorithm:
1. Set the algorithm's parameters.
2. Assign patterns to their nearest clusters.
3. Discard clusters without enough members (fewer than n patterns).
4. If there are only a few clusters (<=K/2), split dispersed clusters along their maximum dispersion component (if it is greater than the maximum value s).
5. In even iterations, or when there are too many clusters (>2K), merge at most L pairs of clusters whose distance is smaller than the minimum distance allowed, c.
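
The steps above can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation, and it fills several gaps with assumptions: the first A patterns serve as initial centers, a split places the two new centers half a standard deviation on either side of the old one, a merged center is the size-weighted mean of the pair, steps 4 and 5 are treated as alternatives within one iteration, and step 1 (re-tuning the parameters) is left out because it is interactive. All function names are hypothetical.

```python
import math


def mean(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def assign(patterns, centers):
    """Step 2: group patterns by the index of their nearest center."""
    groups = [[] for _ in centers]
    for p in patterns:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def split(groups, s):
    """Step 4: split clusters whose largest per-coordinate std exceeds s."""
    centers = []
    for g in groups:
        m = mean(g)
        stds = [math.sqrt(sum((p[d] - m[d]) ** 2 for p in g) / len(g))
                for d in range(len(m))]
        d = max(range(len(m)), key=stds.__getitem__)
        if stds[d] > s and len(g) >= 2:
            # Assumption: new centers sit half a std either side of the old one.
            for shift in (-0.5, 0.5):
                centers.append([x + shift * stds[d] if i == d else x
                                for i, x in enumerate(m)])
        else:
            centers.append(m)
    return centers


def merge(groups, c, L):
    """Step 5: merge at most L closest pairs whose centers are nearer than c."""
    items = [(mean(g), len(g)) for g in groups]
    pairs = sorted((math.dist(items[i][0], items[j][0]), i, j)
                   for i in range(len(items)) for j in range(i + 1, len(items)))
    used, merged = set(), []
    for gap, i, j in pairs:
        if gap >= c or len(merged) >= L:
            break
        if i in used or j in used:
            continue
        (mi, ni), (mj, nj) = items[i], items[j]
        # Assumption: the merged center is the size-weighted mean of the pair.
        merged.append([(a * ni + b * nj) / (ni + nj) for a, b in zip(mi, mj)])
        used.update((i, j))
    return merged + [m for k, (m, _) in enumerate(items) if k not in used]


def isodata(patterns, K, A, n, s, c, L, I):
    centers = [list(p) for p in patterns[:A]]  # assumption: first A patterns
    for iteration in range(1, I + 1):
        groups = [g for g in assign(patterns, centers) if len(g) >= n]  # steps 2-3
        if not groups:
            return []  # every cluster was discarded
        # Reassign patterns that belonged to discarded clusters.
        centers = [mean(g) for g in groups]
        groups = [g for g in assign(patterns, centers) if g]
        if len(groups) <= K / 2:
            centers = split(groups, s)       # step 4
        elif iteration % 2 == 0 or len(groups) > 2 * K:
            centers = merge(groups, c, L)    # step 5
        else:
            centers = [mean(g) for g in groups]
    return centers


# Two well-separated groups of four patterns each.
demo = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = isodata(demo, K=2, A=1, n=2, s=1.0, c=3.0, L=1, I=6)
```

With these demo settings the single initial cluster is split in the first iteration, and the two resulting centers settle on the means of the two groups.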

PS: A cluster's dispersion is the average distance from its patterns (i.e. the patterns assigned to the cluster) to its centroid.
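
The dispersion defined above can be computed as in the short sketch below. It assumes Euclidean distance and patterns given as equal-length coordinate tuples, neither of which the text states; the function names are hypothetical.

```python
import math


def centroid(patterns):
    """Coordinate-wise mean of the patterns assigned to a cluster."""
    dims = len(patterns[0])
    return [sum(p[d] for p in patterns) / len(patterns) for d in range(dims)]


def dispersion(patterns):
    """Average Euclidean distance from the patterns to their centroid."""
    center = centroid(patterns)
    return sum(math.dist(p, center) for p in patterns) / len(patterns)


# Four corners of a 2 x 2 square: the centroid is (1, 1) and every corner
# lies at distance sqrt(2) from it, so the dispersion is about 1.414.
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
square_dispersion = dispersion(square)
```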