Numerical Cruncher
Clustering
ISODATA
ISODATA, an acronym for Iterative Self-Organizing Data
Analysis Techniques (the A being added to make the word
pronounceable), is an iterative clustering method.
As with the sequential clustering algorithm, it takes
considerable effort to set all of its parameters
properly. Moreover, the parameters can be changed in each
iteration of the algorithm.
Parameters
- K: Desired number of clusters.
- A: Initial number of clusters.
- n: Minimum number of patterns per cluster.
- s: Maximum standard deviation allowed in a cluster (used in cluster splitting).
- c: Minimum distance required between clusters (used in cluster merging).
- L: Maximum number of pairs of clusters that can be lumped (merged) per iteration.
- I: Maximum number of iterations allowed.
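The seven parameters can be grouped into a single configuration object. The following Python sketch is illustrative only: the class name `IsodataParams`, its descriptive field names (each commented with the original symbol) and the two helper methods, which encode the K/2 and 2K thresholds from the algorithm description, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsodataParams:
    """Hypothetical container for the ISODATA parameters."""
    desired_clusters: int        # K
    initial_clusters: int        # A
    min_patterns: int            # n, clusters with fewer members are discarded
    max_std_dev: float           # s, split threshold
    min_merge_distance: float    # c, merge threshold
    max_merges_per_iter: int     # L
    max_iterations: int          # I

    def too_few(self, num_clusters: int) -> bool:
        """True when splitting is considered (at most K/2 clusters)."""
        return num_clusters <= self.desired_clusters / 2

    def too_many(self, num_clusters: int) -> bool:
        """True when merging is forced (more than 2K clusters)."""
        return num_clusters > 2 * self.desired_clusters
```

Freezing the dataclass is a design choice of this sketch; since the parameters may change between iterations, a caller would build a new object (for example with `dataclasses.replace`) instead of mutating one in place.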
Algorithm
- Choose A initial cluster centers.
- For each of the I algorithm iterations:
  - Set the algorithm's parameters.
  - Assign each pattern to its nearest cluster.
  - Discard clusters without enough members (fewer than n patterns).
  - If there are only a few clusters (<= K/2),
    split overly dispersed clusters along their component of maximum dispersion
    (if that dispersion is greater than the maximum value s).
  - In even iterations, or when there are too many clusters (> 2K),
    combine at most L pairs of clusters whose distance
    is smaller than the minimum distance allowed, c.
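The steps above can be sketched in Python as follows. This is a minimal, illustrative sketch, not a reference implementation, and it fills in several details the description leaves open, all of them assumptions: patterns are lists of floats and distances are Euclidean; the A initial centers are drawn at random from the patterns; parameters stay fixed across iterations; a split places the two new centers one standard deviation either side of the old one along the most dispersed component; merging takes the closest pairs first, uses each cluster in at most one pair, and weights the merged center by cluster size; and, as in the classic formulation of ISODATA, at most one of the split and merge steps runs per iteration. The names `isodata` and `centroid` are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def isodata(patterns, K, A, n, s, c, L, I, seed=0):
    """Return the cluster centers after at most I iterations (sketch)."""
    rng = random.Random(seed)
    # Assumption: the A initial centers are A distinct patterns drawn at random.
    centers = [list(p) for p in rng.sample(patterns, A)]
    for it in range(1, I + 1):
        # "Set algorithm parameters": this sketch keeps them fixed.
        # Assign each pattern to its nearest center.
        groups = [[] for _ in centers]
        for p in patterns:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            groups[nearest].append(p)
        # Discard clusters with fewer than n members (empty ones always go).
        groups = [g for g in groups if len(g) >= max(n, 1)]
        if not groups:
            break  # nothing left to cluster
        centers = [centroid(g) for g in groups]
        if len(centers) <= K / 2:
            # Split clusters whose largest per-component std dev exceeds s.
            split = []
            for ctr, g in zip(centers, groups):
                stds = [math.sqrt(sum((p[d] - ctr[d]) ** 2 for p in g) / len(g))
                        for d in range(len(ctr))]
                d = max(range(len(stds)), key=lambda k: stds[k])
                if stds[d] > s:
                    lo, hi = list(ctr), list(ctr)
                    lo[d] -= stds[d]  # assumption: offset of one std dev
                    hi[d] += stds[d]
                    split += [lo, hi]
                else:
                    split.append(ctr)
            centers = split
        elif it % 2 == 0 or len(centers) > 2 * K:
            # Merge at most L pairs closer than c, closest pairs first.
            pairs = sorted((math.dist(centers[i], centers[j]), i, j)
                           for i in range(len(centers))
                           for j in range(i + 1, len(centers)))
            used, merged = set(), []
            for d_ij, i, j in pairs:
                if d_ij >= c or len(merged) == L:
                    break
                if i in used or j in used:
                    continue  # each cluster joins at most one merge
                used.update((i, j))
                wi, wj = len(groups[i]), len(groups[j])
                merged.append([(wi * a + wj * b) / (wi + wj)
                               for a, b in zip(centers[i], centers[j])])
            centers = [ctr for k, ctr in enumerate(centers) if k not in used] + merged
    return centers
```

For example, starting from a single center (A = 1) with K = 2, two well-separated groups of points are first lumped into one cluster, which is then split along its most dispersed component and converges to one center per group.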
PS: A cluster's dispersion is the average distance from
its patterns (i.e., the patterns assigned to the cluster) to its centroid.
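The dispersion defined above is simple to compute. A minimal Python sketch, assuming patterns are sequences of floats and a Euclidean distance (the description does not name a metric); `dispersion` is a hypothetical name:

```python
import math


def dispersion(cluster):
    """Average Euclidean distance from a cluster's patterns to its centroid.

    `cluster` is a non-empty list of patterns, each a sequence of floats.
    """
    if not cluster:
        raise ValueError("an empty cluster has no centroid")
    center = [sum(col) / len(cluster) for col in zip(*cluster)]
    return sum(math.dist(p, center) for p in cluster) / len(cluster)
```

For the four corners of a 2 x 2 square, (0,0), (2,0), (0,2) and (2,2), the centroid is (1,1) and every pattern lies sqrt(2) away from it, so the dispersion is sqrt(2), about 1.414.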