Numerical Cruncher



Clustering


Similarity Matrix



The main disadvantage of most heuristic algorithms is that the clustering results obtained with them are dependent on the presentation ordering of the pattern samples. Graph theoretic approaches (and also GRASP algorithms) are suggested to overcome this inherent limitation of heuristic algorithms. However, the computational costs associated with graph theoretic methods make them inapplicable to many real-world problems.

Similarity Matrix

The similarity matrix is used to show the degree of similarity between a variety of patterns. A symmetric NxN matrix is built, where N is the number of patterns in the training set. The element S[i,j] tells whether the pair of samples i and j are closer than a distance threshold theta. S[i,j] are binary such that only one bit of storage is required per cell.

Similarity matrix-based clustering algorithm

While there are patterns to assign

  1. Choose the row i of S with most 1's (the choice is arbitrary if there is more than one such row).
  2. Form a cluster with all the patterns j corresponding to 1's in row i (S[i,j]=1).
  3. Add pattern k to the cluster if S[j,k]=1 and pattern j is already in the cluster. Repeat this step until no more patterns can be assigned to the actual cluster.
  4. Reduce the similarity matrix: Remove from S all columns and rows corresponding to patterns in the recently created cluster.