Monday, January 3, 2011

Kmeans Clustering Algorithm (Part 1)

Algorithm:
  • Step1: 
    • Choose the number of clusters ("K" in the KMeans) required. Lets say K=3.
    • Choose the centroids for the clusters randomly from among the given set of elements
  • Step2: 
    • Compute the distance of each of the element from the centroids
    • Assign the element to the cluster with the closest centroid
  • Step3:
    • Re-compute the cluster centroids based on the assignments done in step2
  • Step4:
    • Repeat Step 2 and Step 3  until the algorithm converges


Convergence:
Each time we compute step 2, we keep track of the number of elements that change their cluster assignments. For example, if an element belonging to the red cluster changes its assignment from red to blue cluster, we count that as one change, again if an element changes its cluster assignment from blue to red, we consider that as another change and so on.. For convergence, we have a minimum threshold for the number of changes ( say 10 changes). If after step2, the number of cluster assignment changes is less than the threshold, we say the algorithm has converged.

No comments: