K means Clustering Algorithm

K-means clustering algorithm Kasun Ranga Wijeweera (krw19870829@gmail.com)

What is Clustering? • Organizing data into classes such that there is • high intra-class similarity • low inter-class similarity • Finding the class labels and the number of classes directly from the data (in contrast to classification). • More informally, finding natural groupings among objects.

What is a natural grouping among these objects?

What is a natural grouping among these objects? Clustering is subjective Simpson's Family School Employees Females Males

Defining Distance Measures Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1,O2) Kasun Kiosn 0.23 3 342.7

Consider a Set of Data Points, And a Set of Clusters,

Algorithm k-means 1. Randomly choose K data items from X as initial centroids. 2. Repeat  Assign each data point to the cluster which has the closest centroid.  Calculate new cluster centroids. Until the convergence criteria is met.

K-means gets stuck in a local optima

Applications of K-means Method • Optical Character Recognition • Biometrics • Diagnostic Systems • Military Applications

Comments on the K-Means Method • Strength – Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n. – Often terminates at a local optimum. The global optimum may be found using techniques such as: deterministic annealing and genetic algorithms • Weakness – Applicable only when mean is defined, then what about categorical data? – Need to specify k, the number of clusters, in advance – Unable to handle noisy data and outliers – Not suitable to discover clusters with non-convex shapes

K means Clustering Algorithm

More Related Content

What's hot

Viewers also liked

Similar to K means Clustering Algorithm

More from Kasun Ranga Wijeweera

Recently uploaded

In this document

K means Clustering Algorithm