**1 Answer**

written 2.5 years ago by | • modified 2.5 years ago |

**K-means Clustering:**

In K-means approach the data objects are classified based on their attributes or features into k number of clusters. The number of clusters i.e K is an input given by the user. K-means is one of the simplest unsupervised learning algorithms.

**K-means Algorithm:**

K- number of clusters

n- sample vectors $x_{1},x_{2},....x_{n}$

$m_{i}$- The mean of vectors in cluster i

- Assume k < n
- Make initial guesses for the means $m_{1}, m_{2},...m_{k}$
- Until there are no changes in any mean
Use the estimated means to classify the samples into clusters.

for i = 1 to k

Replace $m_i$ with the mean of all of the samples for cluster i

end_for

end_until

Following three steps are repeated until convergence:

- Iterate till no object moves to a different group:
- Find the centroid coordinate.
- Find the distance of each object to the centroids
- Based on minimum distance group the objects.

**K-Means Advantages :**

If variables are huge, then K-Means most of the times computationally faster than hierarchical clustering, if we keep k smalls.

K-Means produce tighter clusters than hierarchical clustering, especially if the clusters are globular.

**K-Means Disadvantages :**

- Difficult to predict K-Value.
- With global cluster, it didn't work well.
- It does not work well with clusters (in the original data) of Different size and Different density.

**Numerical**

Given:

Dataset -{15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}

Solution:

Assume K=2

So, two clusters are formed (Users point of view) i.e

$Cluster 1$ = 15, 16, 19, **20**, 21, 22, 28
$\therefore C_1$ = 20 (mean value)

$Cluster 2$ = 35, 40, 41, 42, **43**, 44, 60, 61, 65
$\therefore C_2$ = 43 (mean value)

Below table:

Calculate Cluster 1 and Cluster 2 values, by subtracting the datapoint from $C_1$ and $C_2$

i.e (20 - 15) = 5 and (43 - 15) = 28

then choose min value (5) and assign that cluster.

Data point | Cluster 1 | Cluster 2 | Cluster assign |
---|---|---|---|

15 | 5 | 28 | 1 |

15 | 5 | 28 | 1 |

16 | 4 | 27 | 1 |

19 | 1 | 24 | 1 |

19 | 1 | 24 | 1 |

20 | 0 | 23 | 1 |

20 | 0 | 23 | 1 |

21 | 1 | 22 | 1 |

22 | 2 | 21 | 1 |

28 | 8 | 15 | 1 |

35 | 15 | 8 | 2 |

40 | 20 | 3 | 2 |

41 | 21 | 2 | 2 |

42 | 22 | 1 | 2 |

43 | 23 | 0 | 2 |

44 | 24 | 1 | 2 |

60 | 40 | 17 | 2 |

61 | 41 | 18 | 2 |

65 | 45 | 22 | 2 |

New centroid is calculated by taking average value of clusters.

Old centroid | New centroid |
---|---|

20 | 19.5 |

43 | 47.88 |

Data point | Cluster 1 | Cluster 2 | Cluster assign |
---|---|---|---|

15 | 4.5 | 32.8 | 1 |

15 | 4.5 | 32.8 | 1 |

16 | 3.5 | 31.8 | 1 |

19 | 0.5 | 28.8 | 1 |

19 | 0.5 | 28.8 | 1 |

20 | 0.5 | 27.8 | 1 |

20 | 0.5 | 27.8 | 1 |

21 | 1.5 | 26.8 | 1 |

22 | 2.5 | 25.8 | 1 |

28 | 8.5 | 19.8 | 1 |

35 | 15.5 | 12.8 | 2 |

40 | 20.5 | 7.8 | 2 |

41 | 21.5 | 6.8 | 2 |

42 | 22.5 | 5.8 | 2 |

43 | 23.5 | 4.8 | 2 |

44 | 24.5 | 3.8 | 2 |

60 | 40.5 | 12.1 | 2 |

61 | 41.5 | 13.1 | 2 |

65 | 45.5 | 17.1 | 2 |

Old centroid | New centroid |
---|---|

19.5 | 19.5 |

47.88 | 47.88 |

Since, Old centriod is same as New centroid stop the iteration.

$\therefore$ Final answer is $k_1$ = {15,15,16,19,19,20,20,21,22,28} $k_2$ = {35,40,41,42,43,44,60,61,65}