# Simple implement K means by using NumPy

K means is a classical algorithm. I’d like to have a quick implementation of the K means algorithm in case that you are supposed to implement it within 10 minutes such as in an interview.

The hard part is related to NumPy broadcast.

A quick summary of the broadcast rule will be from [1]:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

they are equal, or

one of them is 1

Some examples can be shown here:[1]

`A      (2d array):  5 x 4B      (1d array):      1Result (2d array):  5 x 4A      (2d array):  5 x 4B      (1d array):      4Result (2d array):  5 x 4A      (3d array):  15 x 3 x 5B      (3d array):  15 x 1 x 5Result (3d array):  15 x 3 x 5A      (3d array):  15 x 3 x 5B      (2d array):       3 x 5Result (3d array):  15 x 3 x 5A      (3d array):  15 x 3 x 5B      (2d array):       3 x 1Result (3d array):  15 x 3 x 5`

# Get started

generate 5 points with 2 dimension features.

`>>> points =np.random.randn(5, 2)>>> idx = np.arange(5)>>> np.random.shuffle(idx)>>> idxarray([4, 2, 0, 1, 3])>>> centroids = points[idx[:3]]>>> centroidsarray([[-0.92043547,  1.94704196],       [ 1.32088275, -0.52664638],       [ 2.18007867, -0.35835555]])>>> centroids.shape(3, 2)>>> points.shape(5, 2)>>> centroids.shape(3, 2)`

The full code can be found in my github.

# Clustering results

Reference

--

--

## More from Jimmy (xiaoke) Shen

Data Scientist/MLE/SWE @takemobi