# Simple implement K means by using NumPy

K means is a classical algorithm. I’d like to have a quick implementation of the K means algorithm in case that you are supposed to implement it within 10 minutes such as in an interview.

The hard part is related to NumPy broadcast.

A quick summary of the broadcast rule will be from [1]:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

they are equal, or

one of them is 1

Some examples can be shown here:[1]

`A (2d array): 5 x 4`

B (1d array): 1

Result (2d array): 5 x 4

A (2d array): 5 x 4

B (1d array): 4

Result (2d array): 5 x 4

A (3d array): 15 x 3 x 5

B (3d array): 15 x 1 x 5

Result (3d array): 15 x 3 x 5

A (3d array): 15 x 3 x 5

B (2d array): 3 x 5

Result (3d array): 15 x 3 x 5

A (3d array): 15 x 3 x 5

B (2d array): 3 x 1

Result (3d array): 15 x 3 x 5

# Get started

generate 5 points with 2 dimension features.

`>>> points =np.random.randn(5, 2)`

>>> idx = np.arange(5)

>>> np.random.shuffle(idx)

>>> idx

array([4, 2, 0, 1, 3])

>>> centroids = points[idx[:3]]

>>> centroids

array([[-0.92043547, 1.94704196],

[ 1.32088275, -0.52664638],

[ 2.18007867, -0.35835555]])

>>> centroids.shape

(3, 2)

>>> points.shape

(5, 2)

>>> centroids.shape

(3, 2)

The full code can be found in my github.

# Clustering results

Reference

[1]https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html