Listen up, incoming MLEs.

How to crush the MLE coding interview

Jimmy (xiaoke) Shen
2 min read · Apr 17, 2020

For real, I have been asked this 3 times in interviews. In order to crush the MLE coding interview, let's do some over-preparation:

  • Code up an MLP using NumPy to classify an MNIST-like dataset within 200 minutes.
  • Design a Keras- or PyTorch-like deep learning library (a minimal API sketch follows this list).
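
To make the second bullet concrete, here is a minimal sketch of what a Keras-like layer API might look like. This is my own illustration, not code from either resource below; the names Layer, Linear, and Sequential are mine.

import numpy as np

class Layer:
    """Base class: every layer runs a forward pass and propagates gradients back."""
    def forward(self, inputs): raise NotImplementedError
    def backward(self, grad): raise NotImplementedError

class Linear(Layer):
    def __init__(self, in_dim: int, out_dim: int) -> None:
        self.w = np.random.randn(in_dim, out_dim) * 0.01
        self.b = np.zeros(out_dim)
    def forward(self, inputs):
        self.inputs = inputs                  # cache for the backward pass
        return inputs @ self.w + self.b
    def backward(self, grad):
        self.grad_w = self.inputs.T @ grad    # gradient w.r.t. weights
        self.grad_b = grad.sum(axis=0)        # gradient w.r.t. bias
        return grad @ self.w.T                # gradient w.r.t. inputs

class Sequential:
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

The whole point of the exercise is that chaining backward calls in reverse order is all backpropagation really is; an optimizer then just walks the layers and applies w -= lr * grad_w.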

Here are two resources for reference. The first contains a detailed explanation of how a deep learning system works. The second is a GitHub repo from Joel Grus on building a deep learning library from scratch; he even ran live coding streams on YouTube.

Michael Nielsen, Neural Networks and Deep Learning (NNDL)

Joel Grus, Let's Build a Deep Learning Library.

Talk is cheap, so let's code them up. Coding is the medicine for these coronavirus days.

In future posts in this series, I will add more details to record my progress.

15-minute coding of a simple two-layer MLP

The code is given in this link.
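
In case the link is unavailable, here is a minimal sketch of what such a two-layer MLP could look like, trained on XOR as a sanity check. The architecture, hyperparameters, and the XOR task are my choices, not necessarily those in the linked code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: XOR, the classic sanity check for a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer
lr = 0.5

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ w1 + b1)
    y_hat = sigmoid(h @ w2 + b2)
    # backward pass for squared error
    grad_out = (y_hat - y) * y_hat * (1 - y_hat)
    grad_h = (grad_out @ w2.T) * h * (1 - h)
    # gradient descent updates
    w2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    w1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

print(y_hat.round(3))  # should approach [0, 1, 1, 0]

With a hidden layer the network can represent XOR, which a single linear layer cannot; that is why the two-layer version is the standard warm-up.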

30-minute coding of a good MLP

Take this post as a reference.
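
Relative to the toy version above, a "good" MLP for MNIST-style classification mainly adds a softmax output, cross-entropy loss, and mini-batch training. Here is a hedged sketch of one training step; the function names and the ReLU choice are mine, and the linked post may differ in details.

import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_step(X, y_onehot, w1, b1, w2, b2, lr=0.1):
    """One mini-batch step of an MLP with a ReLU hidden layer and softmax output."""
    h = np.maximum(0.0, X @ w1 + b1)              # ReLU hidden activations
    probs = softmax(h @ w2 + b2)
    # for softmax + cross-entropy, the gradient w.r.t. the logits is (probs - y) / batch
    grad_logits = (probs - y_onehot) / len(X)
    grad_h = (grad_logits @ w2.T) * (h > 0)       # ReLU mask
    w2 -= lr * h.T @ grad_logits
    b2 -= lr * grad_logits.sum(axis=0)
    w1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)
    # mean cross-entropy loss, for monitoring
    return -np.log((probs * y_onehot).sum(axis=1)).mean()

Computing the logit gradient as (probs - y) in one step, instead of differentiating softmax and cross-entropy separately, is the standard trick and keeps the backward pass short.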

Outcomes of learning Mr. Grus’s autograd library

Learning a simple linear function

  • Code
from autograd.tensor import Tensor
import numpy as np
from typing import List
import matplotlib.pyplot as plt

# Ground truth: y = x @ w + b with w = [1, 2, 3] and b = 5.0
N = 100
x: Tensor = Tensor(np.random.randn(N, 3))
unknow_w: Tensor = Tensor(np.array([1, 2, 3]))
unknow_b: Tensor = Tensor(5.0)
y: Tensor = x @ unknow_w + unknow_b

# Parameters to learn, initialized randomly
w: Tensor = Tensor(np.random.randn(*(unknow_w.shape)), requires_grad=True)
b: Tensor = Tensor(np.random.randn(*(unknow_b.shape)), requires_grad=True)

lr: float = 0.001
batch_size: int = 8
epoches: int = 20
losses: List[float] = []
starts: List[int] = list(range(0, N, batch_size))

for i in range(epoches):
    for s in starts:
        # reset accumulated gradients before each batch
        w.zero_grad()
        b.zero_grad()
        e: int = s + batch_size
        this_x: Tensor = x[s:e]
        this_y: Tensor = y[s:e]
        y_hat = this_x @ w + b
        loss: Tensor = ((y_hat - this_y) * (y_hat - this_y)).sum()  # squared error
        loss.backward()
        # plain gradient descent update
        w -= lr * w.grad
        b -= lr * b.grad
        losses.append(loss.data)
        print(loss.data)

plt.plot(losses)
plt.xlabel("iterations (batch size=8)")
plt.ylabel("loss")
plt.title('Training Loss')
plt.savefig('loss.png')
plt.show()

print(f"predicted w: {w}")
print(f"actual w: {unknow_w}")
print(f"predicted b: {b}")
print(f"actual b: {unknow_b}")

Results

Learned parameters

predicted w: Tensor([1.00272977 1.99190731 2.87266225], requires_grad=True)
actual w: Tensor([1 2 3], requires_grad=False)
predicted b: Tensor(4.941449551718003, requires_grad=True)
actual b: Tensor(5.0, requires_grad=False)
Loss results: [training-loss curve, saved as loss.png]

Pretty good, right?

Reference

Victor Zhou's blog: https://victorzhou.com/
