Why SGD is better than GD

Jimmy (xiaoke) Shen
1 min readMar 12, 2020

--

1 Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010 (pp. 177–186). Physica-Verlag HD.

[2] Ge, R., Huang, F., Jin, C., & Yuan, Y. (2015, June). Escaping From Saddle Points-Online Stochastic Gradient for Tensor Decomposition. In COLT (pp. 797–842).

Dominic Masters, Carlo Luschi, Revisiting Small Batch Training for Deep Neural Networks, arXiv:1804.07612v1

--

--

No responses yet