Why SGD is better than GD
1 min readMar 12, 2020
1 Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010 (pp. 177–186). Physica-Verlag HD.
[2] Ge, R., Huang, F., Jin, C., & Yuan, Y. (2015, June). Escaping From Saddle Points-Online Stochastic Gradient for Tensor Decomposition. In COLT (pp. 797–842).
Dominic Masters, Carlo Luschi, Revisiting Small Batch Training for Deep Neural Networks, arXiv:1804.07612v1