Why SGD is better than GD

1 min read · Mar 12, 2020

Why mini batch size is better than one single "batch" with all training data?

The key advantage of using minibatch as opposed to the full dataset goes back to the fundamental idea of…

datascience.stackexchange.com
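To make the contrast in the question concrete, here is a minimal sketch in plain Python. It is not taken from the linked answer. The toy problem (fitting a single weight `w` in `y = w * x`), the helper names `make_data`, `grad`, `full_batch_gd`, and `minibatch_sgd`, and every hyperparameter are assumptions chosen for illustration only. It shows just one aspect of the comparison: a mini-batch update touches `batch_size` samples, while a full-batch update touches all of them.

```python
import random


def make_data(n, true_w=3.0, noise=0.1, seed=0):
    """Toy 1-D regression data: y = true_w * x + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def grad(w, xs, ys):
    """Gradient of the mean squared error (1/m) * sum((w*x - y)^2) with respect to w."""
    m = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / m


def full_batch_gd(xs, ys, lr=0.5, steps=100):
    """Gradient descent: every update uses all samples."""
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, xs, ys)
    # Second value: number of per-sample gradient terms computed in total.
    return w, steps * len(xs)


def minibatch_sgd(xs, ys, lr=0.5, steps=100, batch_size=32, seed=1):
    """Mini-batch SGD: every update uses a random subset of the samples."""
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        idx = rng.sample(range(n), batch_size)  # sample without replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        w -= lr * grad(w, bx, by)
    return w, steps * batch_size


if __name__ == "__main__":
    xs, ys = make_data(2000)
    w_gd, cost_gd = full_batch_gd(xs, ys)
    w_sgd, cost_sgd = minibatch_sgd(xs, ys)
    print("full-batch GD :", w_gd, "per-sample gradient terms:", cost_gd)
    print("mini-batch SGD:", w_sgd, "per-sample gradient terms:", cost_sgd)
```

In this toy setting both runs should land near the true weight, while the mini-batch run computes far fewer per-sample gradient terms for the same number of updates. The mini-batch estimate is noisier, and on a convex problem like this one that noise brings no benefit. The saddle-point and generalization arguments in the references below concern non-convex models, which this sketch does not cover.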

[1] Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010 (pp. 177–186). Physica-Verlag HD.

[2] Ge, R., Huang, F., Jin, C., & Yuan, Y. (2015, June). Escaping From Saddle Points-Online Stochastic Gradient for Tensor Decomposition. In COLT (pp. 797–842).

[3] Masters, D., & Luschi, C. (2018). Revisiting Small Batch Training for Deep Neural Networks. arXiv:1804.07612v1.

Written by Jimmy (xiaoke) Shen
