Is it beneficial to use a batch size > 1 even when all computing power can be used?

Regarding neural network training, it is often said that increasing the batch size reduces the network's ability to generalize, as alluded to here. This is because training on large batches tends to make the network converge to sharp minima rather than wide ones, as explained here.

This raises the question: in situations where all available computing power is already saturated by training with a batch size of one, is there any benefit to using a batch size greater than one?

A situation like this would likely occur when training on a CPU, or when training a very large network on any hardware.
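One relevant consideration is gradient noise: averaging over a mini-batch reduces the variance of the gradient estimate, which is part of the sharp-versus-wide-minima discussion above. A minimal NumPy sketch (toy linear-regression data, invented for illustration) showing how the spread of the gradient estimate shrinks as the batch size grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise (hypothetical example problem)
X = rng.normal(size=(1000,))
y = 2.0 * X + rng.normal(scale=0.5, size=1000)

def grad_estimate(w, batch_size):
    """Gradient of the MSE loss on a random mini-batch at weight w."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    residual = w * xb - yb
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return np.mean(2.0 * residual * xb)

# Compare the spread of the gradient estimate at w = 0
# for batch size 1 versus batch size 32.
for bs in (1, 32):
    grads = [grad_estimate(0.0, bs) for _ in range(2000)]
    print(f"batch size {bs:>2}: gradient std = {np.std(grads):.3f}")
```

The standard deviation falls roughly as 1/sqrt(batch_size), so larger batches give smoother steps per update; whether that smoothing helps or hurts generalization is exactly what the question is about.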

Tags: training, gradient-descent, neural-network, optimization, machine-learning

Category: Data Science
