Are there studies which examine dropout vs other regularizations?

Are there any papers published which show differences of the regularization methods for neural networks, preferably on different domains (or at least different datasets)?

I am asking because I currently have the feeling that most people seem to use only dropout for regularization in computer vision. I would like to check whether there is a reason (not) to use other forms of regularization.

Topic dropout regularization convnet computer-vision neural-network

Category Data Science


Two points:

  1. Dropout is also usually compared with ensembles of neural networks. It seems to give some of the performance benefits of training and averaging several neural networks.
  2. Dropout is easier to calibrate than weight regularization such as L2. There is only one hyperparameter, the keep probability, which is widely set to 0.5 during training (and 1.0 at evaluation, of course :)); see e.g. this TensorFlow example, and the minimal sketch after this list.

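A minimal sketch of that single hyperparameter, assuming TF 2.x / Keras rather than the TF 1.x example linked above (the architecture and layer sizes here are just placeholders):

```python
import tensorflow as tf

# Keras' Dropout takes the *drop* rate, so rate=0.5 matches a keep probability
# of 0.5 at training time; at evaluation the layer is a no-op, which corresponds
# to "keep probability 1.0" in the old TF 1.x API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # the single hyperparameter mentioned above
    tf.keras.layers.Dense(10),
])
```
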
Anyhow, I am a little skeptical of empirical studies of neural networks. There are just too many hyperparameters to fine-tune, from the topology of the network to the gradient descent optimization procedure to the activation functions, on top of whatever it is you are actually testing, such as regularization. Then, the entire thing is stochastic, and the performance gains are usually so small that you can hardly run a statistical test for differences. Many authors do not even bother with statistical testing; they just average cross-validation scores and declare whichever model wins by the last decimal point to be the winner.

You may find one study promoting dropout, only to have it contradicted by another promoting weight regularization.

I think it all boils down to aesthetic preferences. Dropout IMHO sounds more biologically plausible than weight regularization, and it also seems easier to calibrate, so I personally prefer it when using a framework like TensorFlow. If we have to implement our own neural network, which we often do, we use weight regularization because it is easier to implement; a rough sketch of that trade-off follows.
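
To make that trade-off concrete, here is a rough sketch (my own illustration, not code from any of the studies mentioned): L2 weight decay is a one-line change to a hand-rolled gradient step, while dropout needs a random mask plus different behaviour at training and evaluation time.

```python
import numpy as np

def sgd_step_with_l2(w, grad, lr=0.1, weight_decay=1e-4):
    # L2 regularization: one extra term added to the gradient, nothing else changes.
    return w - lr * (grad + weight_decay * w)

def inverted_dropout(activations, keep_prob=0.5, training=True):
    # Dropout: needs a random mask, rescaling, and a train/eval switch.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) < keep_prob) / keep_prob
    return activations * mask
```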


Definitely. See the paper from the creator himself, Geoffrey Hinton: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf. Read it, but I also encourage you to see the difference for yourself by implementing it; one possible setup is sketched below.
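
A hypothetical way to run that comparison yourself, assuming TF 2.x / Keras (the function name `make_model`, the layer sizes, and the hyperparameter values are placeholders): build the same architecture once with dropout and once with L2 weight decay, train both on your dataset, and compare validation accuracy.

```python
import tensorflow as tf

def make_model(regularization="dropout"):
    # Same architecture for both options; only the regularizer differs.
    reg = tf.keras.regularizers.l2(1e-4) if regularization == "l2" else None
    layers = [tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=reg)]
    if regularization == "dropout":
        layers.append(tf.keras.layers.Dropout(0.5))
    layers.append(tf.keras.layers.Dense(10))
    return tf.keras.Sequential(layers)
```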
