When should one use L1 or L2 regularization instead of a dropout layer, given that both serve the same purpose of reducing overfitting?

In Keras, there are two common methods to reduce overfitting: L1/L2 regularization and dropout layers.

What are some situations where L1/L2 regularization is preferable to a dropout layer? What are some situations where a dropout layer is better?

Topic: overfitting, keras, dropout, regularization

Category: Data Science


I am not sure there is a formal way to show which is best in which situation; simply trying out different combinations is likely your best bet!

It is worth noting that dropout actually does a little more than just provide a form of regularisation: it adds robustness to the network by effectively training many different sub-networks. Because the randomly deactivated neurons are removed for that forward/backward pass, each pass trains what is essentially a different network. Have a look at this post for a few more pointers regarding the beauty of dropout layers.
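
As a minimal sketch of how this looks in Keras (the layer sizes, the 0.5 rate, and the 20-feature input are illustrative choices, not from the question):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative binary classifier with dropout after each hidden layer.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),  # each training step, ~50% of these activations are zeroed
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),  # dropout is active only during training, not at inference
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Each training step thus samples a different "thinned" network, which is where the robustness comes from.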

$L_1$ versus $L_2$ is easier to explain: the $L_1$ penalty adds $\lambda \sum_i |w_i|$ to the loss, while $L_2$ adds $\lambda \sum_i w_i^2$. Because the $L_2$ term is quadratic, it treats outliers a little more thoroughly, returning a larger error for those points, whereas the linear $L_1$ penalty tends to drive weights to exactly zero, giving sparser models. Have a look here for more detailed comparisons.
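
In Keras, these penalties attach to individual layers. As a minimal sketch, reusing the illustrative architecture above (the 1e-4 strengths are placeholders, not tuned values):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Same illustrative architecture, regularised with weight penalties instead of dropout.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-4)),  # L1: adds lambda*sum(|w|), encourages sparse weights
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2: adds lambda*sum(w^2), shrinks large weights hardest
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Nothing stops you from combining the two approaches, which is why trying different combinations, as suggested above, is a sensible strategy.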
