L1 regularization on the first layer only, or on all layers?

I have lots of features in the input to a Fully Connected Neural Network (FCNN) and was thinking of adding L1 regularization so that only the most relevant features get selected. I found how to add it following this link, and applied it to the weights of the first layer (my FCNN is 4 layers deep). However, when I manually inspect the weights afterwards, all of them are very small (on the order of 1e-4) but none of them are exactly zero, as I expected (that's why I used L1 instead of L2).
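For reference, here is a minimal sketch of how an L1 penalty on the first layer's weights can be added to the loss. PyTorch is assumed here, and the names (`FCNN`, `fc1`, `l1_lambda`, the layer sizes) are illustrative, not the exact code from the link:

```python
import torch
import torch.nn as nn

# Illustrative 4-layer FCNN; layer names fc1..fc4 are assumptions
class FCNN(nn.Module):
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, n_hidden)
        self.fc4 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return self.fc4(x)

model = FCNN(n_in=100, n_hidden=64, n_out=1)
criterion = nn.MSELoss()
l1_lambda = 1e-3  # regularization strength (hyperparameter)

x = torch.randn(32, 100)  # dummy batch
y = torch.randn(32, 1)

loss = criterion(model(x), y)
# L1 penalty on the first layer's weights only
loss = loss + l1_lambda * model.fc1.weight.abs().sum()
loss.backward()
```

With gradient-based optimizers the penalized weights are driven towards zero but rarely hit exactly zero, which may explain the 1e-4 values observed above.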

So my question is: should I add L1 regularization to all the weights in the model, or only to the first layer's weights?

(Extra question) If it is the former, should I still apply the regularization to all the weights even if the network is very large (say, with many more layers)?
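Continuing the sketch above, applying the penalty to every weight matrix in the model (rather than just `fc1`) could look like this; restricting it to parameters whose name ends in "weight" (to skip biases) is an assumption, not a requirement:

```python
# L1 penalty over all weight matrices in the model (biases excluded)
l1_penalty = sum(
    p.abs().sum()
    for name, p in model.named_parameters()
    if name.endswith("weight")
)
loss = criterion(model(x), y) + l1_lambda * l1_penalty
loss.backward()
```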

Thanks.

Topic regularization neural-network machine-learning

Category Data Science
