L1 regularization on the first layer or on all layers
I have a large number of input features going into a fully connected neural network (FCNN), and I was thinking of adding L1 regularization so that only the most relevant features are selected. I found how to add it following this link and applied it to the weights of the first layer (my FCNN is 4 layers deep). However, when I manually inspect those weights, they are all very small (on the order of 1e-4), but none of them are exactly zero as I expected (which is why I used L1 instead of L2).
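For context, what I did looks roughly like the sketch below (a minimal PyTorch example; the layer sizes and the lambda value are just placeholders, not my actual settings):

```python
import torch
import torch.nn as nn

# Toy 4-layer FCNN; the layer sizes here are placeholders.
model = nn.Sequential(
    nn.Linear(100, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_lambda = 1e-3  # placeholder regularization strength

def training_step(x, y):
    optimizer.zero_grad()
    pred = model(x)
    loss = criterion(pred, y)
    # L1 penalty applied only to the first layer's weights
    l1_penalty = model[0].weight.abs().sum()
    loss = loss + l1_lambda * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```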
So my question is: should I add L1 regularization to all of the model's weights, or only to the first layer's weights?
(Extra question) If it is the former, should I still add the regularization to all the weights even when the network is very large (say, with many more layers)?
Thanks.
Topic regularization neural-network machine-learning
Category Data Science