Can Adagrad or Adam be used with a loss function that includes l1-norm regularization?
I have a question: how do Adam or Adagrad treat l1-norm regularization in the loss function (e.g. Lasso)? I know that the l1-norm is not differentiable at zero, but we can define a subgradient for it. I would like to know whether the Adam optimizer uses a subgradient in this situation.
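To be explicit about the subgradient I mean, for a single weight the subdifferential of the absolute value is the standard one:

$$
\partial |w| =
\begin{cases}
\{\operatorname{sign}(w)\} & w \neq 0,\\
[-1,\, 1] & w = 0.
\end{cases}
$$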
As far as I know, the Adam optimizer builds on ideas from Adagrad, and Adagrad is a stochastic subgradient method. So can we conclude that Adam works well for optimizing a loss function with l1 regularization?
If Adam and Adagrad do not work well here, are there any PyTorch optimizers that can tackle the problem? A minimal sketch of the setup I have in mind is below.
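For concreteness, this is a minimal sketch of what I mean by "l1 regularization in the loss function" (the model, data, and `l1_lambda` value are just placeholders I made up for illustration, not a recommended recipe):

```python
import torch
import torch.nn as nn

# Toy data and model, placeholders for illustration only.
torch.manual_seed(0)
X = torch.randn(64, 10)
y = torch.randn(64, 1)
model = nn.Linear(10, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
mse = nn.MSELoss()
l1_lambda = 1e-3  # assumed regularization strength

for step in range(100):
    optimizer.zero_grad()
    pred = model(X)
    # Add the l1 penalty on the weights directly to the loss.
    # Autograd gives sign(w) as the gradient of |w|, and 0 at w == 0,
    # i.e. one particular subgradient.
    l1_penalty = model.weight.abs().sum()
    loss = mse(pred, y) + l1_lambda * l1_penalty
    loss.backward()
    optimizer.step()
```

In this sketch the penalty's (sub)gradient is just sign(w), so my question is whether Adam or Adagrad handle this properly, or whether a different PyTorch optimizer is needed.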
Topic lasso pytorch deep-learning optimization machine-learning
Category Data Science