If $\ell_0$ regularization can be done via the proximal operator, why are people still using LASSO?
I have just learned about a general framework in constrained optimization called "proximal gradient optimization". Interestingly, the $\ell_0$ "norm" also has a proximal operator (hard thresholding), so one can apply the iterative hard thresholding (IHT) algorithm to obtain a sparse solution of the following:
$$\min_\beta \tfrac{1}{2}\Vert Y-X\beta\Vert_F^2 + \lambda \Vert \beta \Vert_0$$
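For concreteness, here is a rough sketch of the kind of procedure I have in mind: a plain proximal gradient loop where the proximal step is hard thresholding. All names (`X`, `Y`, `lam`, `step`, `n_iter`) are just placeholders of my own, not from any particular library.

```python
import numpy as np

def hard_threshold(beta, tau):
    # prox of tau * ||.||_0: keep an entry only if beta_j^2 > 2 * tau
    out = beta.copy()
    out[beta**2 <= 2 * tau] = 0.0
    return out

def iht(X, Y, lam, step=None, n_iter=500, beta0=None):
    """Iterative hard thresholding for 0.5*||Y - X beta||^2 + lam*||beta||_0."""
    n, p = X.shape
    if step is None:
        # 1 / Lipschitz constant of the gradient of the smooth squared-loss part
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y)      # gradient of 0.5*||Y - X beta||^2
        beta = hard_threshold(beta - step * grad, step * lam)
    return beta

def soft_threshold(beta, tau):
    # prox of tau * ||.||_1; swapping this in for hard_threshold gives ISTA / LASSO
    return np.sign(beta) * np.maximum(np.abs(beta) - tau, 0.0)
```

The only difference from the usual ISTA iteration for the LASSO is the thresholding rule, which is why I find it puzzling that the $\ell_0$ version is not used more often.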
If so, why are people still using $\ell_1$? If the sparse solution can be obtained directly by this non-convex approach, why is the LASSO still the default?
I want to understand the downsides of the proximal gradient approach for $\ell_0$ minimization. Is it the non-convexity and the resulting dependence on initialization, i.e. that the initial estimate matters a great deal?
Topic: sparsity, optimization
Category: Data Science