If $\ell_0$ regularization can be done via the proximal operator, why are people still using LASSO?

I have just learned about a general framework for constrained optimization called "proximal gradient optimization". Interestingly, the $\ell_0$ "norm" also has a proximal operator, so one can apply the iterative hard thresholding (IHT) algorithm to get a sparse solution of the following problem:

$$\min_{\beta} \ \Vert Y - X\beta\Vert_F^2 + \lambda \Vert \beta \Vert_0$$
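For concreteness, here is a minimal sketch of what I mean by IHT, written for the single-response case with the usual $\tfrac{1}{2}$ on the quadratic term; the function names and the fixed step size are my own illustrative choices, not a reference implementation:

```python
# Sketch of iterative hard thresholding (IHT) for l0-penalized least squares.
# The prox of the scaled l0 penalty is hard thresholding: keep coordinates
# whose magnitude exceeds sqrt(2 * step * lam), zero out the rest.
import numpy as np

def hard_threshold(z, thresh):
    """Proximal operator of t * lam * ||.||_0: zero out small entries."""
    out = z.copy()
    out[np.abs(out) < thresh] = 0.0
    return out

def iht(X, y, lam, step=None, n_iter=500, beta0=None):
    """Proximal gradient (IHT) for min_b 0.5*||y - X b||^2 + lam*||b||_0."""
    n, p = X.shape
    if step is None:
        # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant
        # of the gradient of the smooth least-squares term.
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    thresh = np.sqrt(2.0 * step * lam)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of 0.5*||y - X b||^2
        beta = hard_threshold(beta - step * grad, thresh)
    return beta
```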

If so, why are people still using $\ell_1$? If you can get the sparse solution directly by non-convex optimization, why is LASSO still the default?

I want to know the downside of the proximal gradient approach for $\ell_0$ minimization. Is it the non-convexity and the randomness associated with it, meaning that the initial estimate is very important?
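To make that part of the question concrete, one way I could imagine probing the initialization sensitivity is to run the `iht` sketch above from different starting points on synthetic data and compare the recovered supports (the data-generating setup here is again just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = 3.0 * rng.standard_normal(k)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Same data, two different initializations of the non-convex IHT iteration
# (assumes the iht() sketch defined earlier in this post).
b_zero = iht(X, y, lam=0.5)                                # start at the origin
b_rand = iht(X, y, lam=0.5, beta0=rng.standard_normal(p))  # random start
print("support (zero init):  ", np.flatnonzero(b_zero))
print("support (random init):", np.flatnonzero(b_rand))
```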

Topic: sparsity, optimization

Category: Data Science
