Custom Loss Function for Mixing Sparse and Dense Features for a Prediction Problem

I have a feature space of about 40 largely uncorrelated dichotomous (binary) features, from which I'm trying to predict a continuous target variable.

Now, some of these features are very sparse (active less than 10% of the time; the rest are zeros). But on the few occasions these features are active, they may be very good predictors of the target. Most algorithms will largely ignore these features because of their sparsity, despite their predictive power.

What ML algorithms/techniques can I use to remedy this, and is it possible to customize the loss function to give these features their due weight?
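To make the question concrete, here is a minimal sketch of one workaround I've considered (all data here is synthetic and the 5/35 sparse/dense split is hypothetical): instead of changing the loss itself, upweight the rows where a sparse feature is active via scikit-learn's `sample_weight`, so errors on those rows cost more.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, p = 2000, 40

# First 5 features are sparse (~5% active), the rest ~50% active
activation_rates = np.where(np.arange(p) < 5, 0.05, 0.5)
X = (rng.random((n, p)) < activation_rates).astype(float)

# Sparse features carry large coefficients; dense ones only small effects
y = (X[:, :5] @ np.full(5, 3.0)
     + X[:, 5:] @ rng.normal(0, 0.2, p - 5)
     + rng.normal(0, 0.1, n))

# Upweight any row where at least one sparse feature is active,
# so the squared-error loss pays more attention to those rows
w = np.where(X[:, :5].any(axis=1), 5.0, 1.0)
model = GradientBoostingRegressor(random_state=0).fit(X, y, sample_weight=w)
```

The weight of 5.0 is arbitrary; the question is whether this kind of reweighting is a sound substitute for an actual custom loss, or whether it just distorts the fit elsewhere.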

P.S. Are there any Bayesian approaches that would react more readily to these features? Would smoothing the data make sense? And would embeddings be of any use? (I only have 2,000 training data points, so deep learning models may not be appropriate.)
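On the Bayesian side, one candidate I'm aware of is automatic relevance determination (ARD), which places an individual precision prior on each coefficient, so a rarely-active but predictive feature can keep a large coefficient instead of being shrunk with the rest. A sketch on synthetic data (the 5% activation rate and coefficient of 5.0 are made-up for illustration):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(1)
X = (rng.random((2000, 40)) < 0.5).astype(float)
X[:, 0] = (rng.random(2000) < 0.05).astype(float)  # one sparse feature

# The sparse feature has a strong effect; the dense ones are weak
y = 5.0 * X[:, 0] + X[:, 1:] @ rng.normal(0, 0.1, 39) + rng.normal(0, 0.1, 2000)

ard = ARDRegression().fit(X, y)
```

Whether ARD (or sparsity-aware priors more generally) is the right tool here, versus tree methods with reweighting, is part of what I'm asking.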

Topic sparse bayesian loss-function predictive-modeling machine-learning

Category Data Science
