How to deal with data having 0 values in many columns?

I am trying to implement logistic regression, but my dataset has many columns with skewed data, and most of their values are 0. I also checked the skewness of the data, and for many columns it goes above 190.

This is not limited to the training data; the testing data looks the same. I tried a log transform to remove the skewness, but because most of the values are 0, it messed up my data. I don't know how to deal with it.

I have already applied standardization, but it improved things only a bit. If anyone has an idea, please suggest it.

Topic normalization preprocessing dataset data-cleaning

Category Data Science


If you want to fix skewness, a better alternative to a simple log transform is a power transformation. Box-Cox will not work with zeros, since it accepts only strictly positive values, but Yeo-Johnson will.
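
As a rough illustration of the Yeo-Johnson suggestion, here is a minimal sketch using scikit-learn's `PowerTransformer`. The data is synthetic and the names `X_train`, `X_test` and `skewness` are made up for the example; the point is only that the transformer accepts zeros and is fitted on the training data, then reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

def make_zero_heavy(n_rows, n_cols):
    # Hypothetical data: about 80% zeros, the rest right-skewed positives.
    values = rng.exponential(scale=5.0, size=(n_rows, n_cols))
    mask = rng.random((n_rows, n_cols)) < 0.2
    return values * mask

def skewness(X):
    # Sample skewness per column (third standardized moment).
    centered = X - X.mean(axis=0)
    return (centered ** 3).mean(axis=0) / X.std(axis=0) ** 3

X_train = make_zero_heavy(1000, 3)
X_test = make_zero_heavy(300, 3)

# Yeo-Johnson handles zeros (and negatives); Box-Cox would raise an error here.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
X_train_t = pt.fit_transform(X_train)   # learn lambdas on training data only
X_test_t = pt.transform(X_test)         # apply the same lambdas to test data

skew_before = skewness(X_train)
skew_after = skewness(X_train_t)
print("skewness before:", skew_before)
print("skewness after: ", skew_after)
```

Note that when most values in a column are exactly 0, no monotonic transform can spread that spike out, so the skewness will usually drop but not disappear.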

If you have a lot of zeros, it might be a good idea to check for zero variance if your data is continuous and near-zero variance if your data is discrete, then delete any uninformative features.
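
The variance check could be sketched like this. `VarianceThreshold` is scikit-learn's selector for dropping constant columns; the near-zero-variance rule below (drop a column when a single value makes up at least 99% of its rows) is just one possible heuristic, and the 0.99 cutoff and the helper name `dominant_share` are assumptions for the example, not a standard.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 1000

# Hypothetical features: one constant, one almost constant, one informative.
almost_constant = np.zeros(n)
almost_constant[:3] = 1.0
X = np.column_stack([
    np.zeros(n),                         # zero variance
    almost_constant,                     # near-zero variance (99.7% zeros)
    rng.poisson(0.5, n).astype(float),   # many zeros, but still varies
])

# Step 1: drop columns whose variance is exactly zero.
selector = VarianceThreshold(threshold=0.0)
X_nonconstant = selector.fit_transform(X)
kept_zero_var = selector.get_support()   # boolean mask over original columns

# Step 2: drop columns dominated by a single value (assumed cutoff: 99%).
def dominant_share(column):
    _, counts = np.unique(column, return_counts=True)
    return counts.max() / column.size

shares = np.array([dominant_share(X_nonconstant[:, j])
                   for j in range(X_nonconstant.shape[1])])
X_reduced = X_nonconstant[:, shares < 0.99]

print("kept after zero-variance filter:", kept_zero_var)
print("shape after both filters:", X_reduced.shape)
```

As with the transform, the filters should be decided on the training data and the same columns then dropped from the test data.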
