Should one log-transform discrete numerical variables?
I am working on a linear regression problem, and my understanding is that one of the assumptions of a linear regression model is that the features should be normally distributed. To bring my non-normal features closer to a normal distribution, I am trying several transformations such as log, Box-Cox, and square root. I have both discrete and continuous numerical variables (an example of each, along with its histogram and QQ plot, is given):
CONTINUOUS VARIABLE HISTOGRAM AND QQ PLOT
DISCRETE VARIABLE HISTOGRAM AND QQ PLOT
From the QQ plot of the continuous variable, we can see that some points do not lie on the red line, so the variable seems to need some kind of transformation. I might try different transformations to see which one gives a distribution closest to normal, so that the points fall on the red line.
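To make the procedure concrete, here is a minimal sketch of how I compare transformations. It uses synthetic right-skewed data as a stand-in for my real feature (the variable `x` and the helper `qq_r` are made up for illustration), and it scores each candidate by the correlation coefficient that `scipy.stats.probplot` reports for the QQ-plot fit; a value closer to 1 means the points sit closer to the red line.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for a right-skewed, strictly positive continuous feature.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Candidate transformations (log and Box-Cox both require positive values).
candidates = {
    "raw": x,
    "log": np.log(x),
    "sqrt": np.sqrt(x),
    "box-cox": stats.boxcox(x)[0],  # boxcox returns (transformed, fitted lambda)
}

def qq_r(values):
    """Correlation of the least-squares line fitted to the normal QQ plot."""
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return r

scores = {name: qq_r(v) for name, v in candidates.items()}
for name, r in scores.items():
    print(f"{name:8s} r = {r:.4f}")
```

This is only a rough numerical summary of what the QQ plot shows visually, not a formal normality test.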
But what about the discrete variable? In its QQ plot, all the points form a horizontal line. Will transforming the variable make them fall on the red line? Should I proceed the same way as for a continuous variable, or is there some other method?
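The pattern I see can be reproduced with a small sketch, assuming a made-up discrete feature `d` that takes only the values 1 to 5 (my real variable differs). Because many observations share the same value, the ordered sample in the QQ plot contains only a handful of distinct levels, which show up as flat horizontal runs. A log transform only relabels those levels, so the number of distinct levels stays the same.

```python
import numpy as np
from scipy import stats

# Hypothetical discrete feature with five possible values (1..5).
rng = np.random.default_rng(0)
d = rng.integers(1, 6, size=500)

# probplot returns ((theoretical quantiles, ordered sample), (slope, intercept, r)).
(osm, osr), _ = stats.probplot(d, dist="norm")
(_, osr_log), _ = stats.probplot(np.log(d), dist="norm")

# Number of distinct y-levels on the QQ plot before and after the log transform.
n_levels = np.unique(osr).size
n_levels_log = np.unique(osr_log).size
print(n_levels, n_levels_log)
```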
Topic transformation feature-engineering linear-regression python
Category Data Science