Dummy Variables and Scaling in Regression Problems
I was wondering whether having dummy variables while scaling the other variables could throw off my model. In particular, I have implemented a Random Forest Regressor using scikit-learn, and my data consists of a set of dummy variables and 2 numerical variables. My approach was as follows:
- Convert the categorical variables into dummy variables
- Separate out the numerical variables
- Scale the numerical variables (from step 2) with scikit-learn's StandardScaler
- Join the dummy and scaled numerical variables
- Split into train and test sets
- Train the model
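The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the question: the column names (`color`, `size`, `weight`, `price`) and the synthetic target are hypothetical, and it assumes pandas plus scikit-learn's `StandardScaler`, `train_test_split`, and `RandomForestRegressor`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two categorical columns, two numerical columns, and a target.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.choice(["S", "M", "L"], size=n),
    "weight": rng.normal(50.0, 10.0, size=n),
    "price": rng.normal(1000.0, 250.0, size=n),
})
y = 0.5 * df["weight"] + 0.01 * df["price"] + (df["color"] == "red") * 5 + rng.normal(0, 1, size=n)

# 1. Convert the categorical variables into dummy (one-hot) variables.
dummies = pd.get_dummies(df[["color", "size"]], dtype=float)

# 2. Separate out the numerical variables.
numeric = df[["weight", "price"]]

# 3. Standardize the numerical variables to zero mean and unit variance.
scaler = StandardScaler()
numeric_scaled = pd.DataFrame(
    scaler.fit_transform(numeric), columns=numeric.columns, index=numeric.index
)

# 4. Join the dummies and the scaled numerical variables column-wise.
X = pd.concat([dummies, numeric_scaled], axis=1)

# 5. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 6. Train the model and report R^2 on the held-out rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Because this sketch follows the listed order, the scaler is fit before the split, so its mean and standard deviation are computed over the test rows too; fitting the scaler on the training rows only and calling `transform` on the test rows avoids that.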
Could this approach introduce an inappropriate bias, given the different scales of the dummy variables and the scaled numerical variables? Or, at least, is it correct?
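One way to probe the scale concern empirically is to train the same forest twice, once on the raw numerical columns and once on the standardized ones, and compare predictions. Tree splits depend only on the ordering of each feature's values, so a positive affine rescaling such as standardization should leave the learned partitions unchanged. The sketch below assumes hypothetical synthetic data (3 dummy columns, 2 numerical columns on very different scales) and a fixed `random_state` so both forests draw the same bootstrap samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 3 dummy (0/1) columns and 2 numerical columns on very different scales.
rng = np.random.default_rng(42)
n = 300
dummies = rng.integers(0, 2, size=(n, 3)).astype(float)
numeric = np.column_stack([rng.normal(50.0, 10.0, n), rng.normal(1000.0, 250.0, n)])
y = (
    dummies @ np.array([3.0, -2.0, 1.0])
    + 0.1 * numeric[:, 0]
    + 0.002 * numeric[:, 1]
    + rng.normal(0, 0.5, n)
)

# Same dummies in both designs; only the numerical columns differ (raw vs standardized).
X_raw = np.hstack([dummies, numeric])
X_scaled = np.hstack([dummies, StandardScaler().fit_transform(numeric)])

# Identical hyperparameters and seed, so the only difference is the feature scaling.
rf_raw = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_raw, y)
rf_scaled = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_scaled, y)

pred_raw = rf_raw.predict(X_raw)
pred_scaled = rf_scaled.predict(X_scaled)

# Largest absolute difference between the two sets of predictions.
print(np.max(np.abs(pred_raw - pred_scaled)))
```

If the printed difference is zero or at floating-point noise level, the mismatch in scale between dummies and standardized features is not affecting the forest on that data; scale-sensitive models (for example, regularized linear models or distance-based methods) would be a different matter.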
Topic dummy-variables random-forest
Category Data Science