When Does Feature Selection Take Place?

I have a dataset where there are categorical features as well as numeric features, and I have to perform OneHotEncoding, Normalization and feature selection on it.

In what order should I perform these steps on my data?

I am new to data science, so please explain the logic behind it in layman's terms too.

Thank you.



Normalization is applied only to numerical variables, and one-hot encoding only to categorical variables.

I would advise splitting your data into two dataframes: one containing only the numerical features and the other containing only the categorical features. Then apply normalization and one-hot encoding to the respective dataframes. That way you don't get confused about the order!
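A minimal sketch of this split-then-transform approach, assuming pandas and a small made-up dataset (the column names `age`, `income` and `city` are hypothetical). Min-max scaling stands in for "normalization" here; another scaler would slot in the same way.

```python
import pandas as pd

# Hypothetical toy data: two numeric columns and one categorical column
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 61000, 75000],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

# 1. Split the dataframe by column type
numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")

# 2. Normalize the numeric part (min-max scaling to the range [0, 1])
numeric_scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())

# 3. One-hot encode the categorical part (one new column per city)
categorical_encoded = pd.get_dummies(categorical, columns=["city"])

# 4. Put the two processed parts back together side by side
processed = pd.concat([numeric_scaled, categorical_encoded], axis=1)
print(processed)
```

Because each dataframe only ever receives the transformation that suits it, there is no way to accidentally scale a dummy column or encode a number.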

Feature selection is usually done before encoding and scaling. This removes redundant features that would otherwise waste time during encoding/scaling. It also keeps the dimensionality of the dataset down, because one-hot encoding adds a new column for every category of each feature it encodes.
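To illustrate the dimensionality argument, here is a rough sketch assuming pandas and hypothetical columns. The selection rule (drop constant columns and identifier-like categorical columns) is a deliberately simple, made-up filter, not a standard method; the point is only to compare column counts when selection happens before versus after encoding.

```python
import pandas as pd

# Hypothetical raw data: "customer_id" is unique per row and "country" never varies
df = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4", "c5"],
    "country": ["FR", "FR", "FR", "FR", "FR"],
    "plan": ["basic", "pro", "basic", "pro", "basic"],
    "usage_hours": [3.5, 10.0, 2.0, 12.5, 4.0],
})

def select_raw_features(frame):
    """Hypothetical filter on raw (unencoded) columns: drop constant columns
    and non-numeric columns whose values are all distinct (identifier-like)."""
    keep = []
    for col in frame.columns:
        n_unique = frame[col].nunique()
        if n_unique <= 1:
            continue  # constant column carries no information
        if not pd.api.types.is_numeric_dtype(frame[col]) and n_unique == len(frame):
            continue  # one category per row, so nothing to learn from
        keep.append(col)
    return frame[keep]

# Encoding everything first: 1 numeric + 5 + 1 + 2 dummy columns
encoded_all = pd.get_dummies(df, columns=["customer_id", "country", "plan"])

# Selecting first: only the surviving categorical column gets encoded
selected = select_raw_features(df)
encoded_selected = pd.get_dummies(selected, columns=["plan"])

print(encoded_all.shape)       # (5, 9)
print(encoded_selected.shape)  # (5, 3)
```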


  • One hot encoding is only for categorical features
  • Normalization is only for numerical features

Thus these two steps can be done in either order; they are independent of each other.
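A small check of this independence, assuming pandas, hypothetical column names and min-max scaling as the normalization. Since each step touches a disjoint set of columns, applying them in either order should produce identical results.

```python
import pandas as pd

# Hypothetical toy data: one numeric and one categorical column
df = pd.DataFrame({
    "height": [150.0, 165.0, 180.0],
    "color": ["red", "blue", "red"],
})

def normalize(frame, cols):
    """Min-max scale only the given numeric columns; other columns pass through."""
    out = frame.copy()
    for col in cols:
        out[col] = (out[col] - out[col].min()) / (out[col].max() - out[col].min())
    return out

def one_hot(frame, cols):
    """One-hot encode only the given categorical columns."""
    return pd.get_dummies(frame, columns=cols)

# Order A: normalize, then encode
a = one_hot(normalize(df, ["height"]), ["color"])
# Order B: encode, then normalize
b = normalize(one_hot(df, ["color"]), ["height"])

print(a.equals(b))  # True: each step only touches its own columns
```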

Feature selection should be done on the final set of features used by the model, so it must be the last step.
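A sketch of selection as the last step, assuming pandas and hypothetical columns. The variance filter used here is just one illustrative selection method chosen for simplicity; the key point is that it runs on the encoded, scaled columns the model will actually receive, so it can drop an individual dummy column such as `region_north` that only exists after encoding.

```python
import pandas as pd

# Hypothetical data; "region" has the same value in every row
df = pd.DataFrame({
    "age": [22, 35, 41, 58, 63],
    "region": ["north", "north", "north", "north", "north"],
    "tier": ["gold", "silver", "gold", "bronze", "silver"],
})

# Steps 1 and 2: encode and normalize (the order between them does not matter)
features = pd.get_dummies(df, columns=["region", "tier"], dtype=float)
num_cols = ["age"]
features[num_cols] = (features[num_cols] - features[num_cols].min()) / (
    features[num_cols].max() - features[num_cols].min()
)

# Step 3: feature selection on the final columns the model will see.
# Illustrative variance filter: drop columns that (almost) never vary.
threshold = 0.01
variances = features.var()
selected = features.loc[:, variances > threshold]

print(sorted(set(features.columns) - set(selected.columns)))  # ['region_north']
```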

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.