How to handle fixed values for variables in pre-processing

Question

Sm1

September 13, 2020, 00:31

I have a dataset that contains a few variables whose values never change. Some of these are categorical (for example, every row has the value 5 for that variable), and a few are real-valued but constant as well. When I standardize the variables so that each has zero mean and unit variance, these constant variables produce NaN values, because their standard deviation is zero. Is it OK to exclude such constant variables (whether categorical or real-valued) from the normalization/standardization step? These variables are important as features, so I cannot delete them. Is there any other way to handle them?

Topic dummy-variables data-science-model preprocessing

Category Data Science

fswings · Accepted Answer · September 13, 2020, 00:31

By definition, if these columns or features contain a constant value and yet the output variable changes, then they cannot explain any of that change, so they are not influencing the output and can likely be ignored.
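To make the NaN issue concrete, here is a minimal sketch using only the Python standard library. The helper `standardize_column` and the toy column names are hypothetical, chosen for illustration. It assumes that mapping a constant column to all zeros is acceptable, since a constant column carries no information either way.

```python
from statistics import fmean, pstdev

def standardize_column(values):
    """Return (z_scores, was_constant) for one numeric column."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation
    if std == 0.0:
        # Constant column: (x - mean) / std would be 0 / 0, which is
        # where the NaN comes from. Emit zeros instead (an assumption,
        # not the only possible convention).
        return [0.0] * len(values), True
    return [(v - mean) / std for v in values], False

# Hypothetical toy data: one varying column, one constant column.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "constant_five": [5.0, 5.0, 5.0, 5.0],
}

scaled = {}
constant_columns = []
for name, values in columns.items():
    z_scores, was_constant = standardize_column(values)
    scaled[name] = z_scores
    if was_constant:
        constant_columns.append(name)
```

After this runs, `constant_columns` lists the columns that were skipped, so they can be reviewed or passed through unscaled rather than silently turned into NaN.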

A more formal test is to determine how much of the variance in a model's output is attributable to that feature.

A simple way to see this principle is to look up examples of PCA. In those examples, the technique tries to identify the directions in feature space that account for the most variance, and a constant feature contributes none.
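The variance idea can be sketched without running PCA itself. The snippet below is a simpler stand-in, not PCA: it computes each feature's share of the summed per-feature variance, so a constant feature shows up with a share of exactly zero. The function `variance_share` and the toy data are hypothetical, and raw variances depend on each feature's units, so the shares are only meaningful for features on comparable scales.

```python
from statistics import pvariance

def variance_share(columns):
    """Map each column name to its fraction of the summed per-column variance."""
    variances = {name: pvariance(values) for name, values in columns.items()}
    total = sum(variances.values())
    if total == 0:
        # Every column is constant, so no column explains any variance.
        return {name: 0.0 for name in variances}
    return {name: var / total for name, var in variances.items()}

# Hypothetical toy data: two varying features and one constant feature.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [0.0, 2.0, 0.0, 2.0],
    "constant_five": [5.0, 5.0, 5.0, 5.0],
}

shares = variance_share(features)
# Features with zero share are the constant ones.
zero_variance = [name for name, share in shares.items() if share == 0.0]
```

Features that land in `zero_variance` are candidates to leave out of the scaling step, which matches the reasoning above.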
