Feature selection before or after scaling and splitting

Question

Caterina

May 2, 2022 15:30

Should feature scaling/standardization/normalization be done before or after feature selection, and before or after data splitting?

I am confused about the order in which the various preprocessing steps should be done.

Topic machine-learning-model preprocessing feature-scaling feature-selection

Category Data Science

Ben Reiniger · Accepted Answer · May 2, 2022 15:30

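Some feature selection methods depend on the scale of the data (selecting by the coefficients of an L1-penalized linear model, for example), in which case it seems best to scale beforehand. Other methods don't depend on the scale (tree-based feature importances, for example), in which case the order doesn't matter.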
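To make the scale dependence concrete, here is a minimal sketch using only the standard library. The toy data, the `select_by_variance` helper, and the threshold of 1.0 are all made up for illustration; the point is only that a variance-based filter gives a different answer when the same feature is expressed in different units.

```python
from statistics import pvariance

# Hypothetical toy data: the same heights expressed in metres and in centimetres.
height_m = [1.60, 1.75, 1.82, 1.68, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [55.0, 72.0, 80.0, 63.0, 95.0]


def select_by_variance(features, threshold):
    """Return the names of features whose population variance exceeds threshold."""
    return [name for name, values in features.items() if pvariance(values) > threshold]


# Same information, different units, different selection result.
selected_m = select_by_variance({"height": height_m, "weight": weight_kg}, threshold=1.0)
selected_cm = select_by_variance({"height": height_cm, "weight": weight_kg}, threshold=1.0)

print(selected_m)   # height is dropped when measured in metres
print(selected_cm)  # height is kept when measured in centimetres
```

Note that this particular criterion is a special case: standardizing every feature to unit variance would make the variances identical, so a variance filter only makes sense on features brought to comparable units some other way.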

All preprocessing should be done after the train/test split: fit each step (scaler, feature selector, and so on) on the training data only, then apply the fitted step unchanged to the test data. There are some cases where it won't make a difference, but if you're uncertain it's safer to do everything after splitting. The test set is supposed to act as data your model will see in production; you won't have access to that data to help define scale (or anything else), so don't use it that way while training.
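One way to follow this in practice is sketched below, assuming scikit-learn is available. The synthetic dataset, the choice of `SelectKBest` with `f_classif`, `k=4`, and the logistic regression model are illustrative assumptions, not part of the answer above. Putting the steps in a `Pipeline` means the scaler and the selector are fit on the training rows only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# 1. Split first, before any preprocessing has seen the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Scale, then select, then model. The order of scaling and selection
#    only matters when the selection method is scale-dependent.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=4)),
    ("model", LogisticRegression(max_iter=1000)),
])

# 3. Fitting uses training data only; the test set is transformed with
#    the statistics and the feature mask learned from the training set.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
print(test_accuracy)
```

The same pipeline object can be passed to cross-validation utilities such as `cross_val_score`, in which case the preprocessing is refit inside each fold.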
