Outlier treatment

Question


spectre

November 18, 2021, 12:37

I am working on a regression problem where several variables contain a lot of outliers. As far as I can tell, there are three things I can do with outliers:

1. Remove them (least attractive option)
2. Transform them (log transformation, Box-Cox transformation, etc.)
3. Do nothing and build a model that includes them

My question is about the second point. If I transform my features with one of these transformations solely to deal with outliers, is that OK?
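To make the second option concrete, here is a minimal sketch of a log transformation, assuming a strictly positive, right-skewed feature. The `feature` values and the `log_transform` helper are made up for illustration, and it uses only the standard library (`math.log1p`) rather than any particular data science package.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to each value; every x must be greater than -1."""
    return [math.log1p(v) for v in values]

# Hypothetical right-skewed feature with one extreme value.
feature = [1.0, 2.0, 3.0, 4.0, 1000.0]
transformed = log_transform(feature)

# Compare how far the largest value sits from the smallest, before and after.
ratio_before = max(feature) / min(feature)
ratio_after = max(transformed) / min(transformed)
print(ratio_before, ratio_after)
```

The transformation keeps every row and preserves the ordering of the values; it only compresses the scale, so the extreme point sits much closer to the rest.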

Topic transformation feature-engineering outlier python

Category Data Science

Inuraghe · Accepted Answer · November 18, 2021, 12:37

Although it is the least attractive option, the best solution is to remove them. Keeping outliers, even in transformed form, substantially alters your dataset. For example, if your goal is to build a machine learning model, training on modified data distorts what the model learns and therefore gives you an unreliable result.

This is summarized by the principle "garbage in, garbage out": if you feed garbage data in, you get garbage results out. Clean data is therefore very important; it is better to have less data that is reliable than more data that is not.
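As a rough sketch of what removal could look like, the snippet below drops points that fall outside the common 1.5 × IQR fences. That criterion is an assumption made for illustration (the answer does not name a specific rule), and the `remove_outliers_iqr` helper and `sample` data are hypothetical.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical sample with one obvious extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 95]
cleaned = remove_outliers_iqr(sample)
print(cleaned)
```

In a regression setting the same filter would be applied to whole rows rather than to a single column, so that the features and the target stay aligned.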
