What can help decrease outliers' influence on non-tree models?

I have a feature whose values all lie between 0 and 1, except for a few outliers larger than 1. I am trying to collect all the methods that can help decrease the influence of outliers on non-tree models:

  • StandardScaler
  • Apply rank transform to the features
  • Apply np.log1p(x) transform to the data
  • MinMaxScaler
  • Winsorization
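For concreteness, here is a minimal sketch of three of the transforms above (rank, log1p, and winsorization) using only NumPy. The sample data, the variable names, and the 1st/97th percentile cut-offs are assumptions chosen for illustration, not recommended defaults.

```python
import numpy as np

# Hypothetical feature: 98 values in [0, 1) plus two outliers above 1.
rng = np.random.default_rng(0)
x = np.concatenate([rng.uniform(0.0, 1.0, size=98), [12.0, 50.0]])

# Rank transform: replace each value by its rank, rescaled to [0, 1].
# argsort of argsort yields 0-based ranks (assumes no tied values).
ranks = np.argsort(np.argsort(x))
x_rank = ranks / (len(x) - 1)

# log1p transform: log(1 + x) compresses large values.
# It is only defined for x > -1, which holds for this feature.
x_log = np.log1p(x)

# Winsorization: clip values to chosen lower and upper percentiles.
# The 1st and 97th percentiles are an arbitrary choice for this sample.
lower, upper = np.percentile(x, [1, 97])
x_wins = np.clip(x, lower, upper)

print("raw max:       ", x.max())
print("rank max:      ", x_rank.max())
print("log1p max:     ", x_log.max())
print("winsorized max:", x_wins.max())
```

After the rank transform the two outliers are just the two largest ranks, so their distance from the rest of the data no longer matters. Winsorization keeps every row but caps the outliers at the chosen percentile.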

I couldn't think of any others. Is that all?


Here are a few other options:

  • Set a threshold and remove all values larger than the threshold.

  • Apply RobustScaler, which subtracts the median and scales the data by a quantile range (the interquartile range by default).

  • Apply QuantileTransformer, which maps the feature onto a uniform or a normal distribution based on its quantiles.
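The three options above could look roughly like this with NumPy and scikit-learn. This is only a sketch: the sample data, the threshold of 1.0, and the n_quantiles value are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, RobustScaler

# Hypothetical feature: 98 values in [0, 1) plus two outliers above 1.
rng = np.random.default_rng(0)
x = np.concatenate([rng.uniform(0.0, 1.0, size=98), [12.0, 50.0]])
X = x.reshape(-1, 1)  # scikit-learn transformers expect a 2-D array

# Option 1: drop rows whose value exceeds a threshold.
# The threshold is an assumption; in a real dataset the same rows
# would also have to be dropped from the target and other features.
threshold = 1.0
X_trimmed = X[X[:, 0] <= threshold]

# Option 2: RobustScaler subtracts the median and divides by the
# interquartile range (its default quantile_range is (25.0, 75.0)).
X_robust = RobustScaler().fit_transform(X)

# Option 3: QuantileTransformer maps the values onto a uniform
# (or, with output_distribution="normal", a normal) distribution.
# n_quantiles should not exceed the number of samples.
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform")
X_uniform = qt.fit_transform(X)

print("rows kept after trimming:", X_trimmed.shape[0])
print("median after RobustScaler:", np.median(X_robust))
print("range after QuantileTransformer:", X_uniform.min(), X_uniform.max())
```

Note that RobustScaler leaves the outliers in place as large values; it only prevents them from distorting the centering and scaling of the rest of the data. QuantileTransformer, like the rank transform, pulls them into the same bounded range as everything else.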
