Confusion on Outliers

Question

exp post

June 1, 2022 03:08

I can't work out how to identify outliers: when should I use the standard deviation, and when should I use the median?

My understanding of the std. dev. approach is: if a data point is more than 2 std. dev. away from the mean, we consider it an outlier. Similarly, for the median, we say that any data point that is not between Q1 and Q3 is an outlier.

So I am confused about which one to choose.

Can you help me understand?

Topic outlier statistics machine-learning

Category Data Science

OmG · Accepted Answer · November 19, 2019 17:09

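It depends entirely on the context of the data being considered. For example, how unusual it is for a point to lie more than $2\sigma$ from the mean ($\mu$) depends on the distribution of the data, that is, on the value of $\mathbb{P}(-2\sigma < X - \mu < 2\sigma)$. For a normal distribution this probability is about 0.95, but Chebyshev's inequality only guarantees that it is at least 0.75 in general.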
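To make the dependence on the distribution concrete, here is a minimal sketch that estimates that probability empirically for two simulated samples. The helper name `fraction_within_2_sigma`, the seed, and the sample size are illustrative choices, not anything from the question.

```python
import random
import statistics


def fraction_within_2_sigma(sample):
    """Return the share of points closer than 2 std. dev. to the sample mean."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)
    inside = sum(1 for x in sample if abs(x - mu) < 2 * sigma)
    return inside / len(sample)


# Simulated data (assumption: fixed seed and size chosen only for illustration).
rng = random.Random(0)
n = 20_000
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(n)]

normal_fraction = fraction_within_2_sigma(normal_sample)
uniform_fraction = fraction_within_2_sigma(uniform_sample)

print(f"normal:  {normal_fraction:.3f}")
print(f"uniform: {uniform_fraction:.3f}")
```

For the normal sample the fraction should come out close to 0.95. For the uniform sample every point lies within $2\sigma$ of the mean, so the same rule would never flag anything.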

Also, there are many methods for outlier detection, and which one is appropriate depends on the context, so you cannot pick a method in the abstract. Try several of these methods on your data, and then, based on a sample of known real outliers, decide which method suits the data at hand.
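As a rough sketch of such an experiment, the snippet below runs two rules on a small made-up sample in which the values 94 and 95 are assumed to be the known outliers. The quartile rule used here is the common convention of fences at 1.5 × IQR beyond Q1 and Q3, which is not the same as flagging everything outside Q1 to Q3 as described in the question. The function names and the data are hypothetical.

```python
import statistics


def two_sigma_outliers(data):
    """Flag points more than 2 sample std. dev. away from the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mu) > 2 * sigma]


def iqr_outliers(data, k=1.5):
    """Flag points beyond the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Made-up sample; 94 and 95 play the role of the known real outliers.
data = [12, 11, 94, 13, 10, 12, 95, 11, 13, 12]
known_outliers = [94, 95]

flagged_by_sigma = two_sigma_outliers(data)
flagged_by_iqr = iqr_outliers(data)

print("2-sigma rule:", flagged_by_sigma)
print("IQR fences:  ", flagged_by_iqr)
print("known:       ", known_outliers)
```

On this particular sample the 2-sigma rule flags nothing, because the two large values inflate both the mean and the standard deviation, while the quartile fences flag 94 and 95. On other data the outcome can be the opposite, which is why the comparison has to be made on your own data.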
