Should outliers be removed only from the target variable or from any variable where they are found?
I usually inspect boxplots and histograms of the target (dependent) variable and, after careful consideration, treat or remove its outliers. So far I have done this only for the target variable: if I decide on removal, I simply drop the entire row in which the target value is an outlier.
Now suppose some of the independent variables contain outliers as well. What should I do with those? Should I:
- ignore them, or
- take the same approach with the independent variables as I took with the target variable?
EDIT:
Take the following example. Assume that we are predicting customer expenditure, `target_expenditure_USD`. The other variables are independent variables.
age | sex | last_purchase | target_expenditure_USD |
---|---|---|---|
34 | M | 12-02-2020 | 520,000 |
24 | F | 02-06-2019 | 2,234 |
43 | F | 10-08-2018 | 4,365 |
130 | M | 23-07-2020 | 1,424 |
45 | F | 12-01-1839 | 6,453 |
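
To make my current approach concrete, here is a minimal sketch of it applied to the rows above. It assumes the 1.5 × IQR boxplot whisker rule as the outlier criterion (an assumption for illustration; in practice I judge from the plots), and `iqr_outlier_flags` is a hypothetical helper, not a library function. It uses only the Python standard library.

```python
from statistics import quantiles

# Rows copied from the example table above.
rows = [
    {"age": 34, "sex": "M", "last_purchase": "12-02-2020", "target_expenditure_USD": 520_000},
    {"age": 24, "sex": "F", "last_purchase": "02-06-2019", "target_expenditure_USD": 2_234},
    {"age": 43, "sex": "F", "last_purchase": "10-08-2018", "target_expenditure_USD": 4_365},
    {"age": 130, "sex": "M", "last_purchase": "23-07-2020", "target_expenditure_USD": 1_424},
    {"age": 45, "sex": "F", "last_purchase": "12-01-1839", "target_expenditure_USD": 6_453},
]


def iqr_outlier_flags(values, k=1.5):
    """Hypothetical helper: True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# What I do today: flag outliers in the target only and drop those whole rows.
target_flags = iqr_outlier_flags([r["target_expenditure_USD"] for r in rows])
rows_target_cleaned = [r for r, flagged in zip(rows, target_flags) if not flagged]

# The open question: the same rule applied to an independent variable.
age_flags = iqr_outlier_flags([r["age"] for r in rows])
```

On these five rows the rule flags 520,000 in the target and 130 in `age`. Dropping only the target outlier leaves the age-130 row in the data, which is exactly the situation I am asking about. The sketch does not check `last_purchase` at all.
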
Thanks