Should outliers be removed only from the target variable or from any variable where they are found?

What I usually do is check boxplots and histograms of the target (dependent) variable and, after careful consideration, treat or remove the outliers. But I do this only for the target variable: if I decide on removal, I simply drop the entire row in which the target value is an outlier.

Suppose I also have outliers in some of the independent variables. What should I do there?

  1. Should I ignore them?

  2. Or should I take the same approach with the independent variables as I took with the target variable?

EDIT: Take the following example. Assume that we are predicting customer expenditure, target_expenditure_USD. The other variables are independent variables.

age  sex  last_purchase  target_expenditure_USD
34   M    12-02-2020     520,000
24   F    02-06-2019     2,234
43   F    10-08-2018     4,365
130  M    23-07-2020     1,424
45   F    12-01-1839     6,453

Thanks



Continuing from the comments.

You should inspect all variables for outliers, not just your dependent variable (y). If you find any, you should do something about them.
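To make "inspect all variables" concrete, here is a minimal sketch that applies the same check to every numeric column rather than to the target alone. The rows are the ones from the question's table; reducing last_purchase to its year, the helper name `tukey_outliers`, and the choice of the 1.5 × IQR rule are my own assumptions for illustration, not the only reasonable way to flag values.

```python
from statistics import quantiles

# Rows from the question's example table.
# Assumption: last_purchase is reduced to its year so it can be treated as numeric.
rows = [
    {"age": 34, "purchase_year": 2020, "target_expenditure_USD": 520_000},
    {"age": 24, "purchase_year": 2019, "target_expenditure_USD": 2_234},
    {"age": 43, "purchase_year": 2018, "target_expenditure_USD": 4_365},
    {"age": 130, "purchase_year": 2020, "target_expenditure_USD": 1_424},
    {"age": 45, "purchase_year": 1839, "target_expenditure_USD": 6_453},
]


def tukey_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Run the same check on every column, predictors and target alike.
flagged = {
    col: tukey_outliers([row[col] for row in rows])
    for col in rows[0]
}

for col, idx in flagged.items():
    print(f"{col}: suspicious row indices {idx}")
```

With only five rows the quartiles are very rough, so treat the flags as a prompt to look at those rows, not as a verdict that they are wrong.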

If you are certain that they are in fact erroneous measurements, then ideally you would drop the whole row. If, however, you cannot determine that (and it doesn't look like you can), then you shouldn't just drop or change them. It would be better to keep your data as-is, perhaps note the unusual values, and analyze the data with models that are robust to outliers.
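As a rough illustration of what "robust to outliers" buys you, the sketch below fits the same toy data with ordinary least squares and with the Theil-Sen estimator (median of pairwise slopes). Theil-Sen is just one robust method I picked because it fits in a few lines; the data are made up (y = 2x + 1 with one corrupted value) and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median

# Made-up data: y = 2x + 1, except the last y value, which is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 9, 11, 100]


def ols_fit(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Theil-Sen: median of all pairwise slopes, then median residual as intercept."""
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1  # skip vertical pairs
    ]
    slope = median(slopes)
    intercept = median(b - slope * a for a, b in points)
    return slope, intercept


ols_slope, ols_intercept = ols_fit(x, y)
ts_slope, ts_intercept = theil_sen_fit(x, y)

print(f"OLS:       slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
print(f"Theil-Sen: slope={ts_slope:.2f}, intercept={ts_intercept:.2f}")
```

The single bad point drags the least-squares line far from the underlying trend, while the median-based fit still recovers it, and no row had to be dropped or edited.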
