How to determine the abnormality of a specific variable by taking into account all the other variables in the data?
I have a machine learning / anomaly detection problem. I have a variable Y and several other variables X. The goal is to quantify how abnormal each observation's Y value is, while taking into account the values of the other variables (that is, the relationship between Y and X).
Normally, an anomaly detection algorithm finds anomalies in the whole data (Y + X), but in my case I want to focus on Y because it is a very important variable. If I quantified the abnormality over all my variables (Y + X), Y would be lost among the other variables.
This is not an unusual idea: when you fit a linear regression Y ~ X, you can compute Cook's distance, which is a kind of abnormality score that takes the relationship between Y and X into account.
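To make the Cook's distance idea concrete, here is a minimal sketch, assuming an ordinary least-squares fit with an intercept and using only NumPy. The function name `cooks_distance`, the synthetic data, and the injected anomaly at row 10 are illustrative assumptions, not part of my real data.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit y ~ 1 + X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = A.shape[1]                            # number of fitted coefficients
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta                      # how far Y is from what X predicts
    # Leverage h_i: diagonal of the hat matrix, obtained from a reduced QR of A
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)
    s2 = resid @ resid / (n - p)              # residual variance estimate
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return resid ** 2 / (p * s2) * h / (1.0 - h) ** 2

# Synthetic illustration (assumed data): Y depends linearly on three X variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y[10] += 10.0                                 # make Y abnormal given X at row 10

d = cooks_distance(X, y)
most_abnormal = int(np.argmax(d))             # row with the largest score
```

The score is driven by the residual of Y given X (scaled by leverage), so a row can have perfectly ordinary X values and still be flagged because its Y does not match them. This is the behaviour I would like, but without being restricted to a linear model.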
I hope it is clear!
Topic anomaly anomaly-detection research machine-learning
Category Data Science