Sensitivity analysis in outlier explanation

I am trying to explain outliers using sensitivity analysis. My dataset has 19 numerical input values and 1 numerical output value (20 columns in total). I have already built a prediction model, and I treat the records with high prediction errors as outliers/anomalies. I have run a sensitivity analysis on each input individually, but the inputs are correlated with one another, e.g. value 1 is correlated with values 3, 4 and 7; value 2 is correlated with values 5, 10 and 18; and so on.

To explain an outlier, I first check whether any of its input values are themselves outlying. If some are, I want to use a sensitivity check to find out whether the output is especially sensitive to those inputs. Because the inputs are correlated, sensitivity analysis on individual inputs does not make much sense; what I ultimately want is the most influential group of input values, i.e. the group whose change brings the outlying value back to normal. I will then verify whether the same group holds for similar outliers and, if so, report it as the explanation for the outlier. My question is: how can I check the sensitivity of a group of inputs in this case? If this approach does not sound logical, please let me know which part is confusing.
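To make the group idea concrete, here is a minimal sketch of one possible group-level check. It assumes the model exposes a `predict` callable and that a vector of "normal" reference values (for example, column medians of the non-outlying rows) is available. The function name `group_sensitivity`, the toy weights, and the group labels are all hypothetical and only serve as illustration. Each group of correlated inputs is replaced jointly by its reference values, and the drop in absolute prediction error is recorded.

```python
import numpy as np


def group_sensitivity(predict, x, y_true, reference, groups):
    """For each group, return how much the absolute prediction error drops
    when all features in the group are jointly set to their reference values."""
    base_error = abs(predict(x[None, :])[0] - y_true)
    drops = {}
    for name, idx in groups.items():
        x_mod = x.copy()
        x_mod[idx] = reference[idx]  # perturb the whole group at once
        new_error = abs(predict(x_mod[None, :])[0] - y_true)
        drops[name] = base_error - new_error
    return drops


# Toy setup (assumed for illustration): a linear model with 6 inputs.
w = np.array([1.0, 0.0, 2.0, 0.0, -1.0, 0.0])
predict = lambda X: X @ w

reference = np.zeros(6)                          # stand-in for "normal" values
x = np.array([5.0, 0.1, 4.0, -0.2, 0.3, 0.1])    # one outlying record
y_true = 0.0                                     # its observed output

# Groups of mutually correlated inputs (0-based column indices).
groups = {"A": [0, 2], "B": [1, 4], "C": [3, 5]}

drops = group_sensitivity(predict, x, y_true, reference, groups)
best = max(drops, key=drops.get)  # group whose replacement reduces the error most
print(best, drops)
```

Replacing a group with fixed reference values ignores how the group depends on the remaining inputs, so this is only a first approximation. The same function can be run over several similar outliers to see whether the same group keeps coming out on top.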

Topic outlier python bigdata data-mining

Category Data Science


You can use the robust squared Mahalanobis distance to detect outliers in multivariate data. Then run your model once on all the data and compute the mean squared error (MSE). Run it a second time without those outliers, compute the MSE again, and compare the two. If there are many outliers, you can use the first principal component (PC) to reduce the dimensionality of the data set and run your model again. You can also omit the observations whose residuals are greater than 2 and then rerun the model. The following link may help with detecting outliers:

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Cabana+Garceran+del+Vall%2C+E.%2C+Laniado+Rodas%2C+H.%2C+%24%5C%26%24+Lillo+Rodr%C3%ADguez%2C+R.+E.+%282017%29.+Multivariate+outlier+detection+based+on+a+robust+Mahalanobis+distance+with+shrinkage+estimators.&btnG=
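The detect-then-compare procedure described above can be sketched as follows. This is a hedged illustration, not the method of the linked paper: it swaps in scikit-learn's `MinCovDet` (Minimum Covariance Determinant) as the robust estimator instead of the shrinkage estimators, uses a chi-square quantile as the cutoff, and uses a plain linear regression as a stand-in for the prediction model. The helper names and the synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def robust_outlier_mask(X, quantile=0.975, random_state=0):
    """Flag rows whose robust squared Mahalanobis distance exceeds a
    chi-square cutoff with as many degrees of freedom as there are columns."""
    mcd = MinCovDet(random_state=random_state).fit(X)
    d2 = mcd.mahalanobis(X)  # squared distances to the robust location
    cutoff = chi2.ppf(quantile, df=X.shape[1])
    return d2 > cutoff


def mse_with_and_without(X, y, mask):
    """Fit the stand-in model on all rows, then on the unflagged rows only,
    and return both in-sample MSE values."""
    full = LinearRegression().fit(X, y)
    mse_all = mean_squared_error(y, full.predict(X))
    clean = LinearRegression().fit(X[~mask], y[~mask])
    mse_clean = mean_squared_error(y[~mask], clean.predict(X[~mask]))
    return mse_all, mse_clean


# Synthetic data (assumed for illustration): 300 rows, 5 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=300)
X[:10] += 8.0    # push the first 10 rows far from the bulk of the inputs
y[:10] -= 20.0   # and make their outputs inconsistent with the linear relation

mask = robust_outlier_mask(X)
mse_all, mse_clean = mse_with_and_without(X, y, mask)
print(f"flagged rows: {mask.sum()}, MSE all: {mse_all:.3f}, MSE clean: {mse_clean:.3f}")
```

A large gap between the two MSE values suggests that the flagged rows are driving the prediction error. The 0.975 quantile is a common convention, not a requirement, and will also flag a small share of ordinary rows.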

Mubarak Al-Shukeili. My research interest: Multivariate Analysis. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Al-Shukeili+Mubarak&btnG=
