Anomaly detection over multivariate data containing nominal and numerical predictors
I am trying to implement anomaly detection over a multivariate dataset that has both nominal and numerical predictors.
The dataset has the following pattern:
In the sample records below, category_id, currency, and product_id are nominal predictors, whereas price is a numerical variable.
My model identifies the price anomaly for '_id=4', because prices for that particular combination of category_id, currency, and product_id normally fall between 10 and 500 EUR.
However, it does not identify the anomalies for product_id=1, product_id=2, or product_id=3, even though these have an unusual category_id, currency, or product_id.
Correlation matrix for dataset:
Pre-processing: I apply Multiple Correspondence Analysis (MCA) to the three nominal predictors and reduce them to two components.
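To make the MCA step concrete, here is a minimal sketch of one way to compute it, assuming MCA is done as correspondence analysis on the one-hot indicator matrix. The function name `mca_row_coordinates` and the toy records are hypothetical and are not taken from my real dataset or code.

```python
import numpy as np
import pandas as pd


def mca_row_coordinates(df_nominal, n_components=2):
    """Row principal coordinates from correspondence analysis of the indicator matrix."""
    # One-hot encode every nominal column (values are treated as labels, not numbers).
    as_labels = df_nominal.astype(str)
    Z = pd.get_dummies(as_labels, columns=list(as_labels.columns)).to_numpy(dtype=float)

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses

    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

    U, sigma, _ = np.linalg.svd(S, full_matrices=False)

    # Row principal coordinates: D_r^{-1/2} U Sigma
    F = (U * sigma) / np.sqrt(r)[:, None]
    return F[:, :n_components]


# Hypothetical toy records, for illustration only.
toy = pd.DataFrame({
    "category_id": [10, 10, 10, 20, 20, 99],
    "currency": ["EUR", "EUR", "EUR", "EUR", "EUR", "XYZ"],
    "product_id": [1, 1, 2, 3, 3, 7],
})
coords = mca_row_coordinates(toy, n_components=2)
print(coords)
```

Note that keeping only two components discards part of the variation in the nominal predictors, so a rare category combination is not guaranteed to stay far from the common ones in the reduced space.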
Algorithm: I have tried One-Class SVM (OC-SVM), Isolation Forest, and autoencoders, but none of them performs as expected.
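For reference, the kind of pipeline I mean can be sketched roughly as follows with scikit-learn. This is a simplified illustration, not my exact code: it swaps in one-hot encoding in place of MCA to stay short, the records are made-up toy values, and the hyperparameters are arbitrary assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records, for illustration only.
records = pd.DataFrame({
    "category_id": [10, 10, 10, 10, 20, 20, 20, 99],
    "currency": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "XYZ"],
    "product_id": [1, 1, 2, 2, 3, 3, 3, 7],
    "price": [25.0, 30.0, 120.0, 110.0, 480.0, 450.0, 9000.0, 40.0],
})

nominal_cols = ["category_id", "currency", "product_id"]
numeric_cols = ["price"]

# Encode nominal predictors as indicator columns and scale the numeric one.
preprocess = ColumnTransformer([
    ("nominal", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
    ("numeric", StandardScaler(), numeric_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("detector", IsolationForest(n_estimators=200, random_state=0)),
])
model.fit(records)

# Lower decision_function values mean "more anomalous"; predict returns -1 for outliers.
scores = model.decision_function(records)
labels = model.predict(records)
print(records.assign(score=scores, label=labels))
```

One design point worth noting: price is scaled globally here, so the detector only sees whether a price is unusual overall, not whether it is unusual for its particular category_id, currency, and product_id combination.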
Some of the resources I have checked, without success:
Unsupervised Anomaly Detection with Mixed Numeric and Categorical Data
Anomaly Detection/Novelty detection
I cannot work out where I am going wrong: is it the data pre-processing, the choice of algorithm, or something else?
I would highly appreciate any help here.
Topic isolation-forest autoencoder anomaly-detection correlation data-cleaning
Category Data Science