Anomaly detection over multivariate data containing nominal and numerical predictors

I am trying to implement anomaly detection on a multivariate dataset that has both nominal and numerical predictors.

The dataset has the following pattern:

If we consider the below sample records, category_id, currency, and product_id are nominal predictors, whereas price is a numerical variable.

My model is able to identify the price anomaly for '_id=4', because prices for that particular combination of category_id, currency, and product_id normally fall between 10 and 500 EUR.
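To make this kind of per-combination price check concrete, here is a minimal sketch that compares each price with the median of its own (category_id, currency, product_id) group. The records, the `_id` values, the helper name `price_outliers`, and the threshold `k` are all hypothetical and chosen for illustration; this is not the model from the question, only a simple baseline under those assumptions.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: one combination with prices inside 10-500 EUR
# plus a single extreme price. None of these values come from the real data.
records = [
    {"_id": 101, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 40.0},
    {"_id": 102, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 90.0},
    {"_id": 103, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 150.0},
    {"_id": 104, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 220.0},
    {"_id": 105, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 310.0},
    {"_id": 106, "category_id": "c1", "currency": "EUR", "product_id": "p1", "price": 9000.0},
]


def price_outliers(rows, k=3.0):
    """Return the _id of rows whose price lies more than k median absolute
    deviations (MAD) from the median price of their own group."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["category_id"], row["currency"], row["product_id"])
        groups[key].append(row["price"])

    flagged = []
    for row in rows:
        prices = groups[(row["category_id"], row["currency"], row["product_id"])]
        med = median(prices)
        mad = median(abs(p - med) for p in prices)
        if mad == 0:
            # All prices in the group are (nearly) identical; skip scoring.
            continue
        if abs(row["price"] - med) / mad > k:
            flagged.append(row["_id"])
    return flagged


print(price_outliers(records))
```

A check like this only works for combinations that occur often enough to have a meaningful median, so it says nothing about records whose combination itself is unusual.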

However, it is not able to identify the anomalies for product_id=1, product_id=2, or product_id=3, which have an unusual category_id, currency, or product_id.

Correlation matrix for the dataset:
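An unusual category_id, currency, or product_id shows up as a combination that is rare in the data, regardless of price. The sketch below simply counts combinations and reports the rare ones. The rows, the helper name `rare_combinations`, and the `min_share` threshold are hypothetical; it is meant as a baseline to compare a detector against, not as the method used in the question.

```python
from collections import Counter

# Hypothetical (category_id, currency, product_id) tuples: three common
# combinations and two that each occur only once.
rows = (
    [("c1", "EUR", "p1")] * 50
    + [("c1", "EUR", "p2")] * 40
    + [("c2", "USD", "p3")] * 30
    + [("c9", "EUR", "p1"), ("c1", "JPY", "p2")]
)


def rare_combinations(rows, min_share=0.01):
    """Return combinations whose share of all rows is below min_share,
    mapped to their raw counts."""
    counts = Counter(rows)
    total = len(rows)
    return {combo: n for combo, n in counts.items() if n / total < min_share}


for combo, n in rare_combinations(rows).items():
    print(combo, n)
```

If a detector fed with encoded nominal features misses records that this count flags, the encoding step is a reasonable place to look first.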

I am applying Multiple Correspondence Analysis (MCA) to the nominal predictors, reducing these 3 predictors to 2 components.
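For reference, the MCA step can be sketched as a correspondence analysis of the one-hot indicator matrix. This is a minimal sketch that assumes NumPy is available and uses plain indicator-matrix MCA without any eigenvalue correction; a dedicated MCA library may scale or correct the coordinates differently. The rows and the function name `mca_row_coordinates` are hypothetical.

```python
import numpy as np


def mca_row_coordinates(rows, n_components=2):
    """Row principal coordinates from a correspondence analysis of the
    one-hot indicator matrix built from tuples of nominal values."""
    n_vars = len(rows[0])
    # One indicator column per (variable index, level) pair.
    levels = sorted({(j, row[j]) for row in rows for j in range(n_vars)})
    col_index = {level: i for i, level in enumerate(levels)}
    Z = np.zeros((len(rows), len(levels)))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            Z[i, col_index[(j, value)]] = 1.0

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows: D_r^{-1/2} U Sigma.
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


# Hypothetical (category_id, currency, product_id) rows.
mca_rows = (
    [("c1", "EUR", "p1")] * 5
    + [("c1", "EUR", "p2")] * 4
    + [("c2", "USD", "p3")] * 3
    + [("c9", "JPY", "p1")]
)
coords = mca_row_coordinates(mca_rows, n_components=2)
print(coords.shape)
```

Rows with identical nominal values receive identical coordinates, so the 2 components only carry information about the combination, not about the price. Keeping just 2 components can also compress rare levels, which may be worth inspecting before the coordinates are passed to a detector.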

Algorithms: I have tried One-Class SVM (OC-SVM), Isolation Forest, and Autoencoders, but none of them performs as expected.

Some of the resources that I have checked, without success:

Unsupervised Anomaly Detection with Mixed Numeric and Categorical Data

Anomaly Detection/Novelty detection

I cannot tell where I am making the mistake: in the data pre-processing, in the choice of algorithm, or somewhere else?

I would highly appreciate any help here.

You can review my code here.

