Anomaly Detection

I have a problem where I want to identify vendors with unusually high invoice amounts. What would be the best way to identify such invoices?

I am trying to use Isolation Forest but am having trouble grouping the results by vendor.

Any help will be appreciated.

The data is in the following format:

Vendor ID    Amount
1            456
2            1000
1            489
3            896
2            4576
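A minimal sketch of one way to group Isolation Forest output by vendor, assuming pandas and scikit-learn's `IsolationForest`: fit a single model on the `Amount` column, flag each invoice, then aggregate the flags per vendor with `groupby`. The `contamination=0.2` value and the summary column names (`invoices`, `outliers`, `max_amount`) are illustrative assumptions, and with only five invoices the flags themselves carry little meaning.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Sample data from the question
df = pd.DataFrame({
    "Vendor ID": [1, 2, 1, 3, 2],
    "Amount": [456, 1000, 489, 896, 4576],
})

# Fit one Isolation Forest on the single Amount feature.
# contamination=0.2 is an assumption: roughly 20% of invoices treated as outliers.
model = IsolationForest(contamination=0.2, random_state=0)
# fit_predict returns -1 for outliers and 1 for inliers
df["is_outlier"] = model.fit_predict(df[["Amount"]]) == -1

# Group the per-invoice flags by vendor
summary = df.groupby("Vendor ID").agg(
    invoices=("Amount", "size"),
    outliers=("is_outlier", "sum"),
    max_amount=("Amount", "max"),
)
print(summary)
```

The key point is that the model scores invoices, not vendors; the grouping happens afterwards on the flag column.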

Topic isolation-forest anomaly-detection outlier machine-learning

Category Data Science


This is a pretty simple example, and I would not rely on ANY automatic detection algorithm until I had manually looked at this or historical data and labelled data points as "unusual" according to some business definition. Some of the data points outside the norm may in fact be valid. Based on your example, you just do not have enough historical or multivariate data to make a determination.


Since the dataset has only a single dimension, I believe you can apply a simple outlier detection technique to each vendor separately:

  • The quantile method
  • The MAD (median absolute deviation) method
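A rough sketch of both methods applied per vendor with pandas `groupby(...).transform`. It assumes "the quantile method" means Tukey's upper fence (Q3 + 1.5 * IQR) and uses the common modified z-score rule for MAD (0.6745 * deviation / MAD, threshold 3.5); both thresholds are conventional defaults rather than values from the answer, and the invoice data is hypothetical. Only the upper side is checked because the question is about unusually high amounts.

```python
import pandas as pd

# Hypothetical invoice history; vendor 1 has one unusually large invoice.
df = pd.DataFrame({
    "vendor_id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "amount":    [450, 470, 480, 500, 5000, 1000, 1100, 1050, 980, 1020],
})

def iqr_upper_flag(amounts: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag amounts above Q3 + k * IQR (Tukey's upper fence)."""
    q1, q3 = amounts.quantile([0.25, 0.75])
    return amounts > q3 + k * (q3 - q1)

def mad_upper_flag(amounts: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag amounts whose modified z-score exceeds the threshold."""
    median = amounts.median()
    mad = (amounts - median).abs().median()
    if mad == 0:
        # Most amounts identical: MAD gives no usable scale, so flag nothing.
        return pd.Series(False, index=amounts.index)
    modified_z = 0.6745 * (amounts - median) / mad
    return modified_z > threshold

# Each function sees one vendor's amounts at a time
grouped = df.groupby("vendor_id")["amount"]
df["iqr_outlier"] = grouped.transform(iqr_upper_flag).astype(bool)
df["mad_outlier"] = grouped.transform(mad_upper_flag).astype(bool)
print(df)
```

With this data both rules flag only the 5000 invoice from vendor 1. Per-vendor quantiles and medians need a reasonable number of invoices per vendor to mean anything; the one or two rows per vendor in the question's sample are not enough.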

If you want a single model, define the vendor-wise standard deviation as a feature and then apply one of the methods above.

For example, for { V1: [100, 120, 15000], V2: [15000, 16000, 14000] }, the feature values will be 7019 and 816 (population standard deviation).

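import numpy as np

# Invoice amounts for the two example vendors
arr_1 = np.array([100, 120, 15000])
arr_2 = np.array([15000, 16000, 14000])

# Vendor-wise standard deviation (NumPy's default ddof=0, i.e. population std)
print(arr_1.std(), arr_2.std())

Output - 7019.218063447112 816.496580927726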
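One reading of the single-model suggestion, sketched with pandas on hypothetical data: compute each vendor's standard deviation (ddof=0, matching NumPy's default), then run the quantile rule across those per-vendor values. Treating "the quantile method" as Tukey's upper fence (Q3 + 1.5 * IQR) is an assumption, as are the vendors V3 to V5.

```python
import pandas as pd

# Hypothetical invoices: V1 and V2 from the example above, plus three extra vendors.
df = pd.DataFrame({
    "vendor_id": ["V1"] * 3 + ["V2"] * 3 + ["V3"] * 3 + ["V4"] * 3 + ["V5"] * 3,
    "amount": [100, 120, 15000,
               15000, 16000, 14000,
               500, 520, 480,
               2000, 2100, 1900,
               800, 820, 790],
})

# One feature per vendor: population standard deviation of its invoice amounts
vendor_std = df.groupby("vendor_id")["amount"].std(ddof=0)

# Apply the quantile rule (Tukey's upper fence) across vendors
q1, q3 = vendor_std.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
unusual_vendors = vendor_std[vendor_std > upper_fence].index.tolist()

print(vendor_std.round(1))
print(unusual_vendors)
```

This flags vendors rather than individual invoices, so a per-vendor check is still needed to locate the invoice itself. Standard deviation also grows with the scale of a vendor's amounts, so it is worth checking that large but steady vendors such as V2 are not flagged only because of scale.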

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.