Isolation Forest height limit absent in SkLearn implementation

Question


jimijazz

May 25, 2022 19:36

In the original publication of the Isolation Forest algorithm, the authors mention a height limit parameter that controls the granularity of the algorithm. I did not find that parameter in the scikit-learn implementation, and I was wondering whether it is possible to control granularity in some other way.

Topics: decision-trees, outlier, scikit-learn

Category: Data Science

8Simon8 · Accepted Answer · May 25, 2022 19:36


In the "Isolation Forest" paper, it is said:

I think that is why you don't find it in scikit-learn.
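
For context, as a sketch and not a quote from the paper: the paper's Algorithm 1 fixes the training-time height limit as l = ceiling(log2 ψ) from the subsample size ψ, and recent scikit-learn releases appear to do the same internally, capping every tree at ceil(log2(max_samples)). If that holds for your version, max_samples is the indirect knob for tree depth. Note that this is the training-time limit, not the evaluation-time hlim the question asks about. The snippet compares that cap with the deepest tree actually grown; the random data and the max_samples values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))  # arbitrary toy data

results = {}
for max_samples in (16, 64, 256):
    forest = IsolationForest(n_estimators=20, max_samples=max_samples,
                             random_state=0).fit(X)
    # Depth cap implied by the subsample size, as in the paper's l = ceil(log2(psi)).
    cap = int(np.ceil(np.log2(max_samples)))
    # Deepest tree actually grown; get_depth() is a public method of sklearn trees.
    deepest = max(tree.get_depth() for tree in forest.estimators_)
    results[max_samples] = (cap, deepest)
    print(f"max_samples={max_samples}: cap={cap}, deepest tree={deepest}")
```

Lowering max_samples therefore gives shallower, coarser trees, at the cost of each tree seeing fewer points.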

Kiritee Gak · Accepted Answer · January 16, 2018 01:26

Unfortunately, there seems to be no hlim parameter in sklearn.ensemble.IsolationForest. The anomaly score is computed only from the depth at which each point ends up in each tree, averaged over the trees and normalized by the average path length. The only way to tune it a little is the contamination parameter, which sets the threshold applied to the anomaly score.
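
To illustrate the contamination point, here is a minimal sketch assuming a recent scikit-learn: contamination leaves the trees untouched and only moves the cut-off. fit() stores that cut-off as offset_, the contamination-th percentile of score_samples on the training data, and predict() labels a point -1 when score_samples(X) - offset_ is negative. The toy data and contamination=0.05 are arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = rng.normal(size=(500, 2))  # arbitrary toy data

forest = IsolationForest(contamination=0.05, random_state=0).fit(X)

# score_samples is the negated paper score: lower means more anomalous.
raw = forest.score_samples(X)
# fit() stores the contamination-th percentile of the training scores as offset_.
threshold = np.percentile(raw, 100.0 * 0.05)
# predict() returns -1 where score_samples(X) - offset_ < 0, otherwise 1.
labels = forest.predict(X)
outlier_fraction = np.mean(labels == -1)
print(forest.offset_, threshold, outlier_fraction)
```

Changing contamination only changes how many points fall below the cut-off; the ranking of points, and therefore the granularity, stays the same.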

To get the granularity that the original paper achieves with hlim=6 for detecting a small cluster of points, using a large number of estimators may help (though this still depends heavily on how points from the small cluster get sampled into the estimators). But if that cluster contains very few points, I don't think this idea works, and there is not much more we can do with the current sklearn implementation. Hope this helps.
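
One possible workaround, offered as a sketch and not as a scikit-learn feature: the fitted trees are public through estimators_, and each tree's tree_ exposes children_left, children_right, feature, threshold and n_node_samples, so the paper's evaluation-time PathLength(x, T, hlim) can be re-applied to an already fitted forest. The helpers c, path_lengths and hlim_scores are hypothetical names introduced here, and the code assumes a recent scikit-learn with the default max_features=1.0. With hlim=np.inf it should reproduce -score_samples(X); a smaller hlim stops each path early and adds c(size) for the training points still grouped at that node.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def c(n):
    """c(n) = 2H(n-1) - 2(n-1)/n with H(i) ~ ln(i) + Euler's constant.
    Uses the same piecewise convention as scikit-learn: c(n<=1)=0, c(2)=1."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n


def path_lengths(X, tree, hlim):
    """Hypothetical helper: the paper's PathLength(x, T, hlim) for every row of X."""
    t = tree.tree_
    left, right = t.children_left, t.children_right  # -1 marks a leaf
    feature, threshold, size = t.feature, t.threshold, t.n_node_samples
    out = np.empty(len(X))
    for i, x in enumerate(X):
        node, e = 0, 0
        # Descend until a leaf is reached or the path hits the height limit.
        while left[node] != -1 and e < hlim:
            node = left[node] if x[feature[node]] <= threshold[node] else right[node]
            e += 1
        # Add the expected depth of the unresolved subtree for its training points.
        out[i] = e + c(size[node])
    return out


def hlim_scores(forest, X, hlim):
    """Hypothetical helper: paper score s = 2 ** (-E[h(x)] / c(psi)), higher = more anomalous.
    Assumes default max_features=1.0, so every tree sees the columns of X unchanged."""
    X = np.asarray(X, dtype=np.float32)  # sklearn trees compare float32 inputs
    mean_h = np.mean([path_lengths(X, tree, hlim) for tree in forest.estimators_], axis=0)
    return 2.0 ** (-mean_h / c(forest.max_samples_))


rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(280, 2)),  # main cloud
    rng.normal(5.0, 0.1, size=(20, 2)),   # small, dense cluster
])
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)

full = hlim_scores(forest, X, hlim=np.inf)  # no extra limit
coarse = hlim_scores(forest, X, hlim=6)     # paper-style hlim=6
print(np.allclose(full, -forest.score_samples(X)))
print(coarse[-20:].mean(), full[-20:].mean())
```

Whether a small hlim actually separates a small cluster on your data is something to check; this sketch only makes the parameter available again. The pure-Python loop is also slow for large inputs.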
