Isolation Forest height limit absent in SkLearn implementation

In the original publication of the Isolation Forest algorithm, the authors mention a height limit parameter that controls the granularity of the algorithm. I did not find that parameter exposed in the scikit-learn implementation, and I was wondering whether it is possible to control granularity in some other way?



The "Isolation Forest" paper says:

[Image: excerpt from the paper, not preserved]

I think that is why you don't find it in scikit-learn.
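One detail from the paper that may be relevant: the height limit is not tuned by hand but set from the sub-sampling size ψ as ceil(log2(ψ)). As far as I can tell, scikit-learn follows the same rule and derives the tree depth from `max_samples`, so that parameter acts as an indirect height limit. A minimal sketch to check this empirically, assuming a recent scikit-learn and synthetic data invented for illustration:

```python
import math

import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data, purely for illustration (not from the original question).
rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 2))

psi = 64  # sub-sampling size, passed to scikit-learn as max_samples
forest = IsolationForest(n_estimators=50, max_samples=psi, random_state=0).fit(X)

# Height limit from the paper: ceil(log2(psi)); for psi = 64 this is 6.
expected_limit = math.ceil(math.log2(psi))

# Depth actually reached by each fitted tree (root has depth 0).
observed_depths = [tree.tree_.max_depth for tree in forest.estimators_]

print("expected height limit:", expected_limit)
print("deepest fitted tree:  ", max(observed_depths))
```

If this holds in your installed version, choosing `max_samples=64` corresponds to a height limit of 6, and smaller or larger values move the limit accordingly.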


Unfortunately, there seems to be no hlim parameter in sklearn.ensemble.IsolationForest. The anomaly score is computed only from the depth at which each point is isolated, normalized by the average path length. The only way to tune it somewhat is the contamination parameter, which determines the threshold applied to the anomaly score.
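For reference, the score described above can be written out from the paper's formulas: s(x, n) = 2^(-E[h(x)] / c(n)), with c(n) = 2H(n-1) - 2(n-1)/n and H(i) approximated by ln(i) + 0.5772. The following is a plain-Python sketch of those formulas, not scikit-learn's code; the function names and the handling of n <= 2 are conventions chosen here for illustration.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    # Edge cases for n <= 2 are a convention assumed here, not taken from the post.
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 indicate anomalies."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # sub-sampling size used as an example
c_n = average_path_length(n)

# A point isolated after few splits scores higher than one that needs many.
print(anomaly_score(2.0, n) > anomaly_score(12.0, n))
# A point whose mean path length equals c(n) scores exactly 0.5.
print(anomaly_score(c_n, n))
```

Note that scikit-learn's `score_samples` returns the negated score, so lower values there mean more abnormal.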

To get the granularity the original paper achieves with hlim=6 when detecting a small cluster of points, using a large number of estimators may help, although this still depends heavily on how points from the small cluster end up in each estimator's subsample. If that cluster contains very few points, I don't think this idea works, and there is not much more we can do with the current implementation in sklearn. Hope this helps.
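A rough sketch of the knobs that are exposed, assuming a recent scikit-learn. The data (a dense bulk plus a small tight cluster) and all parameter values are invented for illustration. Whether the small cluster actually gets flagged depends on the data and the settings, so no particular outcome is claimed.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 950 bulk points plus a small, tight cluster of 50 points.
rng = np.random.RandomState(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 2)),
    rng.normal(6.0, 0.3, size=(50, 2)),
])

forest = IsolationForest(
    n_estimators=500,    # many trees, as suggested above
    max_samples=256,     # sub-sampling size per tree
    contamination=0.05,  # expected share of anomalies; sets the threshold
    random_state=42,
).fit(X)

scores = forest.score_samples(X)  # lower means more abnormal
labels = forest.predict(X)        # -1 for anomalies, 1 for inliers

# contamination fixes the cut-off (offset_) so that roughly this share
# of the training points falls below it.
flagged_fraction = float(np.mean(labels == -1))
print("threshold (offset_):", forest.offset_)
print("fraction flagged:   ", flagged_fraction)
```

Varying `n_estimators` and `max_samples` and inspecting `scores` for the small cluster is one way to see how far these settings can compensate for the missing hlim parameter.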
