Cross-Validation for Unsupervised Anomaly Detection with Isolation Forest

I am wondering whether I can perform any kind of cross-validation or GridSearchCV for unsupervised learning. I do have the ground-truth labels, but since the task is unsupervised I drop them for training and only reuse them afterwards to measure accuracy, AUC, AUC-PR, and F1-score on the test set.
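
For context, my evaluation step currently looks roughly like this (a minimal sketch; X_train, X_test and y_test stand for my own split, and the labels are assumed to be 1 for anomalies and 0 for normal points):

from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

# Train without labels, then score against the held-out ground truth.
model = IsolationForest(random_state=0).fit(X_train)

# IsolationForest predicts -1 for anomalies and 1 for inliers;
# map that to 1 = anomaly, 0 = normal to match the ground-truth labels.
y_pred = (model.predict(X_test) == -1).astype(int)
anomaly_score = -model.score_samples(X_test)   # higher = more anomalous

print("F1:    ", f1_score(y_test, y_pred))
print("AUC:   ", roc_auc_score(y_test, anomaly_score))
print("AUC-PR:", average_precision_score(y_test, anomaly_score))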

Is there any way to do this?

Topic isolation-forest unsupervised-learning cross-validation machine-learning

Category Data Science


Yes - you can use scikit-learn's GridSearchCV with an unsupervised algorithm. Since scikit-learn's IsolationForest does not provide a built-in score method, a custom scoring function has to be passed in. It would look something like this:

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import GridSearchCV

def scorer(estimator, X):
    """Custom scoring function: mean anomaly score of the samples (higher is better)."""
    return np.mean(estimator.score_samples(X))

# Example hyperparameter grid - adjust it to your own search space.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_samples": ["auto", 0.5],
}

isolation_forest = GridSearchCV(IsolationForest(), param_grid, scoring=scorer)
isolation_forest.fit(X)  # X is the training data, with the ground-truth labels dropped
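
After the search, best_params_ and best_estimator_ work as usual, so you can still bring the ground-truth labels back in to evaluate the selected model. A short sketch, assuming a held-out X_test/y_test split where 1 marks an anomaly:

from sklearn.metrics import f1_score, roc_auc_score

print(isolation_forest.best_params_)
best_model = isolation_forest.best_estimator_

# IsolationForest predicts -1 for anomalies; map to 1 = anomaly, 0 = normal.
y_pred = (best_model.predict(X_test) == -1).astype(int)

print("F1: ", f1_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, -best_model.score_samples(X_test)))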
