Semi-supervised anomaly detection
I am currently exploring anomaly detection methods for my work, and so far I have gone through Local Outlier Factor (LOF) and Isolation Forest (IF), both of which are unsupervised methods.
The issue is that I might not want some points that are far away to be considered outliers, so I would need some sort of supervised or semi-supervised method for outlier detection.
So what I am thinking is:
1. Label a bunch of points as outliers using LOF/IF.
2. Train a classifier on those labels, and then make manual adjustments where needed.
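The two steps above can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn and a synthetic 2-D dataset; `manual_overrides` is a hypothetical placeholder for hand-corrected labels, and a random forest with `class_weight="balanced"` is just one possible choice of classifier. Here the manual corrections are applied to the labels before the classifier is (re)fitted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic data (assumption): 990 inliers around the origin, 10 far-away points.
X = np.vstack([rng.normal(0, 1, size=(990, 2)), rng.normal(6, 1, size=(10, 2))])

# Step 1: pseudo-label with an unsupervised detector.
# IsolationForest.fit_predict returns -1 for outliers and 1 for inliers.
iso = IsolationForest(contamination=0.01, random_state=0)
pseudo = (iso.fit_predict(X) == -1).astype(int)  # 1 = outlier, 0 = normal

# Manual adjustments: override labels for points you know better about.
# `manual_overrides` is hypothetical, e.g. {42: 0} marks row 42 as normal.
manual_overrides = {}
labels = pseudo.copy()
for idx, lab in manual_overrides.items():
    labels[idx] = lab

# Step 2: train a supervised classifier on the (adjusted) labels.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X, labels)
pred = clf.predict(X)
```

Once the classifier is trained, new points can be scored with `clf.predict` or `clf.predict_proba`, and any corrections go back into `manual_overrides` before refitting.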
Is this what is considered a semi-supervised method? Does anybody with experience in this sort of problem see anything I am missing here?
Also, because I am labeling outliers, the dataset will be very imbalanced. My idea is to use bagging for this. Say my dataset is 1% outliers: I would train 100 balanced models, where each model gets all of the outliers plus a different, equally sized slice of the normal points, until the slices cover the entire dataset. The final prediction is then a vote across all the models. Is this a stupid idea or a good one?
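A minimal sketch of the proposed partitioned ensemble, assuming scikit-learn, binary labels (1 = outlier, for example the pseudo-labels from the LOF/IF step), at least one outlier, and shallow decision trees as the base model. `fit_partitioned_ensemble` and `predict_vote` are hypothetical helper names. Unlike classic bagging, which draws bootstrap samples at random, the normal points here are split into disjoint chunks so every normal point is used exactly once.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_partitioned_ensemble(X, y, random_state=0):
    """Train one model per disjoint chunk of the normal (majority) class.

    Every model sees all outliers (y == 1) plus a different slice of the
    normal points (y == 0) of roughly the same size. Assumes y has >= 1 outlier.
    """
    rng = np.random.default_rng(random_state)
    out_idx = np.flatnonzero(y == 1)
    norm_idx = rng.permutation(np.flatnonzero(y == 0))
    n_chunks = max(1, len(norm_idx) // len(out_idx))
    models = []
    for chunk in np.array_split(norm_idx, n_chunks):
        idx = np.concatenate([out_idx, chunk])
        model = DecisionTreeClassifier(max_depth=3, random_state=random_state)
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

def predict_vote(models, X):
    # Majority vote: flag a point as an outlier if more than half the models do.
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

# Usage on synthetic data (assumption): 1% outliers, as in the question.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(990, 2)), rng.normal(6, 1, size=(10, 2))])
y = np.r_[np.zeros(990, dtype=int), np.ones(10, dtype=int)]
models = fit_partitioned_ensemble(X, y)
pred = predict_vote(models, X)
```

With 1% outliers this yields about 99 models of roughly 20 points each, so the base learner should stay simple. A related off-the-shelf option is `BalancedBaggingClassifier` from imbalanced-learn, which randomly undersamples the majority class for each estimator instead of using disjoint chunks.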
Topic anomaly-detection semi-supervised-learning outlier class-imbalance
Category Data Science