Massively imbalanced data

I am dealing with time series data from a gas turbine: more than 200K records (one per minute for 6 months). I am trying to detect faults early (0 = normal, 1 = fault). The issues with the data are:

1. The fault occurred only 5 times (identified by observing sudden shutdowns), which makes the data hugely imbalanced.
2. The data is unlabeled (unsupervised): there is no binary output column. I used 2 of the variables as my output and ran binary clustering (k-means) on them, but the result is not very good because some points are assigned to the wrong cluster.
3. I then created the binary labels manually from my own observation, since there are only 5 faults.
4. Prediction accuracy is very high (99.9%), but the confusion matrix shows otherwise, with many false 1s.
5. I tried both random oversampling and SMOTE. Only the ensemble algorithms work well.
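To make the accuracy issue concrete, here is a minimal sketch using synthetic labels. All counts below (50 labelled fault minutes, 150 false alarms, 10 detected fault minutes) are made-up assumptions chosen for illustration, not my actual turbine data. It only shows how accuracy can sit at about 99.9% while precision and recall on the fault class stay poor.

```python
# Synthetic illustration only: every count here is an assumption, not real data.
N_RECORDS = 200_000       # roughly one record per minute over several months
N_FAULT_MINUTES = 50      # hypothetical: 5 faults x 10 labelled minutes each
N_FALSE_ALARMS = 150      # hypothetical number of normal minutes flagged as 1
N_DETECTED = 10           # hypothetical number of fault minutes flagged as 1

# Ground truth: the last N_FAULT_MINUTES records are faults (1), the rest are 0.
y_true = [0] * (N_RECORDS - N_FAULT_MINUTES) + [1] * N_FAULT_MINUTES

# Hypothetical predictions: some false alarms early on, a few real detections.
y_pred = [0] * N_RECORDS
for i in range(N_FALSE_ALARMS):
    y_pred[i] = 1
for i in range(N_RECORDS - N_DETECTED, N_RECORDS):
    y_pred[i] = 1


def confusion_counts(truth, pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(truth, pred):
        if t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp


tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / N_RECORDS
precision = tp / (tp + fp)   # share of predicted faults that are real
recall = tp / (tp + fn)      # share of real fault minutes that were caught

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"accuracy={accuracy:.5f} precision={precision:.4f} recall={recall:.2f}")
```

With these assumed counts the accuracy is dominated by the normal class, so precision and recall for class 1 are probably the more informative numbers to report.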

I don't know if my whole approach is wrong.

I am using Python.

Topic data-science-model prediction unsupervised-learning machine-learning

Category Data Science
