Labels as features in anomaly detection

Question

Daniele

May 31, 2022 01:04

I have a dataset that was originally built for a classification problem. Because the classes in Y are imbalanced, I chose to switch to an anomaly detection task. Should I use the Y I have as a feature in the anomaly detection model? Is there a risk of overfitting?

Topic multiclass-classification anomaly-detection class-imbalance

Category Data Science

Erwan · Accepted Answer · December 1, 2020 23:09

It simply depends on what the goal of the task is:

If the final goal is still to predict Y after detecting anomalies (i.e. probably using the output of anomaly detection as a feature), then Y cannot be used, since it wouldn't be available in a realistic test set.
If it's a completely different task in which Y is available as an input, then there is no reason not to use it.
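
For the first case, a minimal sketch of what "using the output of anomaly detection as a feature" could look like, assuming scikit-learn. The synthetic data, the choice of IsolationForest and RandomForestClassifier, and the helper add_anomaly_score are all illustrative assumptions, not something taken from the question. The point is only that the detector is fitted on the features alone, so Y never leaks into the inputs of the final classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data (illustrative only).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_classes=3, weights=[0.90, 0.07, 0.03], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def add_anomaly_score(detector, X):
    """Hypothetical helper: append the anomaly score as one extra column."""
    scores = detector.score_samples(X)  # lower score = more anomalous
    return np.column_stack([X, scores])


# The detector is fitted on the training features only; Y is not an input.
detector = IsolationForest(random_state=0).fit(X_train)
X_train_aug = add_anomaly_score(detector, X_train)
X_test_aug = add_anomaly_score(detector, X_test)

# Y is used only as the target of the downstream classifier.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_train_aug, y_train)
pred = clf.predict(X_test_aug)
```

Because the detector only ever sees X, the same two steps can be applied to new data for which Y is unknown.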

With 500k instances, a single additional variable with 3 possible values has an extremely low risk of causing overfitting.
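
To make the size of that addition concrete, here is a rough sketch of the second case, where Y is available as an input, again assuming scikit-learn and NumPy. The data is random and much smaller than 500k rows so that it runs quickly; the sizes, the class proportions and the use of IsolationForest are assumptions for illustration. One-hot encoding a variable with 3 possible values adds only 3 binary columns.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n_rows, n_features, n_classes = 5000, 10, 3  # hypothetical sizes
X = rng.normal(size=(n_rows, n_features))
y = rng.choice(n_classes, size=n_rows, p=[0.90, 0.07, 0.03])

# One-hot encode Y: one binary column per possible value.
y_onehot = np.eye(n_classes)[y]
X_with_y = np.column_stack([X, y_onehot])

# Y is treated as an ordinary input of the anomaly detector.
detector = IsolationForest(random_state=0).fit(X_with_y)
scores = detector.score_samples(X_with_y)  # lower score = more anomalous

added_columns = X_with_y.shape[1] - X.shape[1]
rows_per_added_column = n_rows / added_columns
```

With the real dataset the ratio of rows to added columns would be far larger still, which is why the extra variable is unlikely to matter for overfitting.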

Note that since classification didn't work, it's likely that there is little relationship between the features and Y (otherwise there was some mistake in the classification experiment).
