Labels as features in anomaly detection

I have a dataset that was originally built for a classification problem. Because Y is heavily imbalanced, I chose to treat the problem as an anomaly detection task instead. Should I use the Y I have as a feature in the anomaly detection model? Is there a risk of overfitting?

Topic multiclass-classification anomaly-detection class-imbalance

Category Data Science


It depends on the goal of the task:

  • If the final goal is still to predict Y after detecting anomalies (for example, by using the output of the anomaly detector as a feature), then Y cannot be used as an input, since it would not be available in a realistic test set.
  • If it is a completely different task in which Y is available as an input, then there is no reason not to use it.
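To make the first case concrete, here is a minimal sketch in plain Python. It is not a recommendation of a specific detector: the z-score scoring rule, the toy data, and the names `fit_zscore_detector` and `anomaly_score` are all assumptions made for illustration. The point is only that the detector is fitted on the features alone, so the same pipeline can be applied to a new instance whose Y is unknown.

```python
from statistics import mean, pstdev

def fit_zscore_detector(rows):
    """Learn per-column mean and standard deviation from feature rows (no Y)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against zero variance so the score stays finite.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def anomaly_score(row, detector):
    """Largest absolute z-score across the columns of one row."""
    means, stds = detector
    return max(abs(x - m) / s for x, m, s in zip(row, means, stds))

# Hypothetical toy data: features only. The label Y is kept in a separate
# list and is never passed to the detector.
X_train = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [1.1, 10.5], [1.0, 10.2]]
y_train = [0, 0, 1, 0, 0]

detector = fit_zscore_detector(X_train)

# The anomaly score is computed from the features alone and appended as an
# extra column that a downstream classifier for Y could use.
X_train_augmented = [row + [anomaly_score(row, detector)] for row in X_train]

# A new instance arrives without a label; the same pipeline still works.
x_new = [5.0, 10.0]
x_new_augmented = x_new + [anomaly_score(x_new, detector)]
```

If Y had been one of the detector's inputs, the last two lines could not be computed for an unlabeled instance, which is exactly the problem described in the first bullet.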

With 500k instances, a single additional variable with 3 possible values has an extremely low risk of causing overfitting.
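For the second case, where Y really is available as an input, the following sketch shows how small the addition is. The one-hot encoding, the class values `0, 1, 2`, and the toy rows are assumptions for illustration; only the 500k instances and the 3 possible values come from the discussion above.

```python
def one_hot(label, classes):
    """Encode a categorical label as one indicator column per class."""
    return [1.0 if label == c else 0.0 for c in classes]

# Hypothetical class values and toy rows, for illustration only.
classes = [0, 1, 2]
X = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5]]
y = [0, 2, 1]

# Y is appended to the feature rows as three indicator columns.
X_with_y = [row + one_hot(label, classes) for row, label in zip(X, y)]

# Rough sense of scale: how many instances there are per added column.
n_instances = 500_000
extra_columns = len(classes)
rows_per_extra_column = n_instances / extra_columns
```

With well over a hundred thousand instances per added column, the extra variable gives a model very little room to memorize noise.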

Note that since classification did not work, there is probably little relationship between the features and Y (unless there was a mistake in the classification experiment).
