What's the best way to validate a rare event detection model during training?

When training a deep model for rare event detection (e.g. detecting an alarm sound in a home device's audio stream), is it better to use a balanced validation set (50% alarm, 50% normal) for decisions like early stopping, or a validation set representative of reality? If an unbalanced, realistic validation set is used, it may have to be huge to contain even a few positive examples, so I'm wondering how this is typically dealt with.
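One common compromise, sketched below under assumptions not stated in the question: keep every positive clip you have in the validation set, subsample the negatives to a manageable size, and early-stop on a ranking metric such as PR-AUC. The features here are random noise standing in for real audio embeddings, purely to show the wiring.

```python
# Minimal sketch (assumed setup, not from the question): validation set keeps all
# positives plus a subsample of negatives; early stopping monitors PR-AUC.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 128)).astype("float32")   # stand-in audio embeddings
y_train = (rng.random(5000) < 0.01).astype("float32")      # ~1% positives
X_val = rng.normal(size=(2000, 128)).astype("float32")     # all positives + negative subsample
y_val = (rng.random(2000) < 0.05).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(128,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    # PR-AUC depends on the validation set's class ratio, but that ratio is fixed
    # across epochs, so it still ranks checkpoints consistently for early stopping.
    metrics=[keras.metrics.AUC(curve="PR", name="pr_auc")],
)
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_pr_auc", mode="max", patience=5, restore_best_weights=True
)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=20, batch_size=64, callbacks=[early_stop], verbose=0)
```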

In the given example of alarm sound detection, false negatives are obviously costly, but I imagine false positives matter just as much, because the event is so rare in reality that even a very low false positive rate could still mean low precision. Also, anomaly detection doesn't seem very applicable here because of the open-set nature of the problem: the "normal state" of the audio stream isn't clearly defined (there could be many unforeseen noises and sounds besides alarms).
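To make that base-rate point concrete, a tiny sketch with illustrative numbers (not from the post): even with 95% recall and a 0.1% false positive rate, an alarm prevalence of 1 in 10,000 audio frames pushes precision below 10%.

```python
# Illustrative numbers only: the base-rate effect on precision.
def precision_at_prevalence(tpr, fpr, prevalence):
    """Bayes' rule: P(alarm | model fires) at a given real-world prevalence."""
    return (tpr * prevalence) / (tpr * prevalence + fpr * (1 - prevalence))

print(precision_at_prevalence(tpr=0.95, fpr=0.001, prevalence=1e-4))
# -> ~0.087, i.e. roughly 9 out of 10 detections would be false alarms
```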

If anyone has insight in this area I'd greatly appreciate it!

Topic: audio-recognition, anomaly-detection, class-imbalance, deep-learning

Category: Data Science


That is an empirical question that can be answered with hold-out datasets: create the different validation scenarios, and see under which one the resulting model performs better.
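A hypothetical sketch of that comparison (all data and numbers are invented): log a per-epoch validation metric under each scheme, pick the best checkpoint under each, and let a realistic held-out test set be the final judge.

```python
# Hypothetical comparison: which validation scheme picks the better checkpoint?
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)

# Invented stand-ins: per-epoch model scores on a realistic test set (positives
# get a growing boost to mimic a model improving over epochs).
n_test = 20000
y_test = (rng.random(n_test) < 0.001).astype(int)            # ~0.1% alarms
scores_by_epoch = [rng.random(n_test) + 0.3 * e * y_test
                   for e in range(5)]

# Invented per-epoch validation metrics logged under the two schemes.
ap_balanced_val = [0.61, 0.70, 0.74, 0.73, 0.71]    # 50/50 validation set
ap_realistic_val = [0.08, 0.12, 0.11, 0.15, 0.13]   # prevalence-matched set

best_balanced = int(np.argmax(ap_balanced_val))
best_realistic = int(np.argmax(ap_realistic_val))

# Judge both picks on the same realistic held-out test set.
for name, epoch in [("balanced-val pick", best_balanced),
                    ("realistic-val pick", best_realistic)]:
    ap = average_precision_score(y_test, scores_by_epoch[epoch])
    print(f"{name}: epoch {epoch}, test AP = {ap:.4f}")
```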
