Detecting Data Drift in Audio Data
For a given set of audio files collected from an industrial process via a microphone, I have extracted suitable features and fed them into a neural network to train a binary classifier, as depicted below.
The model has been performing quite well on unseen data. I am now developing a sub-product to monitor data drift, anticipating the inevitable changes in the data (e.g., the microphone position changes, the product material changes and produces a distinct signal, or background noise begins to dominate the recordings). The obvious concern is that classification performance may degrade due to such drift, and misclassifications will consequently increase. These drifts in the data must be identified.
The question is: what is the best way to monitor data drift in acoustic data? I have some ideas:
- Monitor the model's predictions (concept drift). If they change abruptly, trigger an alarm! Simple and effective, isn't it?
- Focus on feature drift. Compare the statistical characteristics or distributions of the extracted features between the training data and a batch of real-time data. This needs more effort, but I think it is doable.
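For the first idea, one minimal sketch (not a full concept-drift detector, just an illustration) is to compare the positive-prediction rate of a live batch against the rate observed on reference data, and alarm when the deviation exceeds a few standard errors. The function name and threshold below are my own illustrative choices:

```python
import numpy as np

def prediction_drift_alarm(reference_preds, live_preds, z_threshold=3.0):
    """Alarm if the live positive-prediction rate deviates from the
    reference rate by more than z_threshold standard errors.
    (Illustrative sketch; names and threshold are assumptions.)"""
    p_ref = np.mean(reference_preds)            # reference positive rate
    n = len(live_preds)
    # Standard error of a proportion of size n under the reference rate
    se = np.sqrt(p_ref * (1.0 - p_ref) / n)
    z = abs(np.mean(live_preds) - p_ref) / se
    return z > z_threshold, z

# Reference batch: 50% positives; live batch: 90% positives -> alarm fires
reference = np.array([0, 1] * 500)
live = np.array([1] * 90 + [0] * 10)
alarm, z_score = prediction_drift_alarm(reference, live)
```

A sudden shift in the prediction rate is cheap to monitor, but note it only catches drift that actually moves the model's output; drift that silently degrades accuracy without changing the class balance will slip through, which is why the feature-level check below is a useful complement.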
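For the second idea, a common starting point is a per-feature two-sample Kolmogorov–Smirnov test between the training feature matrix and a live batch. The sketch below assumes features are stacked as rows-of-samples, columns-of-features; the significance level `alpha` is an illustrative choice, and in practice you would correct for multiple comparisons across features:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_report(train_features, live_features, alpha=0.01):
    """Run a two-sample KS test per feature column and return the
    indices of features whose live distribution differs significantly
    from training. (Sketch; alpha is an illustrative assumption.)"""
    drifted = []
    for j in range(train_features.shape[1]):
        _, p_value = ks_2samp(train_features[:, j], live_features[:, j])
        if p_value < alpha:
            drifted.append(j)
    return drifted

# Example: three features; the live batch shifts feature 1 by two units
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 3))
live = rng.normal(0.0, 1.0, size=(500, 3))
live[:, 1] += 2.0
drifted_features = feature_drift_report(train, live)
```

This kind of check is model-agnostic and flags drift before it shows up in the predictions, at the cost of maintaining a stored reference sample of training features.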
I would appreciate your thoughts on best practices. For example, can ideas used for identifying drift in time series be applied here? If so, could you please share some references to papers or implementations?
Happy Catching All Sorts of Drifts!
Topic data-drift concept-drift audio-recognition time-series machine-learning
Category Data Science