Trim the leading music from the start of an audio file
I have audio files, most of which start with the same music before a conversation begins. I want to trim off the music, whose length varies from file to file. I have no labels. I can transcribe the whole file using off-the-shelf models, but the music itself contains words, which results in false positives. However, I know how to extract features from the audio, such as the Mel spectrogram, pitch, etc. The music at the beginning of the file is easy to spot by looking at the spectrogram or just at the waveform (please see the following images).
I thought about using k-NN with a large number of neighbors and then filtering the audio based on its output. Is there a more obvious way?
Thanks!
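For concreteness, here is a rough sketch of the k-NN idea described above, not a tested solution. It rests on several assumptions that are not part of the question: the first `ref_seconds` of each file are taken to be music, a plain log-magnitude spectrum stands in for the Mel spectrogram, and the cut-off threshold is a crude midpoint heuristic. All function and parameter names (`frame_features`, `knn_distance`, `find_music_end`, `ref_seconds`, `smooth_seconds`) are hypothetical.

```python
import numpy as np


def frame_features(signal, frame_len=1024, hop=512):
    """Log-magnitude spectrum per frame (a simple stand-in for a Mel spectrogram)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def knn_distance(query, reference, k):
    """Mean Euclidean distance from each query frame to its k nearest reference frames."""
    sq = (
        (query ** 2).sum(axis=1)[:, None]
        + (reference ** 2).sum(axis=1)[None, :]
        - 2.0 * query @ reference.T
    )
    dist = np.sqrt(np.maximum(sq, 0.0))
    k = min(k, reference.shape[0])
    # After partitioning, the first k columns hold the k smallest distances.
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def find_music_end(signal, sr, ref_seconds=1.0, k=10, smooth_seconds=0.25,
                   frame_len=1024, hop=512):
    """Return the sample index where the audio stops resembling the opening music."""
    feats = frame_features(signal, frame_len, hop)
    # Assumption: the first ref_seconds of the file are music.
    n_ref = max(k, int(ref_seconds * sr / hop))
    dist = knn_distance(feats, feats[:n_ref], k)
    # Moving average so that a single odd frame does not trigger the cut.
    win = max(1, int(smooth_seconds * sr / hop))
    smooth = np.convolve(dist, np.ones(win) / win, mode="same")
    # Assumption: a midpoint threshold separates "music-like" from "not music-like".
    threshold = 0.5 * (smooth.min() + smooth.max())
    above = np.nonzero(smooth > threshold)[0]
    if above.size == 0:
        return 0  # no change detected, so trim nothing
    return int(above[0] * hop)


# Synthetic usage example: 2 s of two tones ("music"), then 2 s of noise ("speech").
sr = 8000
t = np.arange(2 * sr) / sr
music = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
speech = np.random.default_rng(0).normal(size=2 * sr)
signal = np.concatenate([music, speech])

cut = find_music_end(signal, sr)
trimmed = signal[cut:]
```

On real recordings the music and the speech will be far less cleanly separated than in this synthetic signal, so the reference length, `k`, and the threshold rule would all need tuning.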
Topic k-nn audio-recognition unsupervised-learning
Category Data Science