Trim left tail of music in audio file

I have audio files; most of them start with the same music, after which a conversation begins. I want to trim the music part, whose length varies. I have no labels. I can transcribe the whole file with off-the-shelf models, but the music itself contains words, which results in false positives. I do know how to extract features from the audio, such as a Mel spectrogram, pitch, etc. The music at the beginning of the file is easy to spot by looking at the spectrogram or just at the waveform (please see the following images).
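For reference, here is a minimal NumPy sketch of the kind of per-frame features mentioned above: it slices the waveform into overlapping frames and computes log energy and spectral centroid for each frame. The frame length, hop size, and the choice of these two features are assumptions for illustration; `librosa.feature.melspectrogram` would give a Mel spectrogram directly if librosa is available.

```python
import numpy as np


def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:  # pad very short inputs up to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # row i holds samples [i * hop, i * hop + frame_len)
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def frame_features(x, sr, frame_len=1024, hop=512):
    """Per-frame [log10 energy, spectral centroid in Hz]; frame sizes are assumptions."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum of each frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    log_energy = np.log10((spec ** 2).sum(axis=1) + 1e-10)
    centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-10)
    return np.column_stack([log_energy, centroid])


if __name__ == "__main__":
    sr = 8000
    tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)  # 1 s of a 1 kHz tone
    print(frame_features(tone, sr).shape)  # (14, 2)
```

Each row of the result describes one frame, so the music/conversation boundary becomes a question of where the rows change character.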

I thought about using k-NN with a large number of neighbors and then filtering the audio based on its output. Is there a more obvious way?

Thanks!
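k-NN needs labelled neighbours, which are not available here, so a label-free stand-in for the same idea is to cluster the frames into two groups and cut at the first sustained change of cluster. Below is a minimal NumPy sketch of two-cluster k-means under these assumptions: the file starts with music, music and speech frames separate into two clusters, and `two_means`, `music_end_frame`, and `min_run` are hypothetical names and parameters chosen for illustration. It accepts any `(n_frames, n_features)` feature matrix.

```python
import numpy as np


def two_means(features, n_iter=100):
    """Two-cluster k-means (Lloyd's algorithm); returns one label (0 or 1) per frame.

    Centers are initialised deterministically with frame 0 and the frame
    farthest from it, so repeated runs give the same result.
    """
    f = np.asarray(features, dtype=float)
    if f.ndim == 1:
        f = f[:, None]
    z = (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-10)  # standardise each feature
    far = int(np.argmax(((z - z[0]) ** 2).sum(axis=1)))
    centers = np.stack([z[0], z[far]])
    labels = np.zeros(len(z), dtype=int)
    for _ in range(n_iter):
        # squared distance from every frame to each of the two centers
        dists = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def music_end_frame(labels, min_run=10):
    """First frame of the first run of >= min_run frames whose label differs from frame 0.

    Returns None if no such run exists (the hypothetical min_run guards against short blips).
    """
    music = labels[0]
    run = 0
    for i, lab in enumerate(labels):
        if lab != music:
            run += 1
            if run >= min_run:
                return i - min_run + 1
        else:
            run = 0
    return None
```

Multiplying the returned frame index by the hop size and dividing by the sample rate gives the cut point in seconds. Requiring a run of `min_run` frames keeps a brief speech-like passage inside the music from triggering an early cut.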



Eventually, since the data consists only of phone calls, I noticed that a beep separates the opening music from the conversation. So I convolved (cross-correlated) that beep over each file to locate it, and this gave better results than k-means and GMMs.
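The beep-detection idea can be sketched as normalized cross-correlation (template matching) of a recorded beep against each file. This is a minimal NumPy sketch, not the poster's actual code; it assumes a clean recording of the beep is available as `template`, that the beep occurs once, and that a peak correlation of 0.5 (a hypothetical threshold) separates a real match from noise.

```python
import numpy as np


def normalized_xcorr(signal, template):
    """Normalized cross-correlation of `template` against every window of `signal`.

    Returns len(signal) - len(template) + 1 values in about [-1, 1];
    1.0 means the window is a scaled, offset copy of the template.
    """
    x = np.asarray(signal, dtype=float)
    t = np.asarray(template, dtype=float)
    m = len(t)
    t0 = t - t.mean()  # zero-mean template
    # sum(window * t0) per window; equals sum((window - mean) * t0) because t0 sums to 0
    corr = np.correlate(x, t0, mode="valid")
    # per-window mean and variance from cumulative sums
    cs = np.cumsum(np.concatenate(([0.0], x)))
    cs2 = np.cumsum(np.concatenate(([0.0], x * x)))
    win_mean = (cs[m:] - cs[:-m]) / m
    win_var = np.maximum((cs2[m:] - cs2[:-m]) / m - win_mean ** 2, 0.0)
    return corr / (m * np.sqrt(win_var) * t0.std() + 1e-12)


def trim_after_beep(signal, template, threshold=0.5):
    """Drop everything up to the end of the best beep match.

    Returns (trimmed_signal, beep_start); if no match reaches `threshold`,
    the signal is returned unchanged with beep_start = None.
    """
    x = np.asarray(signal)
    ncc = normalized_xcorr(x, template)
    start = int(np.argmax(ncc))
    if ncc[start] < threshold:
        return x, None
    return x[start + len(template):], start


if __name__ == "__main__":
    sr = 8000
    rng = np.random.default_rng(0)
    beep = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(int(0.2 * sr)) / sr)  # synthetic 0.2 s beep
    audio = np.concatenate([0.3 * rng.standard_normal(sr), beep, 0.3 * rng.standard_normal(sr // 2)])
    trimmed, beep_start = trim_after_beep(audio, beep)
    print(beep_start, len(trimmed))  # 8000 4000
```

`np.correlate` costs O(n·m), so for long recordings it is worth searching only the opening part of each file or switching to an FFT-based correlation such as `scipy.signal.correlate(..., method="fft")`. Dividing by each window's standard deviation keeps a loud music passage from outscoring a quieter beep.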
