GMM in speech recognition using HMM-GMM
I am trying to understand automatic speech recognition (ASR) using HMM-GMM.
At an abstract level I understand what is happening, but I do not
understand how the GMM fits into it.
My data consists of 5K hours of speech from a single user. I took the above picture from this article.
I know what a GMM is, but I am unable to wrap my head around its role here. Can somebody explain with a simple example?
Topic markov-hidden-model speech-to-text gaussian nlp
Category Data Science