GMM in speech recognition using HMM-GMM

Question


Naveen Gabriel

March 13, 2022 15:00

I am trying to solve/understand ASR (automatic speech recognition) using HMM-GMM.

At an abstract level I understand what is happening, but I don't understand how the GMM fits into it.

My data has 5K hours of speech from a single user. I took the above picture from this article.

I know what a GMM is, but I am unable to wrap my head around how it is used here. Can somebody explain with a simple example?

Topic markov-hidden-model speech-to-text gaussian nlp

Category Data Science

Naveen Gabriel · Accepted Answer · February 14, 2020 09:51

The previous answer was wrong, so I removed it.

Here is my second attempt, after reading Speech and Language Processing by Daniel Jurafsky and James H. Martin (a good book to read).

The 39 features associated with each observation (one acoustic feature vector per frame) are assumed to have been generated by a mixture of multivariate Gaussians.
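Written out, the emission probability of HMM state j for the feature vector o_t at frame t is a weighted sum of M Gaussians. The symbols (b_j, c_jm, M, D) are standard textbook notation assumed here rather than quoted from the answer, and the diagonal covariance in the second line is a common simplification in HMM-GMM systems, not a requirement:

```latex
b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
\qquad c_{jm} \ge 0,\quad \sum_{m=1}^{M} c_{jm} = 1

\mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right)
= \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi\sigma_{jmd}^{2}}}
\exp\!\left(-\frac{(o_{td}-\mu_{jmd})^{2}}{2\sigma_{jmd}^{2}}\right),
\qquad D = 39
```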

Why a mixture of multivariate Gaussians? Assuming a single multivariate Gaussian for each state (phone) is a strong assumption that might not hold, for example when the same sound is produced in noticeably different ways.
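A minimal sketch of why one Gaussian per state can fit poorly: on made-up 1-D data with two clusters, a single maximum-likelihood Gaussian assigns a much lower total log-likelihood than a two-component mixture. The mixture weights, means and variances are hand-picked for illustration (not trained with EM), and the functions work for any dimension, including 39-dimensional feature vectors with diagonal covariance:

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, weights, means, variances):
    """Log density of a GMM, log sum_m c_m N(x; mu_m, Sigma_m), via log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Made-up 1-D observations: two clusters, e.g. one sound produced two ways.
data = [[-2.1], [-1.9], [-2.0], [2.0], [1.8], [2.2]]

# Single Gaussian fitted by maximum likelihood (sample mean and variance).
n = len(data)
mean = sum(x[0] for x in data) / n
var = sum((x[0] - mean) ** 2 for x in data) / n
single_ll = sum(log_gaussian_diag(x, [mean], [var]) for x in data)

# Two-component mixture with hand-picked parameters (hypothetical, not EM-trained).
mixture_ll = sum(
    log_gmm(x, [0.5, 0.5], [[-2.0], [2.0]], [[0.01], [0.03]]) for x in data
)

print(f"single Gaussian: {single_ll:.2f}, 2-component GMM: {mixture_ll:.2f}")
```

The single Gaussian puts its mean at 0, exactly where no data lies, and needs a large variance to cover both clusters; the mixture places one narrow component on each cluster.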

How does the HMM come into the picture with the GMM in ASR? Consider a univariate case, where a single cepstral feature (in practice there are usually 39) is modeled by a single Gaussian: each HMM state has a mean and a variance, and these determine how likely that state is to generate a particular observation. Working out which state produced each observation (the most likely state sequence) is part of the decoding problem.
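The univariate case above can be sketched with the Viterbi algorithm, which solves the decoding problem by finding the most likely state sequence. Assumptions: two hypothetical HMM states (think of them as two parts of one phone), a left-to-right transition matrix, and one Gaussian per state with made-up means and variances. In a real HMM-GMM system, `log_normal` would be replaced by the log of a 39-dimensional GMM emission density:

```python
import math

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def safe_log(p):
    """log(p), with log(0) mapped to -inf (a forbidden transition)."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, init, trans, means, variances):
    """Most likely state sequence for obs, with one Gaussian emission per state."""
    n = len(means)
    log_trans = [[safe_log(p) for p in row] for row in trans]
    # delta[j]: log-prob of the best state path ending in state j at the current frame.
    delta = [safe_log(init[j]) + log_normal(obs[0], means[j], variances[j])
             for j in range(n)]
    backpointers = []
    for x in obs[1:]:
        prev = delta
        # Best predecessor state for each current state j.
        ptr = [max(range(n), key=lambda i: prev[i] + log_trans[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] + log_trans[ptr[j]][j] + log_normal(x, means[j], variances[j])
                 for j in range(n)]
        backpointers.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)

# Hypothetical toy model: state 0 emits values near 0, state 1 values near 5.
obs = [0.2, -0.3, 0.1, 4.8, 5.3, 5.1]
path, log_prob = viterbi(
    obs,
    init=[0.5, 0.5],
    trans=[[0.7, 0.3], [0.0, 1.0]],  # left-to-right: no return from state 1 to 0
    means=[0.0, 5.0],
    variances=[1.0, 1.0],
)
print(path)  # → [0, 0, 0, 1, 1, 1]
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over long utterances.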

Let me know if this is right.
