How to set the parameters of a Hidden Markov Model that will be used to correct the mistakes of a previous classifier?
Say we've previously trained a neural network or some other classifier C on $N$ training samples $I := \{I_1, \dots, I_N\}$ belonging to $K$ classes; the data has a sequential/contextual structure, but C ignores it. Assume that, for some reason (perhaps a training problem, or the way the classes were declared), C is confused and doesn't perform well. C assigns a class to each test sample $I$ by $class(I) := \arg\max_{1 \leq j \leq K} p_j(I)$, where $p_j(I)$ is C's probability estimate that $I$ belongs to the $j$-th class.
Now, on top of this classifier C, I'd like to use a Hidden Markov Model (HMM) to "correct" the mistakes made by the context-free classifier C, by taking into account the contextual/sequential information that C ignores.
Hence, in my HMM, let the hidden state $Z_i$ denote the true class of the $i$-th sample $I_i$, and let $X_i$ be the class predicted by C. My question is: how can we use the probabilistic information from C, i.e. the vector $(p_1(I), \dots, p_K(I))$, to train this HMM? I understand that the confusion matrix of C can be used to define the emission probabilities of the HMM, but how do we define the transition and start/prior probabilities? I'm tempted to define the start/prior probability vector as $\pi := (p_1(I_1), \dots, p_K(I_1))$, but I may be wrong. This is my main question.
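For concreteness, here is a minimal numpy sketch of the setup I have in mind: emissions from C's row-normalized confusion matrix, the prior from C's probability vector on the first sample, and, since the transitions are exactly what's open here, one possible assumption — estimating them from label bigrams in the training sequence. A Viterbi decode then "corrects" C's hard predictions. All numbers below are made up for illustration.

```python
import numpy as np

def row_normalize(M):
    """Normalize each row of M to sum to 1."""
    return M / M.sum(axis=1, keepdims=True)

K = 3  # hypothetical number of classes
rng = np.random.default_rng(0)

# Confusion matrix of C on held-out data: counts[true, predicted] (made up).
confusion = np.array([[80, 15,  5],
                      [10, 70, 20],
                      [ 5, 25, 70]], dtype=float)

# Emission matrix B: P(X_i = x | Z_i = z) = row-normalized confusion matrix.
B = row_normalize(confusion)

# Transition matrix A: one assumption (not settled in the question) is to
# count bigrams of consecutive true labels in the training sequence.
true_labels = rng.integers(0, K, size=200)  # stand-in for the real labels
A_counts = np.ones((K, K))                  # Laplace smoothing
for a, b in zip(true_labels[:-1], true_labels[1:]):
    A_counts[a, b] += 1
A = row_normalize(A_counts)

# Prior pi: C's probability vector on the first sample, as proposed above.
p_first = np.array([0.2, 0.5, 0.3])  # hypothetical (p_1(I_1), ..., p_K(I_1))
pi = p_first / p_first.sum()

def viterbi(obs, pi, A, B):
    """Most likely hidden-state (true-class) sequence, in the log domain."""
    T, K = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)  # scores[i, j]: prev i -> cur j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# "Correct" a sequence of C's hard predictions:
predicted = [0, 1, 1, 2, 2]
corrected = viterbi(predicted, pi, A, B)
print(corrected)
```

The corrected sequence can differ from C's predictions wherever the transition structure makes a predicted label contextually implausible.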
A follow-up question: one can define an HMM in the above way (using the confusion matrix and the probability estimates from C); call the resulting parameter set $\Theta_0$. But after doing so, is it advisable to re-estimate the parameters (e.g. via Baum–Welch) to best fit the data $I$ used for C, taking $\Theta_0$ as the initialization?
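To make the follow-up concrete, here is a minimal numpy sketch of one Baum–Welch (EM) iteration for a discrete-observation HMM, which could be started from $\Theta_0$ instead of the random initialization used below. The observation sequence and initialization here are placeholders, not real data.

```python
import numpy as np

def row_normalize(M):
    return M / M.sum(axis=1, keepdims=True)

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns posteriors and log-likelihood."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # P(Z_t = k | obs)
    xi = np.zeros((T - 1, K, K))                # P(Z_t = i, Z_{t+1} = j | obs)
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] = x / x.sum()
    return gamma, xi, np.log(scale).sum()

def baum_welch_step(obs, pi, A, B):
    """One EM update of (pi, A, B) from a single observation sequence."""
    gamma, xi, ll = forward_backward(obs, pi, A, B)
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):
        B_new[:, m] = gamma[np.asarray(obs) == m].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, ll

# Placeholder data and initialization (Theta_0 would go here instead):
rng = np.random.default_rng(42)
K = 3
obs = rng.integers(0, K, size=100)
pi = np.full(K, 1.0 / K)
A = row_normalize(rng.random((K, K)) + 1)
B = row_normalize(rng.random((K, K)) + 1)

lls = []
for _ in range(5):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    lls.append(ll)
print(lls)  # EM guarantees this is non-decreasing
```

Since EM only finds a local optimum, a structured $\Theta_0$ built from the confusion matrix should be a better starting point than a random one, which is precisely what the follow-up question is asking about.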
Topic markov-hidden-model markov-process neural-network machine-learning
Category Data Science