Does feature normalization improve performance of Hidden Markov Models?
When training a Hidden Markov Model (HMM) on a multivariate, continuous-valued time series, is it preferable to scale the data in some way? Candidate pre-processing steps include (see the sketch after this list):
- Normalize to 0-mean and unit-variance
- Scale to [-1, 1] interval
- Scale to [0, 1] interval
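For concreteness, here is a minimal sketch of the three options, assuming scikit-learn's scalers (an assumption on my part; any equivalent per-feature transform would do):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy multivariate series of shape (T, d): T time steps, d features.
X = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(100, 4))

X_std = StandardScaler().fit_transform(X)                     # zero mean, unit variance
X_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # scaled to [-1, 1]
X_unit = MinMaxScaler().fit_transform(X)                      # scaled to [0, 1] (default range)
```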
With neural networks, the rationale for scaling is to obtain an "un-squished" error surface that is easier to navigate.
HMMs learn their parameters with the Baum-Welch algorithm, an instance of the Expectation-Maximization (EM) algorithm.
Is EM sensitive to the scale of the features? Is there some motivation for normalization when training HMMs?
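If it helps, this is the kind of experiment I have in mind, a sketch assuming the hmmlearn package: fit the same Gaussian HMM to raw and standardized copies of a toy series and compare how Baum-Welch converges.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy 2-state series whose features live on very different scales
# (states drawn iid here for simplicity, not a true Markov chain).
states = rng.integers(0, 2, size=500)
X = np.column_stack([
    states * 1000.0 + rng.normal(0, 200.0, 500),  # feature on the order of 1000
    states * 0.01 + rng.normal(0, 0.002, 500),    # feature on the order of 0.01
])

for name, data in [("raw", X), ("standardized", StandardScaler().fit_transform(X))]:
    model = GaussianHMM(n_components=2, covariance_type="diag",
                        n_iter=200, random_state=0)
    model.fit(data)
    # monitor_ records the EM (Baum-Welch) convergence history.
    print(name, "converged:", model.monitor_.converged,
          "iterations:", model.monitor_.iter)
```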