Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Probability Theory and Stochastic Processes
Steven R. Dunbar
Notation and Problems of Hidden Markov Models
Mathematically Mature: may contain mathematics beyond calculus with proofs.
What is meant by an ergodic Markov chain?
Alternative names for Hidden Markov Models, especially in communications engineering and signal processing are Markov sources or probabilistic functions of Markov chains, emphasizing the emission of observation signals from the states.
Example. The “urn-and-ball” model is a standard example of the previous elements. The model has urns, each ﬁlled with a large number of colored balls. Each ball is one of possible colors. The urn-and-ball model generates an observation sequence by choosing one of the urns, according to an initial probability distribution, selecting a ball from the initial urn, recording the color, replacing the ball, and then choosing a new urn according to transition probability distribution associated with the current urn.
A Hidden Markov Model is a double stochastic process with an underlying stochastic process that is not observable but can only be glimpsed through another stochastic process that produces the set of observations. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an Hidden Markov Model gives some information about the sequence of states.
The usual notation for a discrete-time discrete-observation Hidden Markov Model is
Remark. The choice of indexing beginning at is arbitrary. It is chosen here to conform with Stamp . The choice of indexing beginning at is consistent with the initial value of dynamical systems at time . It is also convenient for programming languages such as python, perl and C that begin indexing at .
Using the Model, the process for generating an observation sequence is:
The independence assumptions in this notation are
The matrix is an row stochastic matrix with probabilities independent of , so we are assuming a stationary Markov process. The interpretation of is
The standard notation in Hidden Markov Models for the matrix is
As with , the matrix is row stochastic and the probabilities are independent of . The triple deﬁnes an Hidden Markov Model.
Although is not a Markov chain (explain why!) conditional on the current state , the sequence of future signals and states is independent of the sequence .
As an example, for an example state sequence of length
with corresponding observations
let be the probability of starting in state , is the probability of initially observing and is the probability of transitioning from state to state . Continuing and multiplying the conditional probabilities, the probability of this state sequence with these observations is
Problem 1 is the evaluation problem: given a model and a sequence of observations, how can we compute the probability that the observed the model produces the sequence. We can also view the problem as: how do we “score” or evaluate the model. If we think of the case in which we have several competing models, the solutions of problem 1 allows us to choose the model that best matches the observations.
Problem 2 is the estimation problem or discovery problem in which we attempt to uncover the hidden state sequence. We usually use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, and so the choice of criterion is a strong inﬂuence on the revealed state sequence. See the next section for speciﬁc examples of the two optimality criteria.
The solution of Problem 3 attempts to optimize the model parameters so as best to describe how the observed sequence comes about. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model. This training problem is the crucial one for most applications of hidden Markov models since it creates best models for real phenomena.
Rabiner  credits Jack Ferguson of the Institute for Defense Analysis with introducing the three fundamental problems for Hidden Markov Models.
Remark. The notation is standard and traditional in the theory and application of Hidden Markov Models. It is similar to notation occurring in statistics with parametric tests, indicating that that the model depends on the values of the parameters. The notation is not strictly related to conditional probabilities in the mathematical probability sense. A mathematician’s typical notation would indicate the parameter dependence with subscripts, so we would write instead. However, the notation is so strongly embedded in the literature of Hidden Markov Models that I will continue to use it, except occasionally in proofs where mathematical conditional probabilities do occur and the notation would mix meanings.
Consider the variable factory example from the previous section. For brevity, represent the factory good state with and the bad state with . Suppose that for some particular three periods, the observation of factory outputs is . We can exhaustively compute the probabilities of this observation sequence from various state sequences, see Table 1. We want to determine the most likely state sequence of the Markov process over these three periods. We could reasonably deﬁne “most likely” as the state sequence with the highest probability from among all possible state sequences of length three. Later we will use Dynamic Programming to eﬃciently ﬁnd this particular sequence. On the other hand, we might reasonably deﬁne “most likely” as the state sequence that maximizes the expected number of correct states. In the next section, we will use the Viterbi Algorithm associated with Hidden Markov Models to ﬁnd this sequence. The Dynamic Programming and Viterbi Algorithm solutions are not necessarily the same.
As a single example, compute the probability of starting in state , expressing observation , transitioning from state to state , expressing observation , transitioning to state , and ﬁnally expressing observation .
Similarly, we can directly compute the probability of each of the remaining possible state sequences of length . The results are in Table 1 with the probabilities in the last column normalized by the total probability of observing so they sum to . To ﬁnd the optimal state sequence in the Dynamic Programming sense, choose the sequence with the highest probability, namely .
To ﬁnd the optimal sequence in the Hidden Markov Model sense, we choose the most probable symbol at each position. To this end, we sum the probabilities in Table 1 that have a in the ﬁrst position. Doing so, we ﬁnd the normalized probability of in the ﬁrst position is and hence the probability of a in the ﬁrst position is the complementary probability . The Hidden Markov Model approach therefore chooses the ﬁrst element of the sequence to be . Repeat this calculation for each time step of the sequence, obtaining the probabilities in Table 2. From Table 2, the optimal sequence in the Hidden Markov Model sense is . The optimal Dynamic Programming solution diﬀers from the optimal Hidden Markov Model sequence.
This section is adapted from: “A Revealing Introduction to Hidden Markov Models” by Mark Stamp, December 11, 2015; and “An Introduction to Hidden Markov Models”, by L. R. Rabiner and B. H. Juang. The calculations for the Variable Factory are extended from Sheldon M. Ross, Introduction to Probability Models, Section 4.11, pages 256-262, Academic Press, 2006, 9th Edition.
 Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(3):257–286, February 1989. hidden Markov model, speech recognition.
I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable eﬀort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.
I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reﬂects the thoughts, interests and opinions of its author. They do not explicitly represent oﬃcial positions or policies of my employer.
Information on this website is subject to change without notice.
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modiﬁed: Processed from LATEX source on April 14, 2017