Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Notation and Problems of Hidden Markov Models

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

Rating

Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

Section Starter Question

Section Starter Question

What is meant by an ergodic Markov chain?

_______________________________________________________________________________________________

Key Concepts

Key Concepts

  1. For Hidden Markov Models, the first problem is the evaluation problem: given a model and a sequence of observations, how can we compute the probability that the model produced the observed sequence? We can also view this problem as: how do we “score” or evaluate the model?
  2. Given the model λ = (A,B,π) and a sequence of observations 𝒪, find an optimal state sequence after defining “optimal” in some way for the model. In other words, we want to uncover the hidden part of the Hidden Markov Model.
  3. The solution of Problem 3 attempts to optimize the model parameters to best describe how the observed sequence comes about. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model. This training problem is the crucial one for most applications of hidden Markov models since it creates best models for real phenomena.
  4. We usually use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, and the choice of criterion determines the algorithm and the consequent revealed state sequence.

__________________________________________________________________________

Vocabulary

Vocabulary

  1. The evaluation problem: given a model and a sequence of observations, how can we compute the probability that the observed the model produces the sequence?
  2. The estimation problem or discovery problem is the attempt to uncover the hidden state sequence.
  3. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model.

__________________________________________________________________________

Mathematical Ideas

Mathematical Ideas

Elements of the Hidden Markov Model

  1. The model has a finite number, say N, of states. Each state emits one of several possible signals having measurable, distinctive properties. In many practical situations, the states have a physical significance.
  2. At each discrete clock time, t, the model enters a new state, or possibly remains in the current state. The entered state depends on a transition probability distribution based on the previous state through the Markov chain property, that is, independently of earlier Markov chain states and signals. Often the Markov chain is ergodic so that all states can connect to any other state. In some applications, other interconnections of states, e.g. increasing or left-to-right, are of interest.
  3. After each transition, the model emits one of M observation output symbols or signals according to a probability distribution depending on the current state. Each state has a fixed probability distribution for its signals regardless of how and when the chain enters the state. There are thus N such observation probability distributions. Sometimes we say M is the alphabet size. The observation symbols correspond to the physical output of the modeled system.

Alternative names for Hidden Markov Models, especially in communications engineering and signal processing are Markov sources or probabilistic functions of Markov chains, emphasizing the emission of observation signals from the states.

Example. The “urn-and-ball” model is a standard example of the previous elements. The model has N urns, each filled with a large number of colored balls. Each ball is one of M possible colors. The urn-and-ball model generates an observation sequence by choosing one of the N urns, according to an initial probability distribution, selecting a ball from the initial urn, recording the color, replacing the ball, and then choosing a new urn according to transition probability distribution associated with the current urn.

A Hidden Markov Model is a double stochastic process with an underlying stochastic process that is not observable but can only be glimpsed through another stochastic process that produces the set of observations. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an Hidden Markov Model gives some information about the sequence of states.

The usual notation for a discrete-time discrete-observation Hidden Markov Model is

T =  length of observation sequence, N =  number of states in the model, M =  number of observation symbols, Q = q0,q1,,qN1 =  distinct states of the Markov process, X = x0,x1,,xT1 =  sequence of states visited by the Markov chain, V = 0, 1, 2,,M 1 =  set of possible observations, A = (aij), where aij = qj|qi  is the state transition probability matrix, B = (bij) = (bi(j)), where bij = j|qi  is the observation transition probability matrix, π = πj =  initial state distribution at time 0, 𝒪 = (𝒪0,𝒪1,𝒪T1) =  the observation sequence.

Remark. The choice of indexing beginning at 0 is arbitrary. It is chosen here to conform with Stamp [4]. The choice of indexing beginning at 0 is consistent with the initial value of dynamical systems at time 0. It is also convenient for programming languages such as python, perl and C that begin indexing at 0.

Using the Model, the process for generating an observation sequence 𝒪 = (𝒪0,𝒪1,𝒪T1) is:

  1. Set t = 0.
  2. Choose an initial state xt q0,qN1 according to the initial state distribution π.
  3. Choose 𝒪j according to bi(j), the symbol probability distribution in state i = xt.
  4. Set t = t + 1.
  5. Choose a new state xt q0,qN1 according to the probability transition matrix A.
  6. Return to step 3 if t < T; otherwise terminate the process.

Schematic diagram of a Hidden Markov Model.

Figure 1: Schematic diagram of a Hidden Markov Model

The independence assumptions in this notation are

𝒪n = j|xt = qi = bi(j) 𝒪n = j|x0,𝒪0,,xn1,𝒪n1,xn = qi = bi(j)

The matrix A is an n × n row stochastic matrix with probabilities aij independent of t, so we are assuming a stationary Markov process. The interpretation of A is

aij =  state qj at t + 1 |state qi at t .

The standard notation in Hidden Markov Models for the N × M matrix B is

bi(j) =  observation j at t |state qi at t .

As with A, the matrix B is row stochastic and the probabilities are independent of t. The triple λ = (A,B,π) defines an Hidden Markov Model.

Although 𝒪 = (𝒪0,𝒪1,𝒪2,,𝒪n1) is not a Markov chain (explain why!) conditional on the current state 𝒪n, the sequence 𝒪n,xn+1,𝒪n+1, of future signals and states is independent of the sequence xo,𝒪0,,xn1,𝒪n1.

As an example, for an example state sequence of length 4

X = (x0,x1,x2,x3)

with corresponding observations

𝒪 = (𝒪0,𝒪1,𝒪2,𝒪3)

let π0 be the probability of starting in state x0, b0(𝒪0) is the probability of initially observing 𝒪0 and a01 is the probability of transitioning from state x0 to state x1. Continuing and multiplying the conditional probabilities, the probability of this state sequence with these observations is

X,𝒪 = π0bx0(𝒪0)ax0,x1bx1(𝒪1)ax1,x2bx2(𝒪2)ax2,x3bx3(𝒪3).

The three problems

  1. Given the model λ = (A,B,π) and a sequence of observations 𝒪, find 𝒪|λ. That is, we want to determine the likelihood of the observed sequence 𝒪, given the model.

    Problem 1 is the evaluation problem: given a model and a sequence of observations, how can we compute the probability that the observed the model produces the sequence. We can also view the problem as: how do we “score” or evaluate the model. If we think of the case in which we have several competing models, the solutions of problem 1 allows us to choose the model that best matches the observations.

  2. Given the model λ = (A,B,π) and a sequence of observations 𝒪, find an optimal state sequence after defining “optimal” in some way for the model. In other words, we want to uncover the hidden part of the Hidden Markov Model.

    Problem 2 is the estimation problem or discovery problem in which we attempt to uncover the hidden state sequence. We usually use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, and so the choice of criterion is a strong influence on the revealed state sequence. See the next section for specific examples of the two optimality criteria.

  3. Given an observation sequence 𝒪 and the dimensions N and M, find the model λ = (A,B,π) that maximizes the probability of 𝒪. This can interpreted as training a model to best fit the observed data. We can also view this as search in the parameter space represented by A, B and π.

    The solution of Problem 3 attempts to optimize the model parameters so as best to describe how the observed sequence comes about. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model. This training problem is the crucial one for most applications of hidden Markov models since it creates best models for real phenomena.

Rabiner [2] credits Jack Ferguson of the Institute for Defense Analysis with introducing the three fundamental problems for Hidden Markov Models.

Remark. The notation 𝒪|λ is standard and traditional in the theory and application of Hidden Markov Models. It is similar to notation occurring in statistics with parametric tests, indicating that that the model depends on the values of the parameters. The notation 𝒪|λ is not strictly related to conditional probabilities in the mathematical probability sense. A mathematician’s typical notation would indicate the parameter dependence with subscripts, so we would write λ 𝒪 instead. However, the notation 𝒪|λ is so strongly embedded in the literature of Hidden Markov Models that I will continue to use it, except occasionally in proofs where mathematical conditional probabilities do occur and the notation would mix meanings.

Example Computation

Consider the variable factory example from the previous section. For brevity, represent the factory good state with q0 = 0 and the bad state with q1 = 1. Suppose that for some particular three periods, the observation of factory outputs is a,u,a. We can exhaustively compute the probabilities of this observation sequence from various state sequences, see Table 1. We want to determine the most likely state sequence of the Markov process over these three periods. We could reasonably define “most likely” as the state sequence with the highest probability from among all possible state sequences of length three. Later we will use Dynamic Programming to efficiently find this particular sequence. On the other hand, we might reasonably define “most likely” as the state sequence that maximizes the expected number of correct states. In the next section, we will use the Viterbi Algorithm associated with Hidden Markov Models to find this sequence. The Dynamic Programming and Viterbi Algorithm solutions are not necessarily the same.

As a single example, compute the probability of starting in state 0, expressing observation a, transitioning from state 0 to state 0, expressing observation u, transitioning to state 1, and finally expressing observation a.

001 = (0.8)(0.99)(0.9)(0.01)(0.1)(0.96) = 0.000684288.

Similarly, we can directly compute the probability of each of the remaining 7 possible state sequences of length 3. The results are in Table 1 with the probabilities in the last column normalized by the total probability of observing (a,u,a) so they sum to 1. To find the optimal state sequence in the Dynamic Programming sense, choose the sequence with the highest probability, namely 111.

To find the optimal sequence in the Hidden Markov Model sense, we choose the most probable symbol at each position. To this end, we sum the probabilities in Table 1 that have a 0 in the first position. Doing so, we find the normalized probability of 0 in the first position is 0.5774 and hence the probability of a 1 in the first position is the complementary probability 0.4226. The Hidden Markov Model approach therefore chooses the first element of the sequence to be 0. Repeat this calculation for each time step of the sequence, obtaining the probabilities in Table 2. From Table 2, the optimal sequence in the Hidden Markov Model sense is 0, 1, 1. The optimal Dynamic Programming solution differs from the optimal Hidden Markov Model sequence.


000(0.8)(0.99)(0.9)(0.01)(0.9)(0.99)0.0063510.3640
001 (0.8)(0.99)(0.9)(0.01)(0.1)(0.96)0.00068430.03921
010 (0.8)(0.99)(0.1)(0.04)(0.0)(0.99)00
011 (0.8)(0.99)(0.1)(0.04)(1.0)(0.96)0.00304130.1743
100 (0.2)(0.96)(0.0)(0.01)(0.9)(0.99)00
101 (0.2)(0.96)(0.0)(0.01)(0.1)(0.96)00
110 (0.2)(0.96)(1.0)(0.04)(0.0)(0.99)00
111 (0.2)(0.96)(1.0)(0.04)(1.0)(0.96)0.0073730.4225
sum 0.017451

Table 1: State sequence probabilities for the Variable Factory with output (a,u,a).


Period/Obs 0 (a)1 (u)2 (a)
0 0.5774 0.4031 0.3640
1 0.4226 0.5968 0.6360

Table 2: Hidden Markov Model probabilities.

Sources

This section is adapted from: “A Revealing Introduction to Hidden Markov Models” by Mark Stamp, December 11, 2015; and “An Introduction to Hidden Markov Models”, by L. R. Rabiner and B. H. Juang. The calculations for the Variable Factory are extended from Sheldon M. Ross, Introduction to Probability Models, Section 4.11, pages 256-262, Academic Press, 2006, 9th Edition.

_______________________________________________________________________________________________

Algorithms, Scripts, Simulations

Algorithms, Scripts, Simulations

Algorithm

Scripts

__________________________________________________________________________

Problems to Work

Problems to Work for Understanding

  1. Verify all calculations in Table 1
  2. Verify all calculations in Table 2
  3. Create exhaustive state similar tables for the example of average annual temperatures with observations of tree-ring growth with observation sequence S,M,S,L.

__________________________________________________________________________

Books

Reading Suggestion:

References

[1]   L. R. Rabiner and B. H. Juang. An Introduction to Hidden Markov Models. IEEE ASSP Magazine, pages 4–16, January 1986. hidden markov models.

[2]   Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(3):257–286, February 1989. hidden Markov model, speech recognition.

[3]   Sheldon M. Ross. Introduction to Probability Models. Academic Press, 9th edition, 2006.

[4]   Mark Stamp. A revealing introduction to hidden markov models. https://www.cs.sjsu.edu/ stamp/RUA/HMM.pdf, December 2015.

__________________________________________________________________________

Links

Outside Readings and Links:

__________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LATEX source on April 14, 2017