Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Algorithms for Hidden Markov Models

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

Section Starter Question

__________________________________________________________________________

Key Concepts

  1. The efficient algorithm called the forward algorithm finds P(𝒪 | λ). That is, the forward algorithm finds the likelihood of the observed sequence 𝒪, given the model.
  2. The backward algorithm is analogous to the forward algorithm in the solution to the first problem of Hidden Markov Models, except that it starts at the end and works back toward the beginning.
  3. The Viterbi algorithm gives the individually most likely state q_i at time t.
  4. The training problem is crucial for most applications of Hidden Markov Models since it creates the best models for real phenomena. The Baum-Welch algorithm solves the training problem.
  5. The Dynamic Programming algorithm solves the problem of finding the sequence of states whose conditional probability, given the sequence of signals, is maximal. The Dynamic Programming algorithm is the forward algorithm with “sum” replaced by “max”.

__________________________________________________________________________

Vocabulary

  1. The forward algorithm is the efficient algorithm to find P(𝒪 | λ), that is, to find the likelihood of the observed sequence 𝒪, given the model.
  2. The backward algorithm is analogous to the forward algorithm in the solution to the first problem of Hidden Markov Models, except that it starts at the end and works back toward the beginning.
  3. The Viterbi algorithm gives the individually most likely state q_i at time t.
  4. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model.
  5. The Dynamic Programming algorithm solves the problem of finding the sequence of states whose conditional probability given the sequence of signals is maximal.

__________________________________________________________________________

Mathematical Ideas

Solution to Problem 1

Recall that Problem 1 is the evaluation problem: given a model and a sequence of observations, how can we compute the probability that the model produces the sequence? We can also view the problem as: how do we “score” or evaluate the model? If we have several competing models, the solution of Problem 1 allows us to choose the model that best matches the observations.

Let λ = (A, B, π) be a given model and let 𝒪 = (𝒪_0, 𝒪_1, …, 𝒪_{T−1}) be a sequence of observations. We want to find P(𝒪 | λ), that is, the likelihood of the observed sequence 𝒪, given the model. Let X = (x_0, x_1, …, x_{T−1}) be a state sequence. Then by the definition of B we have

\[ P(\mathcal{O} \mid X, \lambda) = b_{x_0}(\mathcal{O}_0)\, b_{x_1}(\mathcal{O}_1) \cdots b_{x_{T-1}}(\mathcal{O}_{T-1}) \]

and by the definition of π and A it follows that

\[ P(X \mid \lambda) = \pi_{x_0}\, a_{x_0,x_1} a_{x_1,x_2} \cdots a_{x_{T-2},x_{T-1}}. \]

Since

\[ P(\mathcal{O} \mid X, \lambda) = P_\lambda(\mathcal{O} \mid X) = \frac{P_\lambda(\mathcal{O} \cap X)}{P_\lambda(X)} \]

we have

\[ P_\lambda(\mathcal{O}, X) = P_\lambda(\mathcal{O} \mid X)\, P_\lambda(X) \]

or in the traditional Hidden Markov Model notation

\[ P(\mathcal{O}, X \mid \lambda) = P(\mathcal{O} \mid X, \lambda)\, P(X \mid \lambda). \]

By summing over all possible state sequences,

\[ P(\mathcal{O} \mid \lambda) = \sum_X P(\mathcal{O}, X \mid \lambda) = \sum_X P(\mathcal{O} \mid X, \lambda)\, P(X \mid \lambda) = \sum_X \pi_{x_0} b_{x_0}(\mathcal{O}_0)\, a_{x_0,x_1} b_{x_1}(\mathcal{O}_1)\, a_{x_1,x_2} \cdots a_{x_{T-2},x_{T-1}} b_{x_{T-1}}(\mathcal{O}_{T-1}). \]

This direct computation requires about 2TN^T multiplications, so the number of multiplications becomes infeasible as either N or T increases. For example, with N = 5 states and T = 100 observations, \(2TN^T = 200 \cdot 5^{100} \approx 10^{72}\) multiplications.

However, an efficient algorithm can achieve the same result. The algorithm is the forward algorithm or alpha pass. For t = 0, 1, 2, …, T−1 and i = 0, 1, …, N−1, define

\[ \alpha_t(i) = P(\mathcal{O}_0, \mathcal{O}_1, \mathcal{O}_2, \ldots, \mathcal{O}_t, x_t = q_i \mid \lambda). \]

Then α_t(i) is the probability of the partial observation sequence up to time t, with the underlying Markov process in state q_i at time t. The crucial insight is that we can compute the α_t(i) recursively.


Figure 1: A schematic diagram of the event defined by α_2(i)

Algorithm 1 (Forward Algorithm).

  1. For i = 0, 1, …, N−1, let α_0(i) = π_i b_i(𝒪_0).
  2. For t = 1, 2, …, T−1 and i = 0, 1, …, N−1, compute
    \[ \alpha_t(i) = \left[ \sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji} \right] b_i(\mathcal{O}_t). \]
  3. Then it follows that
    \[ P(\mathcal{O} \mid \lambda) = \sum_{i=0}^{N-1} \alpha_{T-1}(i). \]
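
As a concrete sketch, the alpha pass translates directly into a few lines of NumPy. The names here (forward, A, B, pi, obs) are illustrative assumptions: A and B are the transition and emission matrices indexed as above, pi is the initial distribution, and obs is a sequence of observation indices.

    import numpy as np

    def forward(A, B, pi, obs):
        # alpha[t, i] = P(O_0, ..., O_t, x_t = q_i | lambda)
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]                  # step 1: alpha_0(i) = pi_i b_i(O_0)
        for t in range(1, T):                         # step 2: one pass forward in time
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha                                  # step 3: P(O | lambda) = alpha[-1].sum()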


Figure 2: Schematic diagram of the computations in the forward algorithm as a lattice in time and states.

Proof. Step 2 of the Forward Algorithm.

Note that this proof uses the traditional mathematical notation for conditional probability, suppressing the model dependency. Let 𝒪 = (𝒪_0, 𝒪_1, …, 𝒪_{T−1}) be a sequence of observations and write (𝒪)_k = (𝒪_0, 𝒪_1, …, 𝒪_k) for the partial sequence of observations through time k. Recall that

\[ \alpha_t(j) = P((\mathcal{O})_t, x_t = q_j) \]

and note that this gives the conditional probability of the Markov chain state at time t, given the observations through time t:

\[ P(x_t = q_j \mid (\mathcal{O})_t) = \frac{P(x_t = q_j, (\mathcal{O})_t)}{P((\mathcal{O})_t)} = \frac{\alpha_t(j)}{\sum_i \alpha_t(i)}. \]

Now,

\begin{align*}
\alpha_t(j) &= P((\mathcal{O})_{t-1}, \mathcal{O}_t, x_t = q_j) \\
&= \sum_i P((\mathcal{O})_{t-1}, x_{t-1} = q_i, \mathcal{O}_t, x_t = q_j) \quad \text{(summing over the possible states at time } t-1\text{)} \\
&= \sum_i P((\mathcal{O})_{t-1}, x_{t-1} = q_i)\, P(\mathcal{O}_t, x_t = q_j \mid (\mathcal{O})_{t-1}, x_{t-1} = q_i) \quad \text{(definition of conditional probability)} \\
&= \sum_i \alpha_{t-1}(i)\, P(\mathcal{O}_t, x_t = q_j \mid x_{t-1} = q_i) \quad \text{(independence of past observations, given the state)} \\
&= \sum_i \alpha_{t-1}(i)\, a_{ij}\, b_j(\mathcal{O}_t) \quad \text{(justified below)} \\
&= b_j(\mathcal{O}_t) \sum_i \alpha_{t-1}(i)\, a_{ij},
\end{align*}

where the step before last used that

\[ P(\mathcal{O}_t, x_t = q_j \mid x_{t-1} = q_i) = P(x_t = q_j \mid x_{t-1} = q_i)\, P(\mathcal{O}_t \mid x_t = q_j, x_{t-1} = q_i) = a_{ij}\, b_j(\mathcal{O}_t).
\]

The forward algorithm requires about N²T multiplications, a large improvement over the 2TN^T multiplications of the direct approach. More precisely, the forward pass algorithm requires N(N + 1)(T − 1) + N multiplications and N(N − 1)(T − 1) additions.

Solution to Problem 2

Problem 2 is the estimation problem or discovery problem, in which we try to uncover the hidden state sequence. We usually use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, so the choice of criterion strongly influences both the algorithm chosen and the state sequence revealed. For Hidden Markov Models we often want to maximize the expected number of correct states. In contrast, the Dynamic Programming solution finds the highest-scoring overall path. The two solutions are not necessarily the same.

The formal statement of Problem 2 is: Given the model λ = (A,B,π) and a sequence of observations 𝒪, find an optimal state sequence for some useful definition of “optimal”. In other words, we want to uncover the hidden states of the Hidden Markov Model.

The following defines the backward algorithm or beta-pass. This is analogous to the forward algorithm described in the solution to the first problem of Hidden Markov Models, except that it starts at the end and works back toward the beginning.

For t = 0, 1, 2, …, T−1 and i = 0, 1, …, N−1, define

\[ \beta_t(i) = P(\mathcal{O}_{t+1}, \mathcal{O}_{t+2}, \ldots, \mathcal{O}_{T-1} \mid x_t = q_i, \lambda). \]

The crucial insight again is that we can compute the β_t(i) recursively.


Figure 3: A schematic diagram of the event defined by β_1(i)

Algorithm 2 (Backward Algorithm).

  1. For i = 0, 1, …, N−1, let β_{T−1}(i) = 1.
  2. For t = T−2, T−3, …, 0 and i = 0, 1, …, N−1, compute
    \[ \beta_t(i) = \sum_{j=0}^{N-1} a_{ij}\, b_j(\mathcal{O}_{t+1})\, \beta_{t+1}(j). \]
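
The beta pass is an equally short sketch in the same style, reusing the conventions of the forward function above.

    def backward(A, B, obs):
        # beta[t, i] = P(O_{t+1}, ..., O_{T-1} | x_t = q_i, lambda)
        T, N = len(obs), A.shape[0]
        beta = np.ones((T, N))                        # step 1: beta_{T-1}(i) = 1
        for t in range(T - 2, -1, -1):                # step 2: work back from the end
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta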


Figure 4: Schematic diagram of the computations in the backward algorithm.

Proof. Step 2 of the Backward Algorithm. Note that this proof uses the traditional mathematical notation for conditional probability, suppressing the model dependency. By the law of total probability, conditioning on the state at time t+1:

\begin{align*}
\beta_t(i) &= \sum_j P(\mathcal{O}_{t+1}, \mathcal{O}_{t+2}, \ldots, \mathcal{O}_{T-1} \mid x_t = q_i, x_{t+1} = q_j)\, P(x_{t+1} = q_j \mid x_t = q_i) \\
&= \sum_j P(\mathcal{O}_{t+1}, \mathcal{O}_{t+2}, \ldots, \mathcal{O}_{T-1} \mid x_{t+1} = q_j)\, a_{ij} \quad \text{(Markov property)} \\
&= \sum_j P(\mathcal{O}_{t+1} \mid x_{t+1} = q_j)\, P(\mathcal{O}_{t+2}, \ldots, \mathcal{O}_{T-1} \mid \mathcal{O}_{t+1}, x_{t+1} = q_j)\, a_{ij} \quad \text{(conditional probability)} \\
&= \sum_j b_j(\mathcal{O}_{t+1})\, P(\mathcal{O}_{t+2}, \ldots, \mathcal{O}_{T-1} \mid x_{t+1} = q_j)\, a_{ij} \\
&= \sum_{j=0}^{N-1} a_{ij}\, b_j(\mathcal{O}_{t+1})\, \beta_{t+1}(j).
\end{align*}

This allows computation of P(𝒪_0, 𝒪_1, …, 𝒪_{T−1}) through

\begin{align*}
P(\mathcal{O}) &= \sum_i P(\mathcal{O}_0, \mathcal{O}_1, \ldots, \mathcal{O}_{T-1} \mid x_0 = q_i)\, \pi_i \\
&= \sum_i P(\mathcal{O}_0 \mid x_0 = q_i)\, P(\mathcal{O}_1, \ldots, \mathcal{O}_{T-1} \mid x_0 = q_i, \mathcal{O}_0)\, \pi_i \\
&= \sum_i b_i(\mathcal{O}_0)\, \beta_0(i)\, \pi_i.
\end{align*}

For t = 0, 1, 2, …, T−1 and i = 0, 1, …, N−1 define

\[ \gamma_t(i) = P(x_t = q_i \mid \mathcal{O}, \lambda). \]


Figure 5: A schematic diagram of the event defined by γ_2(i)

Since α_t(i) measures the relevant probability up to time t and β_t(i) measures the relevant probability after time t,

\[ \gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(\mathcal{O} \mid \lambda)}. \]

Recall that we obtain the denominator P(𝒪 | λ) by summing α_{T−1}(i) over i, or alternatively from the last step in the proof of the Backward Algorithm. From the definition of γ_t(i) it follows that the most likely state at time t is the state q_i for which γ_t(i) is a maximum over the index i. This is the Viterbi algorithm.

Algorithm 3 (Viterbi Algorithm). Using γ_t(i), the individually most likely state at time t is q_{i_t}, where

\[ i_t = \arg\max_{0 \le i \le N-1} [\gamma_t(i)] \]

for 0 ≤ t ≤ T−1.
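
Combining the two passes gives the gamma probabilities and the individually most likely states; posterior_states is an illustrative name building on the sketches above.

    def posterior_states(A, B, pi, obs):
        # gamma[t, i] = alpha_t(i) beta_t(i) / P(O | lambda)
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        gamma = alpha * beta / alpha[-1].sum()
        return gamma.argmax(axis=1), gamma            # argmax over i at each time t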

Remark. Another approach to obtaining P(𝒪_0, 𝒪_1, …, 𝒪_{T−1}) is to combine the forward and the backward approaches. Suppose that for some k we have computed both α_k(j) and β_k(j). Because

\begin{align*}
P(\mathcal{O}_0, \ldots, \mathcal{O}_{T-1}, x_k = q_j) &= P(\mathcal{O}_0, \ldots, \mathcal{O}_k, x_k = q_j)\, P(\mathcal{O}_{k+1}, \ldots, \mathcal{O}_{T-1} \mid \mathcal{O}_0, \ldots, \mathcal{O}_k, x_k = q_j) \\
&= P(\mathcal{O}_0, \ldots, \mathcal{O}_k, x_k = q_j)\, P(\mathcal{O}_{k+1}, \ldots, \mathcal{O}_{T-1} \mid x_k = q_j) \quad \text{(independence of future observations from earlier observations, given the state)} \\
&= \alpha_k(j)\, \beta_k(j),
\end{align*}

summing over j gives P(𝒪 | λ) = Σ_j α_k(j) β_k(j) for any k.

The advantage of this identity for determining P(𝒪_0, 𝒪_1, …, 𝒪_{T−1}) is that we can simultaneously compute the sequence of forward functions, the alpha passes starting at α_0, and the sequence of backward functions, the beta passes starting at β_{T−1}. The parallel computations can stop when both α_k and β_k are known for some k.

Example of Viterbi Algorithm

As a toy example, consider the Variable Factory from the section Examples of Hidden Markov Models.

The sequence of variable factory states is a classic Markov chain with transition probability matrix

\[ A = \begin{pmatrix} 0.9 & 0.1 \\ 0 & 1 \end{pmatrix}, \]

with rows and columns indexed by the states 0 and 1, and initial distribution π = (0.8, 0.2).

The observations are the quality of the items produced in each state, either acceptable (a) or unacceptable (u). Then the conditional emission probabilities are

\[ P(a \mid 0) = 0.99, \quad P(u \mid 0) = 0.01, \quad P(a \mid 1) = 0.96, \quad P(u \mid 1) = 0.04, \]

so the emission matrix is

\[ B = \begin{pmatrix} 0.99 & 0.01 \\ 0.96 & 0.04 \end{pmatrix}, \]

with rows indexed by the states 0, 1 and columns by the observations a, u.

The forward algorithm probabilities, for the observation sequence (a, u, a), are:

    α_t(i)    t = 0 (a)              t = 1 (u)                                t = 2 (a)
    i = 0     (0.8)(0.99) = 0.792    [(0.792)(0.9) + (0.192)(0)](0.01)        [(0.007128)(0.9) + (0.010848)(0)](0.99)
                                     = 0.007128                               = 0.006351
    i = 1     (0.2)(0.96) = 0.192    [(0.792)(0.1) + (0.192)(1)](0.04)        [(0.007128)(0.1) + (0.010848)(1)](0.96)
                                     = 0.010848                               = 0.011098

so P(𝒪 | λ) = 0.006351 + 0.011098 = 0.01745.

The backward algorithm probabilities are:

    β_t(i)    t = 0                                     t = 1                              t = 2
    i = 0     (0.9)(0.01)(0.987) + (0.1)(0.04)(0.96)    (0.9)(0.99)(1) + (0.1)(0.96)(1)    1
              = 0.012723                                = 0.987
    i = 1     (0)(0.01)(0.987) + (1)(0.04)(0.96)        (0)(0.99)(1) + (1)(0.96)(1)        1
              = 0.0384                                  = 0.96

The joint probabilities α_t(i)β_t(i) are:

    α_t(i)β_t(i)    t = 0                          t = 1                          t = 2
    i = 0           (0.792)(0.012723) = 0.010077   (0.007128)(0.987) = 0.007035   (0.006351)(1) = 0.006351
    i = 1           (0.192)(0.0384) = 0.007373     (0.010848)(0.96) = 0.010414    (0.011098)(1) = 0.011098
    column sum      0.01745                        0.01745                        0.01745

Each column sums to P(𝒪 | λ) = 0.01745, as the remark above predicts.

The gamma probabilities γ_t(i) = α_t(i)β_t(i)/P(𝒪 | λ) are:

    γ_t(i)    t = 0     t = 1     t = 2
    i = 0     0.5775    0.4031    0.3640
    i = 1     0.4225    0.5968    0.6360

The Viterbi algorithm selects the largest γ_t(i) in each column (marked with *):

    γ_t(i)    t = 0      t = 1      t = 2
    i = 0     0.5775*    0.4031     0.3640
    i = 1     0.4225     0.5968*    0.6360*

At each time the state which is individually most likely gives the sequence (0, 1, 1), i.e. factory good, then not good, then not good. Note that here all state transitions in this sequence are valid, but valid transitions may not always result from application of this algorithm. This individually most likely state path is different from the most likely overall path (1, 1, 1) exhaustively computed in the section Examples of Hidden Markov Models.
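
With the sketches above, a few lines reproduce this example. The encodings (states 0 = good, 1 = bad; observations 0 = a, 1 = u) are assumptions matching the matrices as written.

    A = np.array([[0.9, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.99, 0.01],
                  [0.96, 0.04]])
    pi = np.array([0.8, 0.2])
    obs = [0, 1, 0]                                   # observation sequence (a, u, a)

    states, gamma = posterior_states(A, B, pi, obs)
    print(states)                                     # [0 1 1], as in the table
    print(gamma)                                      # the gamma table, up to rounding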

Solution to Problem 3

The solution of Problem 3 attempts to optimize the model parameters to best represent how the observed sequence comes about. The observed sequence used to solve Problem 3 is a training sequence since it “trains” the model. This training problem is crucial for many applications of Hidden Markov Models since it creates the best models for real phenomena.

The formal statement of Problem 3 is: Given an observation sequence 𝒪, determine by some means (physical knowledge, intuition, best guessing) the fixed dimensions N and M, then find the model λ = (A, B, π) that maximizes the probability of 𝒪. The interpretation is training a model by adjusting the model parameters to best fit the observations. We can also view this as a search in the parameter space represented by A, B and π. The dimensions N and M are fixed, but the algorithm finds the elements of A, B, and π, subject to the row-stochastic condition. The fact that we can efficiently re-estimate the model itself at each step is one of the more amazing aspects of Hidden Markov Models.

For t = 0, 1, …, T−2 and i, j ∈ {0, 1, …, N−1}, define the xi-pass probabilities

\[ \xi_t(i,j) = P(x_t = q_i, x_{t+1} = q_j \mid \mathcal{O}, \lambda) \]

so ξ_t(i,j) is the probability of being in state q_i at time t and transitioning to state q_j at time t+1, given the observations. The xi’s in terms of α, β, A and B are

\[ \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, \beta_{t+1}(j)}{P(\mathcal{O} \mid \lambda)}. \]

For t = 0, 1, …, T−2, the relation between γ_t(i) and ξ_t(i,j) is:

\[ \gamma_t(i) = \sum_{j=0}^{N-1} \xi_t(i,j). \]
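
In the same NumPy style, a sketch of the xi computation; xi_probs is an illustrative name reusing the forward and backward sketches above.

    def xi_probs(A, B, alpha, beta, obs):
        # xi[t, i, j] = P(x_t = q_i, x_{t+1} = q_j | O, lambda), t = 0, ..., T-2
        T = len(obs)
        prob = alpha[-1].sum()                        # P(O | lambda)
        xi = np.zeros((T - 1,) + A.shape)
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / prob
        return xi                                     # gamma_t(i) = xi[t].sum(axis=1)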


Figure 6: Schematic diagram of the computations in the joint event algorithm as a lattice.

Given the gammas and the xis, we can re-estimate the model λ = (A, B, π) as follows:

  1. For i = 0, 1, …, N−1, let π̄_i = γ_0(i).
  2. For i = 0, 1, …, N−1 and for j = 0, 1, …, N−1 compute
    \[ \bar{a}_{ij} = \frac{\sum_{t=0}^{T-2} \xi_t(i,j)}{\sum_{t=0}^{T-2} \gamma_t(i)}. \tag{1} \]
  3. For j = 0, 1, …, N−1 and for k = 0, 1, …, M−1 compute
    \[ \bar{b}_j(k) = \frac{\sum_{t \in \{0,1,\ldots,T-1\},\, \mathcal{O}_t = k} \gamma_t(j)}{\sum_{t=0}^{T-1} \gamma_t(j)}. \tag{2} \]

The numerator of the re-estimated ā_ij gives the expected number of transitions from state q_i to state q_j, while the denominator is the expected number of transitions from q_i to any state. The ratio is then the empirical probability of transitioning from state q_i to state q_j, the desired value of ā_ij.

The numerator of the re-estimated b̄_j(k) gives the expected number of times the model is in state q_j with observation k, while the denominator is the expected number of times the model is in state q_j at all. The ratio is then the empirical probability of observing symbol k given that the model is in state q_j, the desired value of b̄_j(k).
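
One full re-estimation step, assembling equations (1) and (2); reestimate is an illustrative name, and the ratios assume every state is visited so no denominator vanishes.

    def reestimate(A, B, pi, obs):
        # returns (A_bar, B_bar, pi_bar) from one pass of equations (1) and (2)
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        gamma = alpha * beta / alpha[-1].sum()
        xi = xi_probs(A, B, alpha, beta, obs)
        pi_bar = gamma[0]                                            # step 1
        A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]     # equation (1)
        o = np.asarray(obs)
        B_bar = np.stack([gamma[o == k].sum(axis=0)
                          for k in range(B.shape[1])], axis=1)
        B_bar /= gamma.sum(axis=0)[:, None]                          # equation (2)
        return A_bar, B_bar, pi_bar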

Re-estimation is an iterative process. First initialize λ = (A, B, π) with a best guess, or, if no reasonable guess is available, choose random values such that π_i ≈ 1/N, a_ij ≈ 1/N and b_j(k) ≈ 1/M. It is critical that A, B and π are not exactly uniform, since exactly uniform values form a fixed point of the re-estimation from which the model cannot climb. As always, A, B, π must be row stochastic. One way to accomplish this is to start with each row uniform, say π = (1/N, …, 1/N); select N random values u_i in the range (−1/(2N), 1/(2N)), i = 0, …, N−1; find the mean ū = (1/N) Σ_{i=0}^{N−1} u_i; and use the centered values u_i − ū, so that −1/N ≤ u_i − ū ≤ 1/N and Σ_{i=0}^{N−1} (u_i − ū) = 0. Then π = (1/N + (u_0 − ū), …, 1/N + (u_{N−1} − ū)) is positive and row stochastic. Another way to accomplish this (employed by Dr. Lin Himmelmann in the initHMM function in the HMM R library package) is to take

\[ A = \frac{1}{2} I_N + \frac{1}{2N} \mathbf{1}_N \]

where I_N is the N × N identity matrix and 1_N is the N × N matrix with 1 in every entry. Himmelmann takes the initial distribution and emission matrices to be uniform. Yet another way to accomplish this (employed by Mo Chen in the hmmEm function in the Hidden Markov Model Toolbox Matlab library package) is to fill appropriately sized matrices and vectors with uniform random values and then normalize so the rows sum to 1.
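
Two of these initialization strategies as brief sketches; the function names are illustrative and not taken from either package.

    rng = np.random.default_rng()

    def perturbed_uniform_row(N):
        # uniform 1/N plus centered noise: entries stay positive and sum to 1
        u = rng.uniform(-1 / (2 * N), 1 / (2 * N), size=N)
        return 1 / N + (u - u.mean())

    def normalized_random(n_rows, n_cols):
        # uniform random entries, each row normalized to sum to 1
        M = rng.uniform(size=(n_rows, n_cols))
        return M / M.sum(axis=1, keepdims=True)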

The iterative process is summarized as:

Algorithm 4 (Baum-Welch Algorithm).

  1. Set a maximum number of iterations n_max, and initialize λ = (A, B, π) with a best guess or a randomized row-stochastic initialization as above.
  2. Using λ = (A, B, π), compute α_t(i), β_t(i), ξ_t(i,j), γ_t(i) and P(𝒪 | λ).
  3. Re-estimate a new model λ̄ = (Ā, B̄, π̄) with the empirical probabilities (1) and (2).
  4. If P(𝒪 | λ̄) increases over P(𝒪 | λ) by at least some predetermined threshold and the predetermined maximum number of iterations n_max has not been exceeded, reset the model (A, B, π) = (Ā, B̄, π̄) and go to step 2.
  5. Else stop and output λ = (A, B, π).
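
A minimal driver loop tying the sketches together; because it uses the unscaled alpha pass, this version is only suitable for short observation sequences, for the underflow reasons discussed below.

    def baum_welch(A, B, pi, obs, n_max=100, tol=1e-6):
        # iterate re-estimation until the likelihood gain drops below tol
        prob = forward(A, B, pi, obs)[-1].sum()
        for _ in range(n_max):
            A, B, pi = reestimate(A, B, pi, obs)
            new_prob = forward(A, B, pi, obs)[-1].sum()
            if new_prob - prob < tol:
                break
            prob = new_prob
        return A, B, pi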

In a mathematically dense pair of papers, Baum et al. show that the Baum-Welch algorithm is basically a hill-climbing technique in the parameter space of the A and B matrices, so it converges at least to a local maximizer of the likelihood function. However, as with any hill-climbing algorithm, it is prone to getting stuck in local maxima, especially if the data come from a Hidden Markov Model that is far from the initial guess (A_0, B_0). On the other hand, if the starting point of the iterations is sufficiently close to the true but unknown Hidden Markov Model, then the algorithm may converge to the true Hidden Markov Model. Note that the Baum-Welch algorithm is also referred to as an expectation-maximization (EM) algorithm.

Scaling and Practical Considerations

Even the toy variable factory example above shows that all computations involve repeated products of probabilities, so the products tend to 0 quickly as T increases. Therefore, floating-point underflow is a serious problem for computation. The basic scaling procedure multiplies α_t(i) by a scaling coefficient independent of i but depending on t, with the goal of keeping the α_t(i) within the dynamic range of the computer for 0 ≤ t ≤ T−1. The natural choice is to scale all computations by the sum over i of the α_t(i), while ensuring that all formulas remain valid. The scaling ensures that the rescaled α_t(i) sum over i to 1, so heuristically they are on the order of 1/N.

Algorithm 5 (Scaling Procedure).

  1. For i = 0, 1, …, N−1, let α̃_0(i) = α_0(i); let
    \[ c_0 = \frac{1}{\sum_{j=0}^{N-1} \tilde{\alpha}_0(j)}; \]
    and for i = 0, 1, …, N−1, let α̂_0(i) = c_0 α̃_0(i).
  2. For t = 1, 2, …, T−1 and i = 0, 1, …, N−1, compute
    \[ \tilde{\alpha}_t(i) = \left[ \sum_{j=0}^{N-1} \hat{\alpha}_{t-1}(j)\, a_{ji} \right] b_i(\mathcal{O}_t). \]
  3. Let
    \[ c_t = \frac{1}{\sum_{j=0}^{N-1} \tilde{\alpha}_t(j)} \]
    and α̂_t(i) = c_t α̃_t(i). By induction (proved below), α̂_t(i) = c_0 c_1 ⋯ c_t α_t(i), so
    \[ \hat{\alpha}_t(i) = \frac{\alpha_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)}. \]
  4. Scale the backward variables with the same scale factors, so that β̂_t(i) = c_t c_{t+1} ⋯ c_{T−1} β_t(i), and set
    \[ \hat{\xi}_t(i,j) = \hat{\alpha}_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, \hat{\beta}_{t+1}(j), \qquad \hat{\gamma}_t(i) = \sum_{j=0}^{N-1} \hat{\xi}_t(i,j). \]

Proof. The proof is by induction. To start,

\[ \hat{\alpha}_0(i) = c_0 \tilde{\alpha}_0(i) = c_0 \alpha_0(i) = \frac{\alpha_0(i)}{\sum_{j=0}^{N-1} \tilde{\alpha}_0(j)} = \frac{\alpha_0(i)}{\sum_{j=0}^{N-1} \alpha_0(j)}. \]

The induction assumption is

\[ \hat{\alpha}_t(i) = c_0 c_1 \cdots c_t\, \alpha_t(i). \]

Then

\begin{align*}
\hat{\alpha}_{t+1}(i) &= c_{t+1}\, \tilde{\alpha}_{t+1}(i) \\
&= c_{t+1} \sum_{j=0}^{N-1} \hat{\alpha}_t(j)\, a_{ji}\, b_i(\mathcal{O}_{t+1}) \\
&= c_0 c_1 \cdots c_t\, c_{t+1} \sum_{j=0}^{N-1} \alpha_t(j)\, a_{ji}\, b_i(\mathcal{O}_{t+1}) \\
&= c_0 c_1 \cdots c_{t+1}\, \alpha_{t+1}(i).
\end{align*}

Hence

\[ \hat{\alpha}_t(i) = c_0 c_1 \cdots c_t\, \alpha_t(i) \]

holds for all t by induction.

Then by definition

\[ \hat{\alpha}_t(i) = \frac{\alpha_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)} \]

hence the α̂t(i) are the properly scaled values of αt(i) for all t. As a consequence

\[ \sum_{j=0}^{N-1} \hat{\alpha}_{T-1}(j) = 1. \]

Also,

\[ \sum_{j=0}^{N-1} \hat{\alpha}_{T-1}(j) = c_0 c_1 \cdots c_{T-1} \sum_{j=0}^{N-1} \alpha_{T-1}(j) = c_0 c_1 \cdots c_{T-1}\, P(\mathcal{O} \mid \lambda). \]

Combining the last two results gives

\[ P(\mathcal{O} \mid \lambda) = \frac{1}{c_0 c_1 \cdots c_{T-1}}. \]

Remark. The only real change to the algorithm because of scaling is the procedure for computing P(𝒪 | λ). We cannot merely sum the scaled values α̂_{T−1}(i), since they always sum to 1. Instead, to avoid underflow, compute

\[ \log P(\mathcal{O} \mid \lambda) = -\sum_{j=0}^{T-1} \log c_j. \]

Thus, we can compute the log of P(𝒪 | λ), but not P(𝒪 | λ) itself, since the latter would be out of the dynamic range of the computer anyway.
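
A sketch of the scaled alpha pass returning the log-likelihood; forward_scaled is an illustrative name in the same style as the earlier sketches.

    def forward_scaled(A, B, pi, obs):
        # scaled alpha pass: each row of alpha_hat sums to 1
        T, N = len(obs), len(pi)
        alpha_hat = np.zeros((T, N))
        c = np.zeros(T)
        for t in range(T):
            tilde = pi * B[:, obs[0]] if t == 0 else (alpha_hat[t - 1] @ A) * B[:, obs[t]]
            c[t] = 1.0 / tilde.sum()                  # scale factor c_t
            alpha_hat[t] = c[t] * tilde
        return alpha_hat, -np.log(c).sum()            # log P(O | lambda) = -sum_j log c_j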

Remark. For convenience let C_t = c_0 c_1 ⋯ c_t and D_t = c_t c_{t+1} ⋯ c_{T−1}, so that α̂_t(i) = C_t α_t(i) and β̂_{t+1}(j) = D_{t+1} β_{t+1}(j). Then the estimation formula for the Markov state transition probabilities becomes

\begin{align*}
\bar{a}_{ij} &= \frac{\sum_{t=0}^{T-2} \hat{\xi}_t(i,j)}{\sum_{t=0}^{T-2} \hat{\gamma}_t(i)} = \frac{\sum_{t=0}^{T-2} \hat{\alpha}_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, \hat{\beta}_{t+1}(j)}{\sum_{t=0}^{T-2} \sum_{j=0}^{N-1} \hat{\alpha}_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, \hat{\beta}_{t+1}(j)} \\
&= \frac{\sum_{t=0}^{T-2} C_t\, \alpha_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, D_{t+1}\, \beta_{t+1}(j)}{\sum_{t=0}^{T-2} \sum_{j=0}^{N-1} C_t\, \alpha_t(i)\, a_{ij}\, b_j(\mathcal{O}_{t+1})\, D_{t+1}\, \beta_{t+1}(j)}.
\end{align*}

But

\[ C_t D_{t+1} = (c_0 c_1 \cdots c_t)(c_{t+1} \cdots c_{T-1}) = C_{T-1}, \]

independent of t. Hence the factors C_t D_{t+1} cancel from numerator and denominator, and so the scaled re-estimation formulas give the same re-estimates as equations (1) and (2).

Dynamic Programming Algorithm for the Overall Path Problem

The last problem is to find the most likely overall path given the observations (not the states that are individually most likely at each time). That is, choose the sequence of states whose conditional probability, given the sequence of signals, is maximal. The Dynamic Programming algorithm solves this problem. The Dynamic Programming algorithm is the forward algorithm with “sum” replaced by “max”.

Algorithm 6 (Dynamic Programming).

  1. For i = 0, 1, …, N−1, let δ_0(i) = π_i b_i(𝒪_0).
  2. For t = 1, 2, …, T−1 and i = 0, 1, …, N−1, compute
    \[ \delta_t(i) = \left[ \max_{j \in \{0,\ldots,N-1\}} \delta_{t-1}(j)\, a_{ji} \right] b_i(\mathcal{O}_t). \]
  3. The probability of the best overall path is
    \[ \max_{j \in \{0,\ldots,N-1\}} \delta_{T-1}(j). \]

The Dynamic Programming algorithm only gives the optimal probability, not the path itself. To find the most likely state path, also record the maximizing preceding state at each stage, then trace back from the highest-scoring final state, as sketched below.
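
A sketch with backpointers in the style of the earlier code; viterbi_path is an illustrative name. For the variable factory observation sequence (a, u, a) it returns the path (1, 1, 1) with probability 0.007373, matching the example below.

    def viterbi_path(A, B, pi, obs):
        # dynamic programming for the most likely overall path
        T, N = len(obs), len(pi)
        delta = np.zeros((T, N))
        back = np.zeros((T, N), dtype=int)            # back[t, i] = best predecessor of i
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A        # scores[j, i] = delta_{t-1}(j) a_{ji}
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):                 # trace back from the best final state
            path.append(back[t, path[-1]])
        return path[::-1], delta[-1].max()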

Proof. Let X_k denote the vector of the first k states. Within this proof, index the states and observations from 1, so the observations are (𝒪_1, …, 𝒪_n), and write (𝒪)_k = (𝒪_1, …, 𝒪_k) for the first k observations. The goal is to find the specific sequence of states (x_1, …, x_k) that maximizes P(X_k = (x_1, …, x_k) | 𝒪). Because

\[ P(X_k = (x_1, \ldots, x_k) \mid \mathcal{O}) = \frac{P(X_k = (x_1, \ldots, x_k), \mathcal{O})}{P(\mathcal{O})}, \]

this is equivalent to finding the sequence of states that maximizes P(X_k = (x_1, …, x_k), 𝒪).

To solve this maximization problem, let, for k ≤ n,

\[ V_k(j) = \max_{x_1, \ldots, x_{k-1}} P(X_{k-1} = (x_1, \ldots, x_{k-1}), x_k = q_j, (\mathcal{O})_k). \]

Recursively solve for V_k(j) with

\begin{align*}
V_k(j) &= \max_i \max_{x_1,\ldots,x_{k-2}} P(X_{k-2} = (x_1,\ldots,x_{k-2}), x_{k-1} = q_i, x_k = q_j, (\mathcal{O})_k) \\
&= \max_i \max_{x_1,\ldots,x_{k-2}} P(X_{k-2} = (x_1,\ldots,x_{k-2}), x_{k-1} = q_i, (\mathcal{O})_{k-1})\, P(x_k = q_j, \mathcal{O}_k \mid x_{k-1} = q_i) \\
&= \max_i P(x_k = q_j, \mathcal{O}_k \mid x_{k-1} = q_i)\, \max_{x_1,\ldots,x_{k-2}} P(X_{k-2} = (x_1,\ldots,x_{k-2}), x_{k-1} = q_i, (\mathcal{O})_{k-1}) \\
&= \max_i a_{ij}\, b_j(\mathcal{O}_k)\, V_{k-1}(i) \\
&= b_j(\mathcal{O}_k)\, \max_i a_{ij}\, V_{k-1}(i).
\end{align*}

Numerical underflow is again a concern with the Dynamic Programming algorithm, since it too involves repeated products of probabilities. For the Dynamic Programming algorithm, avoid underflow by simply taking logarithms. Then the underflow-resistant version of Dynamic Programming is:

Algorithm 7.

  1. For i = 0, 1, …, N−1, let δ̂_0(i) = log[π_i b_i(𝒪_0)].
  2. For t = 1, 2, …, T−1 and i = 0, 1, …, N−1, compute
    \[ \hat{\delta}_t(i) = \max_{j \in \{0,\ldots,N-1\}} \left[ \hat{\delta}_{t-1}(j) + \log(a_{ji}) \right] + \log[b_i(\mathcal{O}_t)]. \]
  3. The log of the probability of the best overall path is
    \[ \max_{j \in \{0,\ldots,N-1\}} \hat{\delta}_{T-1}(j). \]

This again finds only the (log) probability; finding the optimal overall path requires recording the maximizing states, as before.
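
The log-space variant changes only a few lines of the earlier sketch: products become sums of logarithms, and transitions with probability 0 become −∞, which is harmless under max.

    def viterbi_path_log(A, B, pi, obs):
        # underflow-resistant variant of viterbi_path
        with np.errstate(divide="ignore"):            # log 0 = -inf is acceptable here
            logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
        T, N = len(obs), len(pi)
        back = np.zeros((T, N), dtype=int)
        d = logpi + logB[:, obs[0]]
        for t in range(1, T):
            scores = d[:, None] + logA                # scores[j, i] = d_{t-1}(j) + log a_{ji}
            back[t] = scores.argmax(axis=0)
            d = scores.max(axis=0) + logB[:, obs[t]]
        path = [int(d.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1], d.max()                    # path and its log-probability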

Example of Dynamic Programming

Once again, consider the case of the Variable Factory, with observation sequence (a, u, a):

\begin{align*}
\delta_0(0) &= \pi_0 b_0(\mathcal{O}_0) = (0.8)(0.99) = 0.792 \\
\delta_0(1) &= \pi_1 b_1(\mathcal{O}_0) = (0.2)(0.96) = 0.192
\end{align*}

\begin{align*}
\delta_1(0) &= \max \begin{cases} \delta_0(0)\, a_{00}\, b_0(\mathcal{O}_1) = (0.792)(0.9)(0.01) = 0.007128 \\ \delta_0(1)\, a_{10}\, b_0(\mathcal{O}_1) = (0.192)(0)(0.01) = 0 \end{cases} = 0.007128 \\
\delta_1(1) &= \max \begin{cases} \delta_0(0)\, a_{01}\, b_1(\mathcal{O}_1) = (0.792)(0.1)(0.04) = 0.003168 \\ \delta_0(1)\, a_{11}\, b_1(\mathcal{O}_1) = (0.192)(1)(0.04) = 0.00768 \end{cases} = 0.00768
\end{align*}

\begin{align*}
\delta_2(0) &= \max \begin{cases} \delta_1(0)\, a_{00}\, b_0(\mathcal{O}_2) = (0.007128)(0.9)(0.99) = 0.006351 \\ \delta_1(1)\, a_{10}\, b_0(\mathcal{O}_2) = (0.00768)(0)(0.99) = 0 \end{cases} = 0.006351 \\
\delta_2(1) &= \max \begin{cases} \delta_1(0)\, a_{01}\, b_1(\mathcal{O}_2) = (0.007128)(0.1)(0.96) = 0.000684 \\ \delta_1(1)\, a_{11}\, b_1(\mathcal{O}_2) = (0.00768)(1)(0.96) = 0.007373 \end{cases} = 0.007373
\end{align*}

    δ_t(i)    t = 0 (a)    t = 1 (u)    t = 2 (a)
    i = 0     0.792        0.007128     0.006351
    i = 1     0.192        0.00768      0.007373*

Tracing back the source of the maximum δ_2(1) = 0.007373 (marked *), the maximum overall state sequence is (1, 1, 1), the same as exhaustive search found. This simple example required only 18 multiplications, compared with 40 for the exhaustive listing of all possibilities. This is not surprising, since Dynamic Programming throws out one of two paths at every stage. Dynamic Programming achieves similar efficiency for larger problems.

Sources

The algorithms in this section are adapted from Stamp [3] and Rabiner [2]. The proofs of the algorithms are adapted from Ross. Additional ideas are adapted from Rabiner and Juang [1]. The schematic diagrams are adapted from Rabiner [2]. The problems are adapted from Stamp [3].

_______________________________________________________________________________________________

Algorithms, Scripts, Simulations

Algorithm

Scripts

__________________________________________________________________________

Problems to Work for Understanding

  1. Compute the forward pass probabilities for the Paleontological Temperature Model (tree rings as signals from average annual temperatures) from the section on Examples of Hidden Markov Models.
  2. Compute the backward pass probabilities for the Paleontological Temperature Model (tree rings as signals from average annual temperatures) from the section on Examples of Hidden Markov Models.
  3. Compute the gamma probabilities for the Paleontological Temperature Model (tree rings as signals from average annual temperatures) from the section on Examples of Hidden Markov Models.
  4. Use the Viterbi algorithm to find the most likely states for the Paleontological Temperature Model (tree rings as signals from average annual temperatures) from the section on Examples of Hidden Markov Models.
  5. Show that the forward pass algorithm requires N(N + 1)(T − 1) + N multiplications and N(N − 1)(T − 1) additions.

__________________________________________________________________________

Books

Reading Suggestion:

References

[1]   L. R. Rabiner and B. H. Juang. An Introduction to Hidden Markov Models. IEEE ASSP Magazine, pages 4–16, January 1986.

[2]   Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286, February 1989.

[3]   Mark Stamp. A Revealing Introduction to Hidden Markov Models. https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf, December 2015.

__________________________________________________________________________

Links

Outside Readings and Links:

__________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LaTeX source on May 9, 2017