Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Selected Topics in
Probability and Stochastic Processes
Steve Dunbar

__________________________________________________________________________

Fastest Mixing Markov Chain

_______________________________________________________________________


Rating

Mathematicians Only: prolonged scenes of intense rigor.

_______________________________________________________________________________________________


Question of the Day

What is the stationary distribution of a Markov chain? If a Markov chain is symmetric, that is, represented by a symmetric matrix, then what is the specific stationary distribution? What determines the rate at which the Markov chain approaches the stationary distribution?

_______________________________________________________________________________________________


Key Concepts

  1. The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus, called the mixing rate and denoted by μ or μ(P), where P is the transition probability matrix.
  2. If P is an n × n symmetric stochastic matrix, then
    μ(P) = ‖P − (1/n)11ᵀ‖₂

    where ‖·‖₂ denotes the spectral norm.

  3. The eigenvalues and eigenvectors of the tridiagonal matrix

    P₀ = [ 1/2  1/2   0   ⋯   0
           1/2   0   1/2  ⋯   0
            ⋮    ⋱    ⋱   ⋱   ⋮
            0    ⋯   1/2   0   1/2
            0    ⋯    0   1/2  1/2 ]

    can be determined by solving a recursive system of equations; the eigenvalues are λ_j = cos((j−1)π/n) for j = 1, …, n. In particular, the largest eigenvalue is λ₁ = cos(0) = 1.

  4. The mixing rate μ(P₀) = cos(π/n) of P₀ is the smallest among all symmetric stochastic tridiagonal matrices.

__________________________________________________________________________


Vocabulary

  1. A matrix P with nonnegative entries whose row sums satisfy ∑_j P_ij = 1 is a stochastic matrix.
  2. The second-largest eigenvalue modulus of the transition matrix P, denoted μ(P), is the mixing rate.
  3. The mixing rate governs the asymptotic rate of convergence of the Markov chain to the stationary distribution: the smaller μ(P) is, the faster the chain converges.

__________________________________________________________________________


Mathematical Ideas

This section is a survey, review, analysis and in-depth investigation of the article: “Fastest Mixing Markov Chain on a Path”, by Stephen Boyd, Persi Diaconis, Jun Sun and Lin Xiao, The American Mathematical Monthly, Volume 113, Number 1, January 2006, pages 70-74, [1].

Introduction to Fastest Mixing Markov Chain

This article considers the problem of assigning transition probabilities to the edges of a graph in such a way that the resulting Markov chain mixes as rapidly as possible. The problem is specialized in that the graph is a path, so each transition is either a hold at the current vertex or a step to a nearest neighbor. The article proves that the fastest mixing is obtained when each edge has transition probability 1/2. This result is intuitive.

Consider a graph with n ≥ 2 vertices, labeled 1, 2, …, n, with n − 1 edges connecting adjacent vertices and with a loop at each vertex, as shown in Figure 1. Consider the Markov chain, that is to say random walk, on this graph, with transition probability from vertex i to vertex j denoted P_ij. The requirement that transitions occur only along an edge or loop of the graph is equivalent to P_ij = 0 when |i − j| > 1. Thus P is a tridiagonal matrix. Since the P_ij are transition probabilities, P_ij ≥ 0 and ∑_j P_ij = 1. Since the row sums are 1, we say that P is a stochastic matrix. In matrix-vector terms, we can write

P1 = 1 (1)

where 1 is the n × 1 vector whose entries are all 1. Therefore, λ = 1 is an eigenvalue of the matrix P, with eigenvector 1.



Figure 1: A graph with transition probabilities

Furthermore, the article requires that the transition probabilities be symmetric, so that P_ij = P_ji. This means that P is a symmetric, doubly stochastic, tridiagonal matrix. Since P1 = 1 and P is symmetric, (1/n)1ᵀP = (1/n)1ᵀ, so the uniform distribution is a stationary distribution for the probability transition matrix, that is, for the Markov chain.
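
This stationarity is easy to confirm numerically. Here is a minimal sketch in Python with NumPy; the helper name p0 and the size n = 6 are illustrative choices, not from the article.

    import numpy as np

    def p0(n):
        # Path matrix P0: probability 1/2 on each edge of the path,
        # holding probability 1/2 at the two endpoint vertices.
        P = np.zeros((n, n))
        for i in range(n - 1):
            P[i, i + 1] = P[i + 1, i] = 0.5
        P[0, 0] = P[-1, -1] = 0.5
        return P

    n = 6                                            # illustrative size
    P = p0(n)
    pi = np.full(n, 1.0 / n)                         # the uniform distribution
    print(np.allclose(P @ np.ones(n), np.ones(n)))   # True: P1 = 1
    print(np.allclose(pi @ P, pi))                   # True: (1/n)1^T P = (1/n)1^T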

The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus of P, called the mixing rate and denoted μ(P):

μ(P) = max_{i=2,…,n} |λ_i(P)|.

See below for more information and proofs. The smaller μ(P) is, the faster the Markov chain converges to its stationary distribution.
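
As a concrete illustration of this definition, the following minimal sketch computes μ(P) from the eigenvalues of a symmetric stochastic matrix and checks it against cos(π/n) for the 3 × 3 matrix P₀; the function name mixing_rate is our own, not from the article.

    import numpy as np

    def mixing_rate(P):
        # mu(P): second-largest eigenvalue modulus of a symmetric matrix P.
        lam = np.sort(np.linalg.eigvalsh(P))     # real eigenvalues, ascending
        return max(abs(lam[0]), abs(lam[-2]))    # max(|lambda_n|, |lambda_2|)

    P = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])              # the matrix P0 for n = 3
    print(mixing_rate(P), np.cos(np.pi / 3))     # both print 0.5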

Motivation and Application

The following is an application or motivation for this Markov chain. A processor is located at each vertex of the graph. Each edge represents a direct network connection between adjacent processors. The processor could be a computer in a network or a human worker, such as a line of barbers or assemblers in a workshop. Each processor has a job load or queue to finish, say processor i has load q_i(t) at time t, where q_i(t) is a positive real number. At each step, the goal is to shift loads across the edges in such a way as to balance the loads. The shifting is done before any processing or work begins, so the total amount of work to be done is constant. More precisely, we would like q_i(t) → q̄ = (1/n)∑_i q_i(0) as t → ∞. Moreover, we would like this balancing to take place as fast as possible. We intend to show that the balancing can be accomplished fastest by shifting one-half of the load imbalance across each edge from the more loaded to the less loaded processor.
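
A short simulation illustrates the balancing scheme; one application of the shifting rule is exactly one step of q ← P₀q (compare Problem 1 below). This is a sketch, and the initial loads here are hypothetical.

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])   # P0 for n = 4 processors
    q = np.array([8.0, 0.0, 1.0, 3.0])     # hypothetical initial loads, mean 3
    for t in range(60):
        q = P @ q                          # shift half of each edge imbalance
    print(q)                               # approximately [3, 3, 3, 3]

The total load is preserved at each step because P₀ is doubly stochastic, and the loads approach the common average at the geometric rate μ(P₀) = cos(π/4).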

Proofs about Fastest Mixing

Lemma 1. If P is an n × n symmetric stochastic matrix, then

μ(P) = ‖P − (1/n)11ᵀ‖₂

where ‖·‖₂ denotes the spectral norm.

Remark 1. Recall that the spectral norm, also called the operator norm, is the natural norm of a matrix induced by the L² or Euclidean vector norm. The spectral norm is also the maximum singular value of the matrix, that is, the square root of the maximum eigenvalue of AᴴA, where Aᴴ is the conjugate transpose of A.

Proof. Note that 1 is an eigenvector of P associated with the eigenvalue λ₁ = 1 by equation (1). Also

(1/n)11ᵀ1 = 1.

Let u^(2), …, u^(n) be the other n − 1 eigenvectors of P, corresponding to the eigenvalues λ₂, …, λ_n. Because the eigenvectors are orthogonal, taking the inner product of u^(j) with the eigenvector 1 shows that ∑_{i=1}^n u_i^(j) = 0 for j = 2, …, n. Therefore

(1/n)11ᵀu^(j) = 0.

Then the eigenvalues of P − (1/n)11ᵀ are 0, λ₂, …, λ_n, with eigenvectors 1, u^(2), …, u^(n) respectively. Since P − (1/n)11ᵀ is symmetric, it is orthogonally diagonalizable, and its spectral norm equals the maximum magnitude of its eigenvalues, that is, max{|λ₂|, …, |λ_n|}, which is μ(P). □

Remark 2. Note that the article has a slight mistake here, since the article asserts that μ(P) = max{λ₂, …, λ_n}. This implies that λ₂ > 0 > λ_n. This is not always true, although it is true for the matrices considered here.
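
Lemma 1 can be spot-checked numerically. The sketch below generates a random symmetric stochastic tridiagonal matrix (the generator random_path_chain is our own test harness, not from the article) and compares μ(P) with the spectral norm of P − (1/n)11ᵀ.

    import numpy as np

    rng = np.random.default_rng(1)

    def random_path_chain(n):
        # Random symmetric stochastic tridiagonal matrix: random edge
        # probabilities below 1/2, diagonal filled so each row sums to 1.
        p = rng.uniform(0, 0.5, n - 1)
        P = np.diag(p, 1) + np.diag(p, -1)
        P += np.diag(1 - P.sum(axis=1))
        return P

    n = 7
    P = random_path_chain(n)
    lam = np.sort(np.linalg.eigvalsh(P))
    mu = max(abs(lam[0]), abs(lam[-2]))           # second-largest modulus
    Q = P - np.ones((n, n)) / n                   # P - (1/n)11^T
    print(np.isclose(mu, np.linalg.norm(Q, 2)))   # True: spectral norm is mu(P)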

Lemma 2. If P is an n × n symmetric stochastic matrix and if y and z in ℝⁿ satisfy

1ᵀy = 0,  (2)
‖y‖₂ = 1,  (3)
(z_i + z_j)/2 ≤ y_i y_j  for all i, j with P_ij ≠ 0,  (4)

then μ(P) ≥ 1ᵀz.

Proof. Let the eigenvectors of P be {u^(1) = (1/√n)1, u^(2), …, u^(n)}; by the Principal Axes Theorem, we may take these to be an orthonormal basis of ℝⁿ. Let y be as specified in hypotheses (2) and (3) and let

y = ∑_{i=1}^n α_i u^(i).

By hypothesis (3), ‖y‖₂² = ∑_{i=1}^n α_i² = 1. Using the orthogonality of 1 and u^(i) for i = 2, …, n,

(P − (1/n)11ᵀ) ∑_{i=1}^n α_i u^(i) = ∑_{i=1}^n α_i λ_i u^(i) = ∑_{i=2}^n α_i λ_i u^(i),

where now λ₁ = 0, λ₂, …, λ_n denote the eigenvalues of P − (1/n)11ᵀ. The last sum on the right side intentionally starts at i = 2 because the first eigenvalue is 0.

By Lemma 1,

μ(P) = ‖P − (1/n)11ᵀ‖₂.

By definition of the spectral norm,

‖P − (1/n)11ᵀ‖₂ = max_{‖w‖₂=1} ‖(P − (1/n)11ᵀ)w‖₂.

Specializing to the vector y with ‖y‖₂ = 1 and using the definition of the 2-norm,

max_{‖w‖₂=1} ‖(P − (1/n)11ᵀ)w‖₂ ≥ ‖(P − (1/n)11ᵀ)y‖₂
  = √( yᵀ(P − (1/n)11ᵀ)ᵀ(P − (1/n)11ᵀ)y )
  = √( (∑_{i=1}^n α_i u^(i))ᵀ (P − (1/n)11ᵀ)ᵀ(P − (1/n)11ᵀ) (∑_{i=1}^n α_i u^(i)) )
  = √( (∑_{i=1}^n α_i λ_i u^(i))ᵀ (∑_{i=1}^n α_i λ_i u^(i)) )
  = √( ∑_{i=1}^n α_i² λ_i² ).

Apply Jensen's Inequality to the concave square-root function, over the convex combination defined by the weights α_i² with ∑_{i=1}^n α_i² = 1:

√( ∑_{i=1}^n α_i² λ_i² ) ≥ ∑_{i=1}^n α_i² |λ_i| ≥ ∑_{i=1}^n α_i² λ_i.

Now unwind the expression back into vector notation, again using the orthogonality of the eigenvectors:

∑_{i=1}^n α_i² λ_i = (∑_{i=1}^n α_i u^(i))ᵀ (P − (1/n)11ᵀ) (∑_{i=1}^n α_i u^(i)) = yᵀ(P − (1/n)11ᵀ)y = yᵀPy = ∑_{i,j} P_ij y_i y_j,

where the term (1/n)yᵀ11ᵀy vanishes because 1ᵀy = 0.

Now use hypothesis (4):

∑_{i,j} P_ij y_i y_j ≥ ∑_{i,j} (1/2)(z_i + z_j) P_ij = (1/2)(zᵀP1 + 1ᵀPz) = 1ᵀz,

since P1 = 1 and, by symmetry, 1ᵀP = 1ᵀ.

This establishes the lemma. □
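
The core of the argument is the chain of inequalities μ(P) ≥ yᵀPy, valid for any unit vector y orthogonal to 1; that step can be spot-checked numerically, as in the following sketch with an illustrative matrix and a random admissible test vector.

    import numpy as np

    rng = np.random.default_rng(2)

    P = np.array([[0.7, 0.3, 0.0, 0.0],
                  [0.3, 0.4, 0.3, 0.0],
                  [0.0, 0.3, 0.5, 0.2],
                  [0.0, 0.0, 0.2, 0.8]])   # symmetric stochastic tridiagonal
    lam = np.sort(np.linalg.eigvalsh(P))
    mu = max(abs(lam[0]), abs(lam[-2]))

    y = rng.standard_normal(4)
    y -= y.mean()                 # enforce 1^T y = 0
    y /= np.linalg.norm(y)        # enforce ||y||_2 = 1
    print(mu >= y @ P @ y)        # True for every admissible y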

Lemma 3. The eigenvalues of the tridiagonal matrix

P₀ = [ 1/2  1/2   0   ⋯   0
       1/2   0   1/2  ⋯   0
        ⋮    ⋱    ⋱   ⋱   ⋮
        0    ⋯   1/2   0   1/2
        0    ⋯    0   1/2  1/2 ]

are λ₁ = 1 and λ_j = cos((j−1)π/n) for j = 2, …, n.

Remark 3. The following proof is adapted from Feller [2, Section XVI.2, pages 388–391]. In fact, Feller finds the eigenvalues and eigenvectors of the more general tridiagonal matrix

P = [ q  p  0  ⋯  0
      q  0  p  ⋯  0
      ⋮  ⋱  ⋱  ⋱  ⋮
      0  ⋯  q  0  p
      0  ⋯  0  q  p ]

with p + q = 1.

Proof. The proof proceeds by directly finding the solutions of the linear system P₀u = λu. The system is treated as the recurrence given by equations 2, …, n − 1 in the variables u₂, …, u_{n−1}, with the first and last equations serving as boundary conditions which determine the values of λ that permit a nontrivial solution.

The equations are

λu₁ = (1/2)u₁ + (1/2)u₂,  (5)
λu_j = (1/2)u_{j−1} + (1/2)u_{j+1},  j = 2, …, n − 1,  (6)
λu_n = (1/2)u_{n−1} + (1/2)u_n.  (7)

Equation (6) is satisfied by u_j = s^j provided

λs = (1/2) + (1/2)s²,

that is, for s₊ = λ + √(λ² − 1) and s₋ = λ − √(λ² − 1). Then the general solution is of the form u_j = A(λ)s₊^j + B(λ)s₋^j. Applying the first equation (5) and using 2λs± = 1 + s±², obtain

λ(As₊ + Bs₋) = (1/2)(As₊ + Bs₋) + (1/2)(As₊² + Bs₋²)
0 = A[(1 − 2λ)s₊ + s₊²] + B[(1 − 2λ)s₋ + s₋²]
0 = A(s₊ − 1) + B(s₋ − 1).

Applying the last equation (7) in the same way, obtain

λ(As₊^n + Bs₋^n) = (1/2)(As₊^{n−1} + Bs₋^{n−1}) + (1/2)(As₊^n + Bs₋^n)
0 = As₊^{n−1}[1 − 2λs₊ + s₊] + Bs₋^{n−1}[1 − 2λs₋ + s₋]
0 = As₊^{n−1}(s₊ − s₊²) + Bs₋^{n−1}(s₋ − s₋²).

Combining these two boundary equations gives s₊^n = s₋^n. Note that s₊s₋ = 1, so s₋ = 1/s₊, and therefore s₊^{2n} = 1 and s₋^{2n} = 1. That is, both s₊ and s₋ are 2n-th roots of unity. Therefore, s₊ and s₋ can be written in the form

e^{iπj/n} = cos(πj/n) + i sin(πj/n)

for j = 0, 1, 2, …, 2n − 1.

Thus, the eigenvalues must be among the solutions of

s₊(λ) = e^{iπj/n}

or

λ + √(λ² − 1) = e^{iπj/n}
√(λ² − 1) = −λ + e^{iπj/n}
λ² − 1 = λ² − 2λe^{iπj/n} + e^{2iπj/n}
−1 = −2λe^{iπj/n} + e^{2iπj/n}
(e^{−iπj/n} + e^{iπj/n})/2 = λ
cos(πj/n) = λ.

So to each j we can find a root λj, namely

λ_j = cos(πj/n),  j = 0, 1, 2, …, 2n − 1.

However, some of these roots are repeated, since

cos(π(n − j)/n) = cos(π(n + j)/n)

for j = 1, …, n. So there are only n distinct values, the eigenvalues λ₁ = 1 and λ_j = cos((j−1)π/n) for j = 2, …, n. □
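
A quick numerical check of the eigenvalue formula, in the spirit of Problem 5 below (a minimal sketch with NumPy):

    import numpy as np

    for n in range(2, 9):
        # Build P0: 1/2 on the sub- and superdiagonals, 1/2 in the two corners.
        P = np.diag(np.full(n - 1, 0.5), 1) + np.diag(np.full(n - 1, 0.5), -1)
        P[0, 0] = P[-1, -1] = 0.5
        computed = np.sort(np.linalg.eigvalsh(P))[::-1]   # descending order
        exact = np.cos(np.arange(n) * np.pi / n)          # cos((j-1)pi/n)
        print(n, np.allclose(computed, exact))            # True for each n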

Theorem 4. The value μ(P₀) = cos(π/n) for

P₀ = [ 1/2  1/2   0   ⋯   0
       1/2   0   1/2  ⋯   0
        ⋮    ⋱    ⋱   ⋱   ⋮
        0    ⋯   1/2   0   1/2
        0    ⋯    0   1/2  1/2 ]

is the smallest among all symmetric stochastic tridiagonal matrices.

Proof.

The proof proceeds by constructing a pair of vectors y and z that satisfy the hypotheses of Lemma 2 for any symmetric stochastic tridiagonal matrix P. Furthermore, 1ᵀz = cos(π/n), so Lemma 2 gives μ(P) ≥ cos(π/n) for every such P, and the mixing rate μ(P₀) = cos(π/n) is the fastest possible.

By Lemma 3 and the definition of the mixing rate

μ(P₀) = λ₂ = −λ_n = cos(π/n).

Take y = u^(2), the normalized second eigenvector of P₀, with components y_i = √(2/n) cos((2i−1)π/(2n)), so that hypotheses (2) and (3) of Lemma 2 are automatically satisfied. Take z to be the vector with

z_i = (1/n)[ cos(π/n) + cos((2i−1)π/n)/cos(π/n) ]

for i = 1, …, n.

Note that

∑_{j=1}^n cos((2j−1)π/n) = Re ∑_{j=1}^n e^{i(2j−1)π/n}
  = Re [ e^{−iπ/n} ∑_{j=1}^n (e^{i2π/n})^j ]
  = Re [ e^{−iπ/n} e^{i2π/n} ∑_{j=0}^{n−1} (e^{i2π/n})^j ]
  = Re [ e^{iπ/n} (1 − e^{i2π}) / (1 − e^{i2π/n}) ]
  = 0.

Then it is easy to verify that 1ᵀz = cos(π/n), since the first terms of the z_i sum to cos(π/n) and, by the calculation above, the second terms sum to zero.

Now check that y and z satisfy hypothesis (4) of Lemma 2. For the off-diagonal pairs (i, i + 1),

(z_i + z_{i+1})/2 = (1/n)[ cos(π/n) + (1/2)(cos((2i−1)π/n) + cos((2i+1)π/n))/cos(π/n) ].

Using the cosine sum formula

(1/2)[ cos((2i−1)π/n) + cos((2i+1)π/n) ] = cos(π/n) cos(2iπ/n),

this simplifies to

(z_i + z_{i+1})/2 = (1/n)[ cos(π/n) + cos(2iπ/n) ].

Using the cosine sum formula again

(1/n)[ cos(π/n) + cos(2iπ/n) ] = (2/n) cos((2i−1)π/(2n)) cos((2i+1)π/(2n)) = y_i y_{i+1}.

Therefore equality holds in inequality (4) for the adjacent pairs coming from the nonzero subdiagonal and superdiagonal entries. For the diagonal entries, check that (z_i + z_i)/2 = z_i ≤ y_i². That is, after multiplying through by n, check that

cos(π/n) + cos((2i−1)π/n)/cos(π/n) ≤ 2cos²((2i−1)π/(2n)).

Using the double-angle formula for the cosine

2cos²((2i−1)π/(2n)) = 1 + cos((2i−1)π/n).

Therefore,

cos(π/n) + cos((2i−1)π/n)/cos(π/n) ≤ 1 + cos((2i−1)π/n),

and multiplying through by cos(π/n) > 0 (for n ≥ 3; when n = 2, μ(P₀) = cos(π/2) = 0 and the theorem is immediate) and moving all terms to one side, obtain

cos²(π/n) + cos((2i−1)π/n) − cos(π/n) − cos(π/n) cos((2i−1)π/n) ≤ 0.

This can be factored as

(1 − cos(π/n)) (cos((2i−1)π/n) − cos(π/n)) ≤ 0.

This is true because

cos((2i−1)π/n) ≤ cos(π/n) ≤ 1

for i = 1, …, n, so the first factor is nonnegative and the second factor is nonpositive. □
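
The theorem can be illustrated numerically: over many randomly generated symmetric stochastic tridiagonal matrices, μ(P) never falls below cos(π/n), while P₀ attains the bound. The following sketch uses our own random test harness, not anything from the article.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    bound = np.cos(np.pi / n)

    def mu(P):
        # Second-largest eigenvalue modulus of a symmetric matrix.
        lam = np.sort(np.linalg.eigvalsh(P))
        return max(abs(lam[0]), abs(lam[-2]))

    worst = np.inf
    for trial in range(10000):
        p = rng.uniform(0, 0.5, n - 1)        # random edge probabilities
        P = np.diag(p, 1) + np.diag(p, -1)
        P += np.diag(1 - P.sum(axis=1))       # fill diagonal; rows sum to 1
        worst = min(worst, mu(P))
    print(worst >= bound - 1e-12)             # True: nothing beats cos(pi/n)

    P0 = np.diag(np.full(n - 1, 0.5), 1) + np.diag(np.full(n - 1, 0.5), -1)
    P0[0, 0] = P0[-1, -1] = 0.5
    print(np.isclose(mu(P0), bound))          # True: P0 attains the bound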

__________________________________________________________________________


Problems to Work for Understanding

  1. Show that balancing the workload of a line of processors by shifting one-half of the load imbalance across each edge from the more loaded to the less loaded processor can be represented by the tridiagonal matrix P₀.
  2. Show that
    (1/2)[ cos((2i−1)π/n) + cos((2i+1)π/n) ] = cos(π/n) cos(2iπ/n)

    and

    (1/n)[ cos(π/n) + cos(2iπ/n) ] = (2/n) cos((2i−1)π/(2n)) cos((2i+1)π/(2n)).

  3. Show that the four eigenvalues of
    P₀ = [ 1/2  1/2   0    0
           1/2   0   1/2   0
            0   1/2   0   1/2
            0    0   1/2  1/2 ]

    are λ_j = cos((j−1)π/4) for j = 1, 2, 3, 4, that is, the real parts of four of the eighth roots of unity.

  4. Show that the eigenvalues of the more general tridiagonal matrix
    P = [ q  p  0  ⋯  0
          q  0  p  ⋯  0
          ⋮  ⋱  ⋱  ⋱  ⋮
          0  ⋯  q  0  p
          0  ⋯  0  q  p ]

    with p + q = 1 are λ₁ = 1 and λ_j = 2√(pq) cos((j−1)π/n) for j = 2, …, n. Then show directly that P₀ is the fastest mixing among all such tridiagonal matrices P.

  5. Use mathematical software to numerically evaluate the eigenvalues of the matrices P₀ for sizes n = 2, …, 8 and show that the values agree with the exact eigenvalues in Lemma 3.

__________________________________________________________________________


Reading Suggestion:

References

[1]   Stephen Boyd, Persi Diaconis, Jun Sun, and Lin Xiao. Fastest mixing Markov chain on a path. American Mathematical Monthly, 113(1):70–74, January 2006.

[2]   William Feller. An Introduction to Probability Theory and Its Applications, Volume I. John Wiley and Sons, third edition, 1973.

__________________________________________________________________________


Outside Readings and Links:

__________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LaTeX source on April 16, 2010