Steven R. Dunbar
Department of Mathematics
203 Avery Hall
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Selected Topics in
Probability and Stochastic Processes
Steve Dunbar

__________________________________________________________________________

Fastest Mixing Markov Chain

_______________________________________________________________________

### Rating

Mathematicians Only: prolonged scenes of intense rigor.

_______________________________________________________________________________________________

### Question of the Day

What is the stationary distribution of a Markov chain? If a Markov chain is symmetric, that is, represented by a symmetric matrix, then what is the specific stationary distribution? What determines the rate at which the Markov chain approaches the stationary distribution?

_______________________________________________________________________________________________

### Key Concepts

1. The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus, called the mixing rate and denoted by $\mu$ or $\mu \left(P\right)$ where $P$ is the transition probability matrix.
2. If $P$ is an $n×n$ symmetric stochastic matrix, then
$\mu \left(P\right)=\parallel P-\left(1∕n\right)1{1}^{T}{\parallel }_{2}$

where $\parallel \cdot {\parallel }_{2}$ denotes the spectral norm.

3. The eigenvalues and eigenvectors of the tridiagonal matrix
${P}_{0}=\begin{pmatrix}1∕2&1∕2&&&\\ 1∕2&0&1∕2&&\\ &\ddots &\ddots &\ddots &\\ &&1∕2&0&1∕2\\ &&&1∕2&1∕2\end{pmatrix}$

can be determined by solving a recursive system of equations and are ${\lambda }_{j}=cos\left(\frac{\left(j-1\right)\pi }{n}\right)$ for $j=1,\dots ,n$. In particular, the largest eigenvalue is $1$.

4. The mixing rate $\mu \left({P}_{0}\right)=cos\left(\pi ∕n\right)$ for ${P}_{0}$ is the smallest among all symmetric stochastic tridiagonal matrices.

__________________________________________________________________________

### Vocabulary

1. A matrix with nonnegative entries for which every row sum ${\sum }_{j}{P}_{ij}=1$ is a stochastic matrix.
2. The mixing rate is the asymptotic rate of convergence of the Markov chain to the stationary distribution; it is the second-largest eigenvalue modulus of the transition probability matrix $P$.

__________________________________________________________________________

### Mathematical Ideas

This section is a survey, review, analysis, and in-depth investigation of the article “Fastest Mixing Markov Chain on a Path” by Stephen Boyd, Persi Diaconis, Jun Sun, and Lin Xiao, The American Mathematical Monthly, Volume 113, Number 1, January 2006, pages 70–74.

#### Introduction to Fastest Mixing Markov Chain

This article considers the problem of assigning transition probabilities to the edges of a graph in such a way that the resulting Markov chain mixes as rapidly as possible. The problem is specialized because the graph corresponds to a random walk in that each transition is either to the vertex itself or its nearest neighbor. The article proves that the fastest mixing is obtained when each edge has a transition probability of $1∕2$. This result is intuitive.

Consider a graph with $n\ge 2$ vertices, labeled $1,2,\dots ,n$, with $n-1$ edges connecting adjacent vertices and with a loop at each vertex, as shown in Figure 1. Consider the Markov chain, that is to say the random walk, on this graph, with transition probability from vertex $i$ to vertex $j$ denoted ${P}_{ij}$. The requirement that transitions can occur only on an edge or loop of the graph is equivalent to ${P}_{ij}=0$ when $|i-j|>1$, so $P$ is a tridiagonal matrix. Since the ${P}_{ij}$ are transition probabilities, ${P}_{ij}\ge 0$ and ${\sum }_{j}{P}_{ij}=1$. Since the row sums are $1$, $P$ is a stochastic matrix. In matrix-vector terms, we can write

 $P1=1$ (1)

where $1$ is the $n×1$ vector whose entries are all $1$. Therefore, $1$ is an eigenvalue of the matrix $P$.

Figure 1: A graph with transition probabilities

Furthermore, the article requires that the transition probabilities are symmetric, so that ${P}_{ij}={P}_{ji}$. This then means that $P$ is a symmetric, doubly-stochastic, tridiagonal matrix. Since $P1=1$, then $\left(1∕n\right){1}^{T}P=\left(1∕n\right){1}^{T}$, and the uniform distribution is a stationary distribution for the probability transition matrix, or the Markov chain.

The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus of $P$, called the mixing rate. The mixing rate is denoted by $\mu \left(P\right)$:

$\mu \left(P\right)=\underset{i=2,\dots ,n}{max}|{\lambda }_{i}\left(P\right)|.$

See below for more information and proofs. The smaller $\mu \left(P\right)$ is, the faster the Markov chain converges to its stationary distribution.
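As a concrete illustration, the mixing rate is easy to compute numerically. The following is a minimal sketch using NumPy; the helper name `mixing_rate` is ours, not from the article. For the $3×3$ path chain with every transition probability $1∕2$, the value is $cos\left(\pi ∕3\right)=0.5$.

```python
import numpy as np

def mixing_rate(P):
    """Second-largest eigenvalue modulus of a symmetric stochastic matrix P."""
    lam = np.linalg.eigvalsh(P)            # eigenvalues in ascending order
    return max(abs(lam[0]), abs(lam[-2]))  # exclude the top eigenvalue, which is 1

# The path chain on n = 3 vertices with probability 1/2 on every edge and end loop
P0 = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.5, 0.5]])
print(mixing_rate(P0))   # approximately cos(pi/3) = 0.5
```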

#### Motivation and Application

The following is an application and motivation for this Markov chain. A processor is located at each vertex of the graph. Each edge represents a direct network connection between the adjacent processors. The processor could be a computer in a network or it could be a human worker, such as a line of barbers or assemblers in a workshop. Each processor has a job load or queue to finish; say processor $i$ has load ${q}_{i}\left(t\right)$ at time $t$, where ${q}_{i}\left(t\right)$ is a positive real number. At each step, the goal is to shift loads across the edges in such a way as to balance the load. The shifting is done before any processing or work begins, so the total amount of work to be done is constant. More precisely, we would like ${q}_{i}\left(t\right)\to \bar{q}=\left(1∕n\right){\sum }_{j}{q}_{j}\left(0\right)$ as $t\to \infty$ for every $i$. Moreover, we would like this balancing to take place as fast as possible. We intend to show that the balance can be accomplished fastest by shifting one-half of the load imbalance across each edge from the more loaded to the less loaded processor.
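The load-balancing dynamics amount to repeated multiplication by the matrix ${P}_{0}$ defined below: shifting half of each edge's imbalance gives ${q}_{i}\left(t+1\right)=\left({q}_{i-1}\left(t\right)+{q}_{i+1}\left(t\right)\right)∕2$ at interior vertices. The following NumPy simulation is a sketch; the initial loads are made up for illustration.

```python
import numpy as np

def path_matrix(n):
    """The tridiagonal matrix P0: probability 1/2 on every edge and end loop."""
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, i + 1] = P[i + 1, i] = 0.5
    P[0, 0] = P[-1, -1] = 0.5
    return P

q = np.array([12.0, 0.0, 3.0, 9.0, 0.0, 0.0])  # hypothetical initial loads
P = path_matrix(q.size)
for _ in range(200):
    q = P @ q          # each step shifts half of each edge's imbalance
print(q)               # every entry approaches the average load, 4.0
```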

Lemma 1. If $P$ is an $n×n$ symmetric stochastic matrix, then

$\mu \left(P\right)=\parallel P-\left(1∕n\right)1{1}^{T}{\parallel }_{2}$

where $\parallel \cdot {\parallel }_{2}$ denotes the spectral norm.

Remark 1. Recall that the spectral norm, also called the operator norm, is the natural norm of a matrix induced by the ${L}^{2}$ or Euclidean vector norm. The spectral norm is also the maximum singular value of the matrix, that is the square root of the maximum eigenvalue of ${A}^{H}A$ where ${A}^{H}$ is the conjugate transpose.

Proof. Note that $1$ is the eigenvector of $P$ associated with the eigenvalue $\lambda =1$ by equation (1). Also

$\left(1∕n\right)1{1}^{T}1=1.$

Let ${u}^{\left(2\right)},\dots ,{u}^{\left(n\right)}$ be the other $n-1$ eigenvectors of $P$, corresponding to the eigenvalues ${\lambda }_{2},\dots ,{\lambda }_{n}$. Because $P$ is symmetric, the eigenvectors may be chosen orthogonal; taking the inner product of ${u}^{\left(j\right)}$ with the eigenvector $1$, we see that ${\sum }_{i=1}^{n}{u}_{i}^{\left(j\right)}=0$ for $j=2,\dots ,n$. Therefore

$\left(1∕n\right)1{1}^{T}{u}^{\left(j\right)}=0.$

Then the eigenvalues of $P-\left(1∕n\right)1{1}^{T}$ are ${\lambda }_{1}=0,{\lambda }_{2},\dots ,{\lambda }_{n}$ with eigenvectors $1,{u}^{\left(2\right)},\dots ,{u}^{\left(n\right)}$ respectively. Since $P-\left(1∕n\right)1{1}^{T}$ is symmetric, it is orthogonally diagonalizable, and its spectral norm is equal to the maximum modulus of its eigenvalues, i.e. $max\left\{|{\lambda }_{2}|,\dots ,|{\lambda }_{n}|\right\}$, which is $\mu \left(P\right)$. □
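Lemma 1 is easy to spot-check numerically. The sketch below builds a random symmetric stochastic tridiagonal matrix (our own test case, not from the article) and compares the second-largest eigenvalue modulus with the spectral norm of $P-\left(1∕n\right)1{1}^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
off = rng.uniform(0.0, 0.5, size=n - 1)      # random off-diagonal probabilities
P = np.diag(off, 1) + np.diag(off, -1)
P += np.diag(1.0 - P.sum(axis=1))            # diagonal chosen so each row sums to 1

lam = np.linalg.eigvalsh(P)                  # ascending order; lam[-1] is 1
mu = max(abs(lam[0]), abs(lam[-2]))          # second-largest eigenvalue modulus

ones = np.ones((n, 1))
norm = np.linalg.norm(P - ones @ ones.T / n, 2)  # spectral norm of P - (1/n)11^T
print(mu, norm)                              # the two values agree to round-off
```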

Remark 2. Note that the article has a slight mistake here, since the article asserts that $\mu \left(P\right)=max\left\{{\lambda }_{2},-{\lambda }_{n}\right\}$. This implies that ${\lambda }_{2}>0>{\lambda }_{n}$. This is not always true, although it is true for the matrices considered here.

Lemma 2. If $P$ is an $n×n$ symmetric stochastic tridiagonal matrix and if $y$ and $z$ in ${ℝ}^{n}$ satisfy

 ${1}^{T}y=0$ (2)

 $\parallel y{\parallel }_{2}=1$ (3)

 ${y}_{i}{y}_{j}\ge \left({z}_{i}+{z}_{j}\right)∕2\text{ whenever }|i-j|\le 1$ (4)

then $\mu \left(P\right)\ge {1}^{T}z$.

Proof. Let the eigenvectors of $P$ be $\left\{{u}^{\left(1\right)}=\left(1∕\sqrt{n}\right)1,{u}^{\left(2\right)},\dots ,{u}^{\left(n\right)}\right\}$, which by the Principal Axes Theorem we may take to be an orthonormal basis of ${ℝ}^{n}$. Let $y$ satisfy hypotheses (2) and (3), and expand

$y=\sum _{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}.$

By hypothesis (3), $\parallel y{\parallel }_{2}=\sqrt{{\sum }_{i=1}^{n}{\alpha }_{i}^{2}}=1$. By using the orthogonality of $1$ and ${u}^{\left(i\right)}$ for $i=2,\dots ,n$,

$\left(P-\left(1∕n\right)1{1}^{T}\right)\left(\sum _{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}\right)=\sum _{i=1}^{n}{\alpha }_{i}{\lambda }_{i}{u}^{\left(i\right)}=\sum _{i=2}^{n}{\alpha }_{i}{\lambda }_{i}{u}^{\left(i\right)}.$

The last sum on the right side intentionally starts at $2$ because the first eigenvalue is $0$.

By Lemma 1

$\mu \left(P\right)=\parallel P-\left(1∕n\right)1{1}^{T}{\parallel }_{2}.$

By definition of the spectral norm

$\parallel P-\left(1∕n\right)1{1}^{T}{\parallel }_{2}=\underset{\parallel w\parallel =1}{max}\parallel \left(P-\left(1∕n\right)1{1}^{T}\right)w{\parallel }_{2}.$

Specializing to the vector $y$ with $\parallel y{\parallel }_{2}=1$ and using the definition of the 2-norm

$\begin{array}{rl}\underset{\parallel w\parallel_{2} =1}{max}\parallel \left(P-\left(1∕n\right)1{1}^{T}\right)w{\parallel }_{2}&\ge \parallel \left(P-\left(1∕n\right)1{1}^{T}\right)y{\parallel }_{2}\\ &=\sqrt{{y}^{T}{\left(P-\left(1∕n\right)1{1}^{T}\right)}^{T}\left(P-\left(1∕n\right)1{1}^{T}\right)y}\\ &=\sqrt{{\left({\sum }_{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}\right)}^{T}{\left(P-\left(1∕n\right)1{1}^{T}\right)}^{T}\left(P-\left(1∕n\right)1{1}^{T}\right)\left({\sum }_{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}\right)}\\ &=\sqrt{{\left({\sum }_{i=1}^{n}{\alpha }_{i}{\lambda }_{i}{u}^{\left(i\right)}\right)}^{T}\left({\sum }_{i=1}^{n}{\alpha }_{i}{\lambda }_{i}{u}^{\left(i\right)}\right)}\\ &=\sqrt{{\sum }_{i=1}^{n}{\alpha }_{i}^{2}{\lambda }_{i}^{2}}.\end{array}$

Apply Jensen’s Inequality to the concave down function $\sqrt{\cdot }$ over the convex combination defined by ${\alpha }_{i}^{2}$ which has ${\sum }_{i=1}^{n}{\alpha }_{i}^{2}=1$

$\begin{array}{llll}\hfill \sqrt{\sum _{i=1}^{n}{\alpha }_{i}^{2}{\lambda }_{i}^{2}}& \ge \sum _{i=1}^{n}{\alpha }_{i}^{2}|{\lambda }_{i}|\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & \ge \sum _{i=1}^{n}{\alpha }_{i}^{2}{\lambda }_{i}.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

Now unwind the expression back into vector notation, again using the orthogonality of the eigenvectors

$\begin{array}{llll}\hfill \sum _{i=1}^{n}{\alpha }_{i}^{2}{\lambda }_{i}& ={\left(\sum _{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}\right)}^{T}\left(P-\left(1∕n\right)1{1}^{T}\right)\left(\sum _{i=1}^{n}{\alpha }_{i}{u}^{\left(i\right)}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={y}^{T}\left(P-\left(1∕n\right)1{1}^{T}\right)y\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={y}^{T}Py\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\sum _{i,j}{P}_{ij}{y}_{i}{y}_{j}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

Now use the hypothesis (4)

$\begin{array}{llll}\hfill \sum _{i,j}{P}_{ij}{y}_{i}{y}_{j}& \ge \sum _{i,j}\left(1∕2\right)\left({z}_{i}+{z}_{j}\right){P}_{ij}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\left(1∕2\right)\left({z}^{T}P1+{1}^{T}Pz\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={1}^{T}z.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

This establishes the lemma. □

Lemma 3. The eigenvalues and eigenvectors of the tridiagonal matrix

${P}_{0}=\begin{pmatrix}1∕2&1∕2&0&\dots &0\\ 1∕2&0&1∕2&\dots &0\\ 0&\ddots &\ddots &\ddots &0\\ 0&\dots &1∕2&0&1∕2\\ 0&\dots &0&1∕2&1∕2\end{pmatrix}$

are ${\lambda }_{1}=1$ and ${\lambda }_{j}=cos\left(\frac{\left(j-1\right)\pi }{n}\right)$ for $j=2,\dots ,n$.

Remark 3. The following proof is adapted from Feller, [2, Section XVI.2, pages 388-391]. In fact, Feller finds the eigenvalues and eigenvectors for the more general tridiagonal matrix

$P=\begin{pmatrix}q&p&0&\dots &0\\ q&0&p&\dots &0\\ 0&\ddots &\ddots &\ddots &0\\ 0&\dots &q&0&p\\ 0&\dots &0&q&p\end{pmatrix}$

Proof. The proof proceeds by directly finding the solution of the linear system ${P}_{0}u=\lambda u$. Equations $2,\dots ,n-1$ of the system form a three-term recurrence in the variables ${u}_{1},\dots ,{u}_{n}$, and the first and last equations serve as boundary conditions that determine the values of $\lambda$ permitting a nontrivial solution.

The equations are

 $\lambda {u}_{1}=\left(1∕2\right){u}_{1}+\left(1∕2\right){u}_{2}$ (5)

 $\lambda {u}_{j}=\left(1∕2\right){u}_{j-1}+\left(1∕2\right){u}_{j+1},\phantom{\rule{1em}{0ex}}j=2,\dots ,n-1$ (6)

 $\lambda {u}_{n}=\left(1∕2\right){u}_{n-1}+\left(1∕2\right){u}_{n}.$ (7)

Equation (6) is satisfied by ${u}_{j}={s}^{j}$ provided

$\lambda s=\left(1∕2\right)+\left(1∕2\right){s}^{2}$

or ${s}_{+}=\lambda +\sqrt{{\lambda }^{2}-1}$ and ${s}_{-}=\lambda -\sqrt{{\lambda }^{2}-1}$. Then the general solution is of the form ${u}_{j}=A\left(\lambda \right){s}_{+}^{j}+B\left(\lambda \right){s}_{-}^{j}$. Applying the first equation (5), we obtain

$\begin{array}{llll}\hfill \lambda \left(A{s}_{+}+B{s}_{-}\right)& =\left(1∕2\right)\left(A{s}_{+}+B{s}_{-}\right)+\left(1∕2\right)\left(A{s}_{+}^{2}+B{s}_{-}^{2}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill 0& =A\left[\left(1-2\lambda \right){s}_{+}+{s}_{+}^{2}\right]+B\left[\left(1-2\lambda \right){s}_{-}+{s}_{-}^{2}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill 0& =A\left[{s}_{+}-1\right]+B\left[{s}_{-}-1\right].\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

Applying the last equation (7), we obtain

$\begin{array}{rl}\lambda \left(A{s}_{+}^{n}+B{s}_{-}^{n}\right)&=\left(1∕2\right)\left(A{s}_{+}^{n-1}+B{s}_{-}^{n-1}\right)+\left(1∕2\right)\left(A{s}_{+}^{n}+B{s}_{-}^{n}\right)\\ 0&=A{s}_{+}^{n-1}\left[1-2\lambda {s}_{+}+{s}_{+}\right]+B{s}_{-}^{n-1}\left[1-2\lambda {s}_{-}+{s}_{-}\right]\\ 0&=A{s}_{+}^{n-1}\left[{s}_{+}-{s}_{+}^{2}\right]+B{s}_{-}^{n-1}\left[{s}_{-}-{s}_{-}^{2}\right].\end{array}$

Combining these equations gives ${s}_{+}^{n}={s}_{-}^{n}$. Note that ${s}_{+}{s}_{-}=1$, so ${s}_{+}^{2n}=1$ and ${s}_{-}^{2n}=1$. That is, both ${s}_{+}$ and ${s}_{-}$ are $2n$th roots of unity. Therefore, ${s}_{+}$ and ${s}_{-}$ can be written in the form

${e}^{i\pi j∕n}=cos\left(\frac{\pi j}{n}\right)+i\phantom{\rule{0em}{0ex}}sin\left(\frac{\pi j}{n}\right)$

for $j=0,1,2,\dots ,2n-1$.
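This step can be checked numerically. The sketch below, using Python's `cmath` (the choice $n=8$ is arbitrary), confirms that for $\lambda =cos\left(\pi j∕n\right)$ the roots ${s}_{±}=\lambda ±\sqrt{{\lambda }^{2}-1}$ satisfy ${s}_{+}{s}_{-}=1$, are $2n$th roots of unity, and equal ${e}^{±i\pi j∕n}$.

```python
import cmath
import math

n = 8
for j in range(n):
    lam = math.cos(math.pi * j / n)
    s_plus = lam + cmath.sqrt(lam ** 2 - 1)   # sqrt of a negative real is imaginary
    s_minus = lam - cmath.sqrt(lam ** 2 - 1)
    assert abs(s_plus * s_minus - 1) < 1e-12          # s+ s- = 1
    assert abs(s_plus ** (2 * n) - 1) < 1e-12         # a 2n-th root of unity
    assert abs(s_plus - cmath.exp(1j * math.pi * j / n)) < 1e-12
print("s+ and s- are 2n-th roots of unity for n =", n)
```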

Thus, the eigenvalues must be among the solutions of

${s}_{+}\left(\lambda \right)={e}^{i\pi j∕n}$

or

$\begin{array}{llll}\hfill \lambda +\sqrt{{\lambda }^{2}-1}& ={e}^{i\pi j∕n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \sqrt{{\lambda }^{2}-1}& =-\lambda +{e}^{i\pi j∕n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\lambda }^{2}-1& ={\lambda }^{2}-2\lambda {e}^{i\pi j∕n}+{e}^{2i\pi j∕n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill -1& =-2\lambda {e}^{i\pi j∕n}+{e}^{2i\pi j∕n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \frac{1}{2{e}^{i\pi j∕n}}+\frac{{e}^{i\pi j∕n}}{2}& =\lambda \phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \frac{{e}^{-i\pi j∕n}}{2}+\frac{{e}^{i\pi j∕n}}{2}& =\lambda \phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill cos\pi j∕n& =\lambda \phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}& \hfill \end{array}$

So to each $j$ we can find a root ${\lambda }_{j}$, namely

${\lambda }_{j}=cos\left(\pi j∕n\right)\phantom{\rule{2em}{0ex}}j=0,1,2,\dots ,2n-1.$

However, some of these roots are repeated, since

$cos\left(\pi \left(n-j\right)∕n\right)=cos\left(\pi \left(n+j\right)∕n\right)$

for $j=1,\dots ,n-1$. So there are $n$ eigenvalues ${\lambda }_{1}=1$ and ${\lambda }_{j}=cos\left(\frac{\left(j-1\right)\pi }{n}\right)$ for $j=2,\dots ,n$. □
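The conclusion of Lemma 3 can be verified numerically, in the spirit of Problem 5 below. This NumPy sketch checks sizes $n=2,\dots ,8$ (an arbitrary range).

```python
import numpy as np

# Compare numerical eigenvalues of P0 with cos((j-1)pi/n) from Lemma 3
for n in range(2, 9):
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, i + 1] = P[i + 1, i] = 0.5
    P[0, 0] = P[-1, -1] = 0.5
    numeric = np.sort(np.linalg.eigvalsh(P))[::-1]   # descending order
    exact = np.cos(np.arange(n) * np.pi / n)         # cos((j-1)pi/n), j = 1,...,n
    assert np.allclose(numeric, exact)
print("eigenvalues match cos((j-1)pi/n) for n = 2,...,8")
```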

Theorem 4. The value $\mu \left({P}_{0}\right)=cos\left(\pi ∕n\right)$ for

${P}_{0}=\begin{pmatrix}1∕2&1∕2&&&\\ 1∕2&0&1∕2&&\\ &\ddots &\ddots &\ddots &\\ &&1∕2&0&1∕2\\ &&&1∕2&1∕2\end{pmatrix}$

is the smallest among all symmetric stochastic tridiagonal matrices.

Proof.

The proof proceeds by constructing a pair of vectors $y$ and $z$ that satisfy the assumptions of Lemma 2 for any symmetric tridiagonal stochastic matrix $P$. Since ${1}^{T}z=cos\left(\pi ∕n\right)$, Lemma 2 then gives $\mu \left(P\right)\ge cos\left(\pi ∕n\right)$ for every such $P$; because ${P}_{0}$ attains this value, $\mu \left({P}_{0}\right)=cos\left(\pi ∕n\right)$ is the smallest possible.

By Lemma 3 and the definition of the mixing rate

$\mu \left({P}_{0}\right)={\lambda }_{2}=-{\lambda }_{n}=cos\left(\pi ∕n\right).$

Take $y={u}^{\left(2\right)}$, the second eigenvector of ${P}_{0}$, so the assumptions (2) and (3) in Lemma 2 are automatically satisfied. Take $z$ to be the vector with

${z}_{i}=\frac{1}{n}\left[cos\left(\frac{\pi }{n}\right)+cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\pi ∕n\right)\right]$

for $i=1,\dots ,n$.

Note that

$\begin{array}{rl}{\sum }_{j=1}^{n}cos\left(\left(2j-1\right)\pi ∕n\right)&=\Re \left({\sum }_{j=1}^{n}{e}^{i\left(2j-1\right)\pi ∕n}\right)\\ &=\Re \left({e}^{-i\pi ∕n}{\sum }_{j=1}^{n}{\left({e}^{i2\pi ∕n}\right)}^{j}\right)\\ &=\Re \left({e}^{-i\pi ∕n}{e}^{i2\pi ∕n}{\sum }_{j=0}^{n-1}{\left({e}^{i2\pi ∕n}\right)}^{j}\right)\\ &=\Re \left({e}^{i\pi ∕n}\frac{1-{e}^{i2\pi }}{1-{e}^{i2\pi ∕n}}\right)\\ &=0,\end{array}$

since ${e}^{i2\pi }=1$.

Then it is easy to verify that ${1}^{T}z=cos\left(\pi ∕n\right)$.

Now check that $y$ and $z$ satisfy hypothesis  (4) of Lemma 2.

$\frac{{z}_{i}+{z}_{i+1}}{2}=\frac{1}{n}\left[cos\left(\frac{\pi }{n}\right)+\frac{1}{2}\left(cos\left(\frac{\left(2i-1\right)\pi }{n}\right)+cos\left(\frac{\left(2i+1\right)\pi }{n}\right)\right)/cos\left(\pi ∕n\right)\right]$

Using the cosine sum formula

$\frac{1}{2}\left(cos\left(\frac{\left(2i-1\right)\pi }{n}\right)+cos\left(\frac{\left(2i+1\right)\pi }{n}\right)\right)=cos\left(\frac{\pi }{n}\right)cos\left(\frac{2i\pi }{n}\right)$

this simplifies to

$\frac{{z}_{i}+{z}_{i+1}}{2}=\frac{1}{n}\left[cos\left(\frac{\pi }{n}\right)+cos\left(\frac{2i\pi }{n}\right)\right].$

Using the cosine sum formula again

$\frac{1}{n}\left[cos\left(\frac{\pi }{n}\right)+cos\left(\frac{2i\pi }{n}\right)\right]=\frac{2}{n}cos\left(\frac{\left(2i-1\right)\pi }{2n}\right)cos\left(\frac{\left(2i+1\right)\pi }{2n}\right)={y}_{i}{y}_{i+1}.$

Therefore equality holds in inequality (4) for the adjacent entries resulting from the nonzero subdiagonal and superdiagonal entries. For the diagonal entries, check that $\left({z}_{i}+{z}_{i}\right)∕2={z}_{i}\le {y}_{i}^{2}$. That is, multiplying by $n$, check that

$cos\left(\frac{\pi }{n}\right)+cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\frac{\pi }{n}\right)\le 2{cos}^{2}\left(\frac{\left(2i-1\right)\pi }{2n}\right).$

Using the double-angle formula for the cosine

$2{cos}^{2}\left(\frac{\left(2i-1\right)\pi }{2n}\right)=1+cos\left(\frac{\left(2i-1\right)\pi }{n}\right).$

Therefore, the required inequality becomes

$cos\left(\frac{\pi }{n}\right)+cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\frac{\pi }{n}\right)\le 1+cos\left(\frac{\left(2i-1\right)\pi }{n}\right)$

and moving all terms to one side, we obtain

$1-cos\left(\frac{\pi }{n}\right)-cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\frac{\pi }{n}\right)+cos\left(\frac{\left(2i-1\right)\pi }{n}\right)\ge 0.$

The left side can be factored as

$\left[1-cos\left(\frac{\pi }{n}\right)\right]\left[1-cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\frac{\pi }{n}\right)\right]\ge 0.$

This is true because

$cos\left(\frac{\left(2i-1\right)\pi }{n}\right)/cos\left(\frac{\pi }{n}\right)\le 1$

for $i=1,\dots ,n$. □
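The construction in this proof can be verified numerically. The sketch below assumes the explicit form ${y}_{i}=\sqrt{2∕n}\phantom{\rule{0em}{0ex}}cos\left(\left(2i-1\right)\pi ∕2n\right)$ for the second eigenvector of ${P}_{0}$ (our assumption, consistent with the products ${y}_{i}{y}_{i+1}$ used above) and checks the hypotheses of Lemma 2 along with ${1}^{T}z=cos\left(\pi ∕n\right)$.

```python
import numpy as np

for n in range(3, 12):
    i = np.arange(1, n + 1)
    # assumed explicit second eigenvector of P0 (consistent with y_i y_{i+1} above)
    y = np.sqrt(2.0 / n) * np.cos((2 * i - 1) * np.pi / (2 * n))
    z = (np.cos(np.pi / n)
         + np.cos((2 * i - 1) * np.pi / n) / np.cos(np.pi / n)) / n
    assert abs(y.sum()) < 1e-10                       # hypothesis (2): 1^T y = 0
    assert abs(y @ y - 1.0) < 1e-10                   # hypothesis (3): ||y||_2 = 1
    assert abs(z.sum() - np.cos(np.pi / n)) < 1e-10   # 1^T z = cos(pi/n)
    for a in range(n):                                # hypothesis (4) on the path
        for b in range(n):
            if abs(a - b) <= 1:
                assert y[a] * y[b] >= (z[a] + z[b]) / 2 - 1e-10
print("y and z satisfy the hypotheses of Lemma 2 for n = 3,...,11")
```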

__________________________________________________________________________

### Problems to Work for Understanding

1. Show that balancing the workload of a line of processors by shifting one-half of the load imbalance on each vertex from the more loaded to the less loaded processor can be represented by the tridiagonal matrix ${P}_{0}$.
2. Show that
$\frac{1}{2}\left(cos\left(\frac{\left(2i-1\right)\pi }{n}\right)+cos\left(\frac{\left(2i+1\right)\pi }{n}\right)\right)=cos\left(\frac{\pi }{n}\right)cos\left(\frac{2i\pi }{n}\right)$

and

$\frac{1}{n}\left[cos\left(\frac{\pi }{n}\right)+cos\left(\frac{2i\pi }{n}\right)\right]=\frac{2}{n}cos\left(\frac{\left(2i-1\right)\pi }{2n}\right)cos\left(\frac{\left(2i+1\right)\pi }{2n}\right).$

3. Show that the four eigenvalues of
${P}_{0}=\begin{pmatrix}1∕2&1∕2&0&0\\ 1∕2&0&1∕2&0\\ 0&1∕2&0&1∕2\\ 0&0&1∕2&1∕2\end{pmatrix}$

are ${\lambda }_{j}=cos\left(j\pi ∕4\right)$ for $j=0,1,2,3$, that is, the real parts of eighth roots of unity.

4. Show that the eigenvalues of the more general tridiagonal matrix
$P=\begin{pmatrix}q&p&0&\dots &0\\ q&0&p&\dots &0\\ 0&\ddots &\ddots &\ddots &0\\ 0&\dots &q&0&p\\ 0&\dots &0&q&p\end{pmatrix}$

are ${\lambda }_{1}=1$ and ${\lambda }_{j}=2\sqrt{pq}\phantom{\rule{0em}{0ex}}cos\left(\frac{\left(j-1\right)\pi }{n}\right)$ for $j=2,\dots ,n$. Then show directly that ${P}_{0}$ is the fastest mixing among all such tridiagonal matrices $P$.

5. Use mathematical software to numerically evaluate the eigenvalues of the matrices ${P}_{0}$ for sizes $n=2,\dots ,8$ and show that the values agree with the exact eigenvalues in Lemma 3.
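As a starting point for Problem 4, the claimed eigenvalue formula for the general $\left(p,q\right)$ chain can be spot-checked numerically before attempting the proof. This NumPy sketch uses the arbitrary values $p=0.3$, $q=0.7$.

```python
import numpy as np

p, q = 0.3, 0.7        # arbitrary choice with p + q = 1
for n in range(2, 9):
    P = np.zeros((n, n))
    P[0, 0], P[-1, -1] = q, p          # reflecting boundary rows
    for i in range(n - 1):
        P[i, i + 1] = p                # step right with probability p
        P[i + 1, i] = q                # step left with probability q
    numeric = np.sort(np.linalg.eigvals(P).real)[::-1]   # eigenvalues are real here
    exact = np.concatenate(
        ([1.0], 2 * np.sqrt(p * q) * np.cos(np.arange(1, n) * np.pi / n)))
    assert np.allclose(numeric, exact)
print("eigenvalues are 1 and 2*sqrt(pq)*cos((j-1)pi/n) for n = 2,...,8")
```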

__________________________________________________________________________

### References

   Stephen Boyd, Persi Diaconis, Jun Sun, and Lin Xiao. Fastest mixing Markov chain on a path. American Mathematical Monthly, 113(1):70–74, January 2006.

   William Feller. An Introduction to Probability Theory and Its Applications, Volume I. John Wiley and Sons, third edition, 1973.
