Steven R. Dunbar
Department of Mathematics
203 Avery Hall
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Large Deviations

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

### Rating

Mathematicians Only: prolonged scenes of intense rigor.

_______________________________________________________________________________________________

### Section Starter Question

From the proof of the Weak Law of Large Numbers, we know

$ℙn\left[\left|\frac{{S}_{n}}{n}-p\right|>𝜖\right]<\frac{p\left(1-p\right)}{n{𝜖}^{2}}$

or more roughly, the probability that the sample mean of Bernoulli trials deviates from $p$ by more than $𝜖$ is $O\left(1∕n\right)$. Is it possible to provide a more refined estimate of the probability of such a deviation? Why do you think so? What would be the possible statement of a better result?
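For concreteness, the Chebyshev-type bound above can be checked against an exact binomial computation. This is a minimal Python sketch, not part of the original notes; the helper name `exact_tail` is ad hoc:

```python
from math import comb

def exact_tail(n, p, eps):
    """Exact P[|S_n/n - p| > eps] for S_n ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)

n, p, eps = 100, 0.5, 0.1
chebyshev = p * (1 - p) / (n * eps**2)   # the O(1/n) bound from the WLLN proof
actual = exact_tail(n, p, eps)
assert actual < chebyshev  # the bound holds, but is far from tight
```

For these values the exact tail is already several times smaller than the Chebyshev bound, which hints that an exponentially better estimate should be possible.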

_______________________________________________________________________________________________

### Key Concepts

1. Large Deviations Theory is about the remote tail of a probability distribution. Roughly speaking, large deviation theory estimates the exponential decay of the probability of extreme events, as the number of observations grows arbitrarily large.
2. For $0<𝜖<1-p$ and $n\ge 1$,
$ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]\le {e}^{-n{h}_{+}\left(𝜖\right)}$

where ${h}_{+}\left(𝜖\right)=\left(p+𝜖\right)ln\left(\frac{p+𝜖}{p}\right)+\left(1-p-𝜖\right)ln\left(\frac{1-p-𝜖}{1-p}\right)$

__________________________________________________________________________

### Vocabulary

1. Large Deviations Theory is about the remote tail of a probability distribution. Roughly speaking, large deviation theory estimates the exponential decay of the probability of extreme events, as the number of observations grows arbitrarily large.

__________________________________________________________________________

### Mathematical Ideas

#### The Large Deviations Estimate

Large Deviations Theory is about the remote tail of a probability distribution. Roughly speaking, large deviation theory estimates the exponential decay of the probability of extreme events, as the number of observations grows arbitrarily large. Large deviations theory was perhaps first developed by the Swedish mathematician Harald Cramér, who applied it to model the insurance business. More general results were later obtained by H. Chernoff, among others. An incomplete list of mathematicians who have made important advances would include S. R. S. Varadhan, D. Ruelle and O. E. Lanford.

Given parameter value $p$, define ${h}_{+}\left(𝜖\right)$ for $0<𝜖<1-p$ as a function of $𝜖$

${h}_{+}\left(𝜖\right)=\left(p+𝜖\right)ln\left(\frac{p+𝜖}{p}\right)+\left(1-p-𝜖\right)ln\left(\frac{1-p-𝜖}{1-p}\right).$

See Figure 1.

Note that $-pln\left(p\right)-\left(1-p\right)ln\left(1-p\right)$ is the entropy of the Bernoulli distribution with probability $p$ and complementary probability $1-p$. Comparing terms, ${h}_{+}\left(𝜖\right)$ is the relative entropy (Kullback-Leibler divergence) of the Bernoulli distribution with probability $p+𝜖$ with respect to the Bernoulli distribution with probability $p$.
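As a quick numerical sanity check (not in the original notes; the function names are ad hoc), ${h}_{+}\left(𝜖\right)$ agrees with the Kullback-Leibler divergence between the two Bernoulli distributions:

```python
from math import log

def h_plus(eps, p):
    """h_+(eps) as defined above."""
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

def kl_bernoulli(a, b):
    """Kullback-Leibler divergence of Bernoulli(a) relative to Bernoulli(b)."""
    return a * log(a / b) + (1 - a) * log((1 - a) / (1 - b))

p, eps = 0.3, 0.2
assert abs(h_plus(eps, p) - kl_bernoulli(p + eps, p)) < 1e-12
```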

The first two lemmas are Taylor expansions of ${h}_{+}\left(𝜖\right)$ and ${e}^{-n{h}_{+}\left(𝜖\right)}$ in $𝜖$ to give a sense of the asymptotics of these functions.

Lemma 1.

${h}_{+}\left(𝜖\right)=\frac{1}{2p\left(1-p\right)}{𝜖}^{2}+\frac{2p-1}{6{p}^{2}{\left(1-p\right)}^{2}}{𝜖}^{3}+\frac{1-3p+3{p}^{2}}{12{p}^{3}{\left(1-p\right)}^{3}}{𝜖}^{4}+O\left({𝜖}^{5}\right)$
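Lemma 1 says in particular that ${h}_{+}\left(𝜖\right)\approx {𝜖}^{2}∕\left(2p\left(1-p\right)\right)$ for small $𝜖$. A short numerical check of this leading term (a sketch, not from the text):

```python
from math import log

def h_plus(eps, p):
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

p = 0.6
for eps in (1e-2, 1e-3, 1e-4):
    leading = eps**2 / (2 * p * (1 - p))
    # relative error of the quadratic approximation shrinks like O(eps),
    # consistent with the cubic term in the expansion
    assert abs(h_plus(eps, p) / leading - 1) < eps
```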

Lemma 2.

${e}^{-n{h}_{+}\left(𝜖\right)}=1-\frac{n}{2p\left(1-p\right)}{𝜖}^{2}-\frac{n\left(2p-1\right)}{6{p}^{2}{\left(1-p\right)}^{2}}{𝜖}^{3}+\frac{2n-6np+6n{p}^{2}-3{n}^{2}p+3{n}^{2}{p}^{2}}{24{p}^{3}{\left(1-p\right)}^{3}}{𝜖}^{4}+O\left({𝜖}^{5}\right)$

See also Figure 2 which displays ${e}^{-n{h}_{+}\left(𝜖\right)}$ as a function of $𝜖$ for $p=0.6$ and $n=10,20,30$.

Theorem 3 (Large Deviations Estimate). For $0<𝜖<1-p$ and $n\ge 1$,

$ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]\le {e}^{-n{h}_{+}\left(𝜖\right)}.$

Proof. Fix $t>0$ and then

$ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]=ℙn\left[{e}^{t\left({S}_{n}-np-n𝜖\right)}\ge 1\right].$

Now apply Markov’s Inequality to this positive random variable:

$\begin{array}{llll}\hfill ℙn\left[{e}^{t\left({S}_{n}-np-n𝜖\right)}\ge 1\right]& \le 𝔼\left[{e}^{t\left({S}_{n}-np-n𝜖\right)}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={e}^{-nt\left(p+𝜖\right)}𝔼\left[{e}^{t{S}_{n}}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={e}^{-nt\left(p+𝜖\right)}\sum _{k=0}^{n}{e}^{tk}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p}^{k}{\left(1-p\right)}^{n-k}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={e}^{-nt\left(p+𝜖\right)}{\left(1-p+p{e}^{t}\right)}^{n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & ={e}^{-n\left(t\left(p+𝜖\right)-ln\left(1-p+p{e}^{t}\right)\right)}.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

Now to complete the estimation, we only need to find

$\underset{t>0}{sup}\left(t\left(p+𝜖\right)-ln\left(1-p+p{e}^{t}\right)\right).$

For notation, let $g\left(t\right)=t\left(p+𝜖\right)-ln\left(1-p+p{e}^{t}\right)$ for parameters $0<p<1$ and $0<𝜖<1-p$. Then $g\left(0\right)=0$ and ${g}^{\prime }\left(t\right)=p+𝜖-p{e}^{t}∕\left(1-p+p{e}^{t}\right)$, so ${g}^{\prime }\left(0\right)=𝜖>0$. Furthermore, $\underset{t\to \infty }{lim}{g}^{\prime }\left(t\right)=p+𝜖-1<0$. Thus the supremum is attained at some strictly positive value of $t$. The derivative is zero only at

$t=ln\left(\frac{-p+{p}^{2}-𝜖+𝜖\phantom{\rule{0.3em}{0ex}}p}{p\left(p+𝜖-1\right)}\right)=ln\left(\frac{\left(p+𝜖\right)\left(1-p\right)}{p\left(1-p-𝜖\right)}\right).$

Therefore, $g\left(t\right)$ has a maximum value of ${h}_{+}\left(𝜖\right)$ since by plugging in the critical point, we obtain:

$\begin{array}{llll}\hfill & \left(p+𝜖\right)ln\left(\frac{p+𝜖}{p}\right)+\left(p+𝜖\right)ln\left(\frac{1-p}{1-p-𝜖}\right)-ln\left(1-p+p\frac{\left(p+𝜖\right)\left(1-p\right)}{p\left(1-p-𝜖\right)}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}=\left(p+𝜖\right)ln\left(\frac{p+𝜖}{p}\right)+\left(p+𝜖\right)ln\left(\frac{1-p}{1-p-𝜖}\right)-ln\left(\left(1-p\right)\left(1+\frac{\left(p+𝜖\right)}{\left(1-p-𝜖\right)}\right)\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}=\left(p+𝜖\right)ln\left(\frac{p+𝜖}{p}\right)+\left(p+𝜖\right)ln\left(\frac{1-p}{1-p-𝜖}\right)-ln\left(\frac{1-p}{1-p-𝜖}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\phantom{\rule{2em}{0ex}}{h}_{+}\left(𝜖\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$
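The theorem can be checked directly against exact binomial tail sums. A minimal sketch (not from the text; helper names are ad hoc):

```python
from math import ceil, comb, exp, log

def h_plus(eps, p):
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

def upper_tail(n, p, eps):
    """Exact P[S_n/n >= p + eps] for S_n ~ Binomial(n, p)."""
    k0 = ceil(n * (p + eps))
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

p, eps = 0.6, 0.1
for n in (10, 50, 200):
    # Large Deviations Estimate: the exact tail never exceeds e^{-n h_+(eps)}
    assert upper_tail(n, p, eps) <= exp(-n * h_plus(eps, p))
```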

Define ${h}_{-}\left(𝜖\right)={h}_{+}\left(-𝜖\right)$ for $0<𝜖<p$.

Corollary 1. For $0<𝜖<p$ and $n\ge 1$,

$ℙn\left[\frac{{S}_{n}}{n}\le p-𝜖\right]\le {e}^{-n{h}_{-}\left(𝜖\right)}.$

Proof. Interchange the designation of success and failure, so now “success” occurs with probability $1-p$ and “failure” occurs with probability $p$. Consider the probability that the proportion of “successes” in this complementary process, ${S}_{n}^{c}∕n$ exceeds the mean by $𝜖$:

$ℙn\left[\frac{{S}_{n}^{c}}{n}\ge 1-p+𝜖\right]\le {e}^{-n{h}_{+}^{c}\left(𝜖\right)}$

where ${h}_{+}^{c}\left(𝜖\right)=\left(1-p+𝜖\right)ln\left(\frac{1-p+𝜖}{1-p}\right)+\left(p-𝜖\right)ln\left(\frac{p-𝜖}{p}\right)={h}_{+}\left(-𝜖\right)={h}_{-}\left(𝜖\right)$. The inequality is then

$ℙn\left[1-\frac{{S}_{n}^{c}}{n}\le p-𝜖\right]=ℙn\left[\frac{{S}_{n}}{n}\le p-𝜖\right]\le {e}^{-n{h}_{-}\left(𝜖\right)}.$ □

Corollary 2. For $0<𝜖<min\left(p,1-p\right)$ and $n\ge 1$,

$ℙn\left[\left|\frac{{S}_{n}}{n}-p\right|\ge 𝜖\right]\le {e}^{-n{h}_{+}\left(𝜖\right)}+{e}^{-n{h}_{-}\left(𝜖\right)}.$

See Figure 3. Note that this estimate is correct but vacuous for $𝜖=0$ and for $𝜖$ in a neighborhood of $0$, since there the right side exceeds $1$.
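Corollary 2 can likewise be verified numerically, using ${h}_{-}\left(𝜖\right)={h}_{+}\left(-𝜖\right)$. A sketch (not from the text):

```python
from math import comb, exp, log

def h_plus(eps, p):
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

def two_sided_tail(n, p, eps):
    """Exact P[|S_n/n - p| >= eps] for S_n ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) >= eps)

p, eps = 0.4, 0.15
for n in (20, 100):
    # h_+(eps) for the upper tail, h_-(eps) = h_+(-eps) for the lower tail
    bound = exp(-n * h_plus(eps, p)) + exp(-n * h_plus(-eps, p))
    assert two_sided_tail(n, p, eps) <= bound
```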

### The Large Deviations Estimate Cannot Be Improved

We will frequently use the following lemma to estimate the value of a binomial probability for large values of $n$ and proportionally large values of $k$.

Lemma 4 (Binomial Asymptotics). If ${k}_{n}\to \infty$ and $n-{k}_{n}\to \infty$ as $n\to \infty$, then

$ℙn\left[{S}_{n}={k}_{n}\right]\sim \frac{1}{\sqrt{2\pi }}\sqrt{\frac{n}{{k}_{n}\left(n-{k}_{n}\right)}}{\left(\frac{np}{{k}_{n}}\right)}^{{k}_{n}}{\left(\frac{n\left(1-p\right)}{n-{k}_{n}}\right)}^{n-{k}_{n}}.$

as $n\to \infty$.

Proof. We know that

$ℙn\left[{S}_{n}={k}_{n}\right]=\frac{n!}{{k}_{n}!\left(n-{k}_{n}\right)!}{p}^{{k}_{n}}{\left(1-p\right)}^{n-{k}_{n}}.$

so

 $1=\underset{n\to \infty }{lim}\frac{\frac{n!}{{k}_{n}!\left(n-{k}_{n}\right)!}{p}^{{k}_{n}}{\left(1-p\right)}^{n-{k}_{n}}}{ℙn\left[{S}_{n}={k}_{n}\right]}.$ (1)

Use Stirling’s approximation in the form $m!\sim \sqrt{2\pi }{m}^{m+1∕2}{e}^{-m}$. Find the asymptotics of the binomial probability by replacing each factorial with the Stirling approximation. That is

$\begin{array}{llll}\hfill 1& =\underset{n\to \infty }{lim}\frac{\sqrt{2\pi }{n}^{n+1∕2}{e}^{-n}}{n!}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\underset{n\to \infty }{lim}\frac{\sqrt{2\pi }\sqrt{n}{n}^{{k}_{n}}{n}^{n-{k}_{n}}{e}^{-{k}_{n}}{e}^{-n+{k}_{n}}}{n!}\phantom{\rule{2em}{0ex}}& \hfill \text{(2)}\end{array}$

and

$\begin{array}{llll}\hfill 1& =\underset{n\to \infty }{lim}\frac{\left(n-{k}_{n}\right)!}{\sqrt{2\pi }{\left(n-{k}_{n}\right)}^{\left(n-{k}_{n}\right)+1∕2}{e}^{-\left(n-{k}_{n}\right)}}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\underset{n\to \infty }{lim}\frac{\left(n-{k}_{n}\right)!}{\sqrt{2\pi }\sqrt{n-{k}_{n}}{\left(n-{k}_{n}\right)}^{\left(n-{k}_{n}\right)}{e}^{-n+{k}_{n}}}\phantom{\rule{2em}{0ex}}& \hfill \text{(3)}\end{array}$

and

$\begin{array}{llll}\hfill 1& =\underset{n\to \infty }{lim}\frac{\left({k}_{n}\right)!}{\sqrt{2\pi }{\left({k}_{n}\right)}^{{k}_{n}+1∕2}{e}^{-{k}_{n}}}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\underset{n\to \infty }{lim}\frac{{k}_{n}!}{\sqrt{2\pi }\sqrt{{k}_{n}}{\left({k}_{n}\right)}^{{k}_{n}}{e}^{-{k}_{n}}}.\phantom{\rule{2em}{0ex}}& \hfill \text{(4)}\end{array}$

Then multiply the terms on the left and right sides of equations (1), (2), (3), and (4) and cancel the factorials, two of the $\sqrt{2\pi }$ terms, the exponentials and combine the powers to obtain:

$ℙn\left[{S}_{n}={k}_{n}\right]\sim \frac{1}{\sqrt{2\pi }}\sqrt{\frac{n}{{k}_{n}\left(n-{k}_{n}\right)}}{\left(\frac{np}{{k}_{n}}\right)}^{{k}_{n}}{\left(\frac{n\left(1-p\right)}{n-{k}_{n}}\right)}^{n-{k}_{n}}.$
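A numerical illustration of Lemma 4 (a sketch, not in the original): the ratio of the exact binomial probability to the Stirling-based right-hand side approaches 1.

```python
from math import comb, pi, sqrt

def exact_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def stirling_form(n, k, p):
    """Right-hand side of the Binomial Asymptotics lemma."""
    return ((1 / sqrt(2 * pi)) * sqrt(n / (k * (n - k)))
            * (n * p / k)**k * (n * (1 - p) / (n - k))**(n - k))

p = 0.3
for n in (100, 1000):
    k = int(0.4 * n)  # k_n grows proportionally with n
    ratio = exact_pmf(n, k, p) / stirling_form(n, k, p)
    assert abs(ratio - 1) < 2 / n  # relative error shrinks roughly like 1/n
```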

The estimate given by the Large Deviations result is as good as it can be in the following sense:

Theorem 5. For all $0<𝜖<1-p$,

$\underset{n\to \infty }{lim}\frac{1}{n}ln\left(ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]\right)=-{h}_{+}\left(𝜖\right).$

Proof.

First, the Large Deviations Estimate can be written as

$\frac{1}{n}ln\left(ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]\right)\le -{h}_{+}\left(𝜖\right)$

so immediately

$\underset{n\to \infty }{limsup}\frac{1}{n}ln\left(ℙn\left[\frac{{S}_{n}}{n}\ge p+𝜖\right]\right)\le -{h}_{+}\left(𝜖\right).$

Let ${k}_{n}=⌈n\left(p+𝜖\right)⌉$ be the least integer greater than or equal to $n\left(p+𝜖\right)$. Then

${k}_{n}-1<n\left(p+𝜖\right)\le {k}_{n}.$

Therefore, ${k}_{n}\to \infty$ as $n\to \infty$. Also ${k}_{n}\sim n\left(p+𝜖\right)$.

Note that

$ℙn\left[{S}_{n}\ge n\left(p+𝜖\right)\right]\ge ℙn\left[{S}_{n}={k}_{n}\right].$

We will show that

$\underset{n\to \infty }{lim}\frac{1}{n}ln\left(ℙn\left[{S}_{n}={k}_{n}\right]\right)=-{h}_{+}\left(𝜖\right).$

Then

$\underset{n\to \infty }{liminf}\frac{1}{n}ln\left(ℙn\left[{S}_{n}\ge n\left(p+𝜖\right)\right]\right)\ge -{h}_{+}\left(𝜖\right)$

and we will have established the theorem.

We already know from Lemma 4 (Binomial Asymptotics) that

 $ℙn\left[{S}_{n}={k}_{n}\right]\sim \frac{1}{\sqrt{2\pi }}\sqrt{\frac{n}{{k}_{n}\left(n-{k}_{n}\right)}}{\left(\frac{np}{{k}_{n}}\right)}^{{k}_{n}}{\left(\frac{n\left(1-p\right)}{n-{k}_{n}}\right)}^{n-{k}_{n}}.$ (5)

Now ${k}_{n}$ goes to $+\infty$ as $n\to \infty$. Note that the probability on the left side goes to 0 as $n\to \infty$ by the Weak Law of Large Numbers. By the Zero Asymptotic Limits Lemma in the section Asymptotic Limits, the right side must also go to 0 as $n\to \infty$. Since the two sequences are asymptotic, by the Logarithm Lemma in the same section, their logarithms must be asymptotic.

Since ${k}_{n}\sim n\left(p+𝜖\right)$ and $n-{k}_{n}\sim n\left(1-p-𝜖\right)$, then

$\sqrt{\frac{n}{{k}_{n}\left(n-{k}_{n}\right)}}\sim \frac{1}{\sqrt{\left(p+𝜖\right)\left(1-p-𝜖\right)n}}$

so

$\underset{n\to \infty }{lim}\frac{1}{n}ln\frac{1}{\sqrt{2\pi }}\sqrt{\frac{n}{{k}_{n}\left(n-{k}_{n}\right)}}=0.$

Next,

${k}_{n}ln\left(\frac{np}{{k}_{n}}\right)=n\left(p+𝜖\right)ln\left(\frac{p}{p+𝜖}\right)+n\left(p+𝜖\right)ln\left(\frac{n\left(p+𝜖\right)}{{k}_{n}}\right)+\left({k}_{n}-n\left(p+𝜖\right)\right)ln\left(\frac{np}{{k}_{n}}\right).$

Now recall that ${k}_{n}-n\left(p+𝜖\right)$ is bounded by 1, so

 $\underset{n\to \infty }{lim}\frac{1}{n}ln\left({\left(\frac{np}{{k}_{n}}\right)}^{{k}_{n}}\right)=\left(p+𝜖\right)ln\left(\frac{p}{p+𝜖}\right).$ (6)

Using the same basic calculations, we can show

 $\underset{n\to \infty }{lim}\frac{1}{n}ln\left({\left(\frac{n\left(1-p\right)}{n-{k}_{n}}\right)}^{n-{k}_{n}}\right)=\left(1-p-𝜖\right)ln\left(\frac{1-p}{1-p-𝜖}\right).$ (7)

Then in the binomial probability (5) replace the powers with the asymptotic approximations (6) and (7) and the theorem is established. □
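Theorem 5 can be illustrated numerically: $\frac{1}{n}ln\left(ℙn\left[{S}_{n}∕n\ge p+𝜖\right]\right)$ drifts toward $-{h}_{+}\left(𝜖\right)$ as $n$ grows, though slowly, reflecting the polynomial prefactor. A rough sketch (not from the text):

```python
from math import ceil, comb, log

def h_plus(eps, p):
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

def log_tail_rate(n, p, eps):
    """(1/n) * ln P[S_n/n >= p + eps], from the exact binomial sum."""
    k0 = ceil(n * (p + eps))
    tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
    return log(tail) / n

p, eps = 0.5, 0.1
errs = [abs(log_tail_rate(n, p, eps) + h_plus(eps, p)) for n in (50, 200, 800)]
assert errs[0] > errs[2] and errs[2] < 0.02  # slow convergence to -h_+(eps)
```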

#### An Alternative Large Deviation Inequality

Say that a sequence ${a}_{n}=\Theta \left({b}_{n}\right)$ if there are positive constants ${C}_{l}$ and ${C}_{u}$ such that ${C}_{l}{b}_{n}\le {a}_{n}\le {C}_{u}{b}_{n}$ for all sufficiently large $n$.

Theorem 6. For $p=1∕2$ and $0<\alpha <1∕2$

$ℙ\left[{S}_{n}\le \alpha n\right]=\Theta \left(\frac{1}{\sqrt{n}}\right){2}^{\left({h}_{2}\left(\alpha \right)-1\right)n}$

where ${h}_{b}\left(x\right)=-x{log}_{b}x-\left(1-x\right){log}_{b}\left(1-x\right)$.

Remark. Note the alternative definition of the entropy-style function ${h}_{b}\left(x\right)=-x{log}_{b}x-\left(1-x\right){log}_{b}\left(1-x\right)$ using the base-$b$ logarithm.

Remark. By symmetry, $ℙ\left[{S}_{n}\ge \left(1-\alpha \right)n\right]=ℙ\left[{S}_{n}\le \alpha n\right]$, so this provides an inequality similar to the Large Deviations result above.

Proof.

1. For $1\le k\le \alpha n$,
$\left(\genfrac{}{}{0.0pt}{}{n}{k-1}\right)/\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)=\frac{k}{n-k+1}\le \frac{k}{n-k}\le \frac{\alpha }{1-\alpha }<1.$

2. So
$\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)\le \sum _{k=0}^{⌊\alpha n⌋}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\le \left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)\sum _{i\ge 0}{\left(\frac{\alpha }{1-\alpha }\right)}^{i}=\frac{1-\alpha }{1-2\alpha }\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)$

The first inequality is very crude, essentially

$\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)\le \left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)+\sum _{k=0}^{⌊\alpha n⌋-1}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right).$

The proof principle here is that the terms $\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)$ are relatively dominated by the greatest term $\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)$.

3. In particular $\begin{array}{c}ℙ\left[{S}_{n}\le \alpha n\right]=\sum _{k=0}^{⌊\alpha n⌋}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){2}^{-n}\le \frac{1-\alpha }{1-2\alpha }\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right){2}^{-n}\\ =\Theta \left(\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right){2}^{-n}\right)=\Theta \left(ℙ\left[{S}_{n}=⌊\alpha n⌋\right]\right).\end{array}$

Again the proof principle is that the probability $ℙ\left[{S}_{n}\le \alpha n\right]$ is essentially dominated by $ℙ\left[{S}_{n}=⌊\alpha n⌋\right]$.

4. Note that $1\le {\left(1+O\left(1∕n\right)\right)}^{n}\le {\left(1+\left(C∕n\right)\right)}^{n}\le {\mathrm{e}}^{C}$, so ${\left(1+O\left(1∕n\right)\right)}^{n}=\Theta \left(1\right)$.
5. By Stirling’s Formula,
$\left(\genfrac{}{}{0.0pt}{}{n}{⌊\alpha n⌋}\right)=\Theta \left(\frac{1}{\sqrt{n}}\right){\left(\frac{1}{{\alpha }^{\alpha }\cdot {\left(1-\alpha \right)}^{1-\alpha }}+O\left(1∕n\right)\right)}^{n}=\Theta \left(\frac{1}{\sqrt{n}}\right){2}^{{h}_{2}\left(\alpha \right)n}.$

6. Dividing by ${2}^{n}$ finishes the proof. □
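A numerical check of the $\Theta$ statement for $p=1∕2$ (a sketch, not in the original; $\alpha =1∕4$ is chosen arbitrarily):

```python
from math import comb, log2, sqrt

def h2(x):
    """Binary entropy, base-2 logarithm."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

def lower_tail(n, alpha):
    """Exact P[S_n <= alpha n] for the symmetric case p = 1/2."""
    return sum(comb(n, k) for k in range(int(alpha * n) + 1)) / 2**n

alpha = 0.25
for n in (100, 400):
    predicted = 2 ** ((h2(alpha) - 1) * n) / sqrt(n)
    ratio = lower_tail(n, alpha) / predicted
    assert 0.1 < ratio < 10  # bounded above and below, as Theta requires
```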

Remark. Let $X$ be an arbitrary random variable. Then for every $t\in ℝ$

$ℙ\left[X\ge a\right]=ℙ\left[{\mathrm{e}}^{tX}\ge {\mathrm{e}}^{ta}\right]\le \frac{𝔼\left[{\mathrm{e}}^{tX}\right]}{{\mathrm{e}}^{ta}}={\mathrm{e}}^{-ta}{M}_{X}\left(t\right)$

where ${M}_{X}\left(t\right)$ is the moment-generating function of $X$. The inequality follows from Markov’s Inequality. Likewise,

$ℙ\left[X\le a\right]=ℙ\left[-X\ge -a\right]\le {\mathrm{e}}^{ta}{M}_{-X}\left(t\right).$

Remark. If moreover $X=\sum _{i=1}^{n}{X}_{i}$ with ${X}_{1},\dots ,{X}_{n}$ independent,

$ℙ\left[X\ge a\right]\le {\mathrm{e}}^{-ta}\prod _{i=1}^{n}𝔼\left[{e}^{t{X}_{i}}\right]={\mathrm{e}}^{-ta}\prod _{i=1}^{n}{M}_{{X}_{i}}\left(t\right).$
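For a Binomial$(n,p)$ sum, minimizing this moment-generating-function bound over $t$ numerically recovers the exponent ${h}_{+}\left(𝜖\right)$ found earlier. A crude grid-search sketch (not from the text; the grid and parameter values are arbitrary):

```python
from math import exp, log

def h_plus(eps, p):
    return ((p + eps) * log((p + eps) / p)
            + (1 - p - eps) * log((1 - p - eps) / (1 - p)))

n, p, eps = 50, 0.4, 0.1
a = n * (p + eps)

def mgf_bound(t):
    # e^{-ta} M_{S_n}(t) with M_{S_n}(t) = (1 - p + p e^t)^n for Binomial(n, p)
    return exp(-t * a) * (1 - p + p * exp(t))**n

best = min(mgf_bound(i / 1000) for i in range(1, 3000))  # grid search on t > 0
# the optimized bound matches e^{-n h_+(eps)} to grid accuracy
assert abs(log(best) + n * h_plus(eps, p)) < 1e-4
```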

Theorem 7 (Chernoff-Hoeffding). Let ${S}_{n}=\sum _{i=1}^{n}{X}_{i}$ where the ${X}_{i}$ are independent Bernoulli random variables with success probabilities ${p}_{i}$, and set $p=\frac{1}{n}\sum _{i=1}^{n}{p}_{i}$, so that $𝔼\left[{S}_{n}\right]=np$. For any $\alpha$ with $p<\alpha <1$,

$ℙ\left[{S}_{n}\ge \alpha n\right]\le {\mathrm{e}}^{-{D}_{KL}\left(\alpha \phantom{\rule{0.3em}{0ex}}\parallel \phantom{\rule{0.3em}{0ex}}p\right)n}$

where ${D}_{KL}\left(x\phantom{\rule{0.3em}{0ex}}\parallel \phantom{\rule{0.3em}{0ex}}y\right)=xlog\frac{x}{y}+\left(1-x\right)log\frac{1-x}{1-y}$. Similarly, for $0<\alpha <p$, $ℙ\left[{S}_{n}\le \alpha n\right]\le {\mathrm{e}}^{-{D}_{KL}\left(\alpha \phantom{\rule{0.3em}{0ex}}\parallel \phantom{\rule{0.3em}{0ex}}p\right)n}$.

Proof.

1. Let ${q}_{i}=1-{p}_{i}$ and $q=\frac{1}{n}\sum _{i=1}^{n}{q}_{i}=1-p$.
2. Let $f\left(x\right)=log\left(x{\mathrm{e}}^{t}+1-x\right)$. Then $f\left(x\right)$ is concave down since ${f}^{\prime \prime }\left(x\right)=-{\left({\mathrm{e}}^{t}-1\right)}^{2}∕{\left(x{\mathrm{e}}^{t}+1-x\right)}^{2}<0$.
3. Therefore
$\frac{1}{n}\sum _{i=1}^{n}f\left({p}_{i}\right)\le f\left(p\right)$

by Jensen’s Inequality.

4. $ℙ\left[{S}_{n}\ge \alpha n\right]\le {\mathrm{e}}^{-t\alpha n}\prod _{i=1}^{n}𝔼\left[{\mathrm{e}}^{t{X}_{i}}\right]={\mathrm{e}}^{-t\alpha n}\prod _{i=1}^{n}\left({p}_{i}{\mathrm{e}}^{t}+{q}_{i}\right)$

since $𝔼\left[{\mathrm{e}}^{t{X}_{i}}\right]={p}_{i}{\mathrm{e}}^{t}+{q}_{i}={\mathrm{e}}^{f\left({p}_{i}\right)}$. Then ${\mathrm{e}}^{-t\alpha n}\prod _{i=1}^{n}\left({p}_{i}{\mathrm{e}}^{t}+{q}_{i}\right)\le {\mathrm{e}}^{-t\alpha n}{\left(p{\mathrm{e}}^{t}+q\right)}^{n}$ by the concavity step, since ${\left(p{\mathrm{e}}^{t}+q\right)}^{n}={\mathrm{e}}^{nf\left(p\right)}$.

5. Now let $\beta =1-\alpha$ and take ${\mathrm{e}}^{t}=\frac{\alpha q}{\beta p}$, so that $p{\mathrm{e}}^{t}+q=\frac{q}{\beta }$. Then $\begin{array}{c}ℙ\left[{S}_{n}\ge \alpha n\right]\le {\left(\frac{p\beta }{\alpha q}\right)}^{\alpha n}{\left(\frac{q}{\beta }\right)}^{n}=\\ {\left(\frac{p}{\alpha }\right)}^{\alpha n}{\left(\frac{q}{\beta }\right)}^{\beta n}={\mathrm{e}}^{-{D}_{KL}\left(\alpha \parallel p\right)n}.\end{array}$ □

Corollary 3 (Multiplicative Chernoﬀ bound). Let ${S}_{n}=\sum _{i=1}^{n}{X}_{i}$ where ${X}_{i}$ are independent Bernoulli random variables each with success probability ${p}_{i}$, and set $p=\frac{1}{n}\sum _{i=1}^{n}{p}_{i}$, so that $\mu =𝔼\left[X\right]=np$. For any $𝜖>0$,

$ℙ\left[{S}_{n}\ge \left(1+𝜖\right)\mu \right]\le {\left(\frac{{\mathrm{e}}^{𝜖}}{{\left(1+𝜖\right)}^{1+𝜖}}\right)}^{\mu }\le {\mathrm{e}}^{-{𝜖}^{2}\mu ∕\left(2+𝜖\right)}$

and for $0<𝜖<1,$

$ℙ\left[{S}_{n}\le \left(1-𝜖\right)\mu \right]\le {\left(\frac{{\mathrm{e}}^{-𝜖}}{{\left(1-𝜖\right)}^{1-𝜖}}\right)}^{\mu }\le {\mathrm{e}}^{-{𝜖}^{2}\mu ∕2}.$

Proof.

1. Set $\alpha =\left(1+𝜖\right)p$ and $\beta =1-\alpha =q-𝜖p$.
2. By Theorem 7, $\begin{array}{c}ℙ\left[{S}_{n}\ge \alpha n\right]\le {\left(\frac{p}{\alpha }\right)}^{\alpha n}{\left(\frac{q}{\beta }\right)}^{\beta n}=\frac{1}{{\left(1+𝜖\right)}^{\alpha n}}{\left(1+\frac{𝜖p}{\beta }\right)}^{\beta n}\le \\ \frac{1}{{\left(1+𝜖\right)}^{\alpha n}}{\mathrm{e}}^{𝜖pn}={\left(\frac{{\mathrm{e}}^{𝜖}}{{\left(1+𝜖\right)}^{1+𝜖}}\right)}^{pn}\le {\mathrm{e}}^{-{𝜖}^{2}pn∕\left(2+𝜖\right)}.\end{array}$

The last inequality follows from the elementary bound $ln\left(1+𝜖\right)\ge 2𝜖∕\left(2+𝜖\right)$ for $𝜖>0$.

3. The argument for $ℙ\left[{S}_{n}\le \alpha n\right]$ is similar by setting $\alpha =\left(1-𝜖\right)p$ and $\beta =1-\alpha =q+𝜖p$.

Corollary 4. For $\mu =𝔼\left[{S}_{n}\right]$ and $0<𝜖<1$,

$ℙ\left[\left|{S}_{n}-\mu \right|\ge 𝜖\mu \right]\le 2{\mathrm{e}}^{-{𝜖}^{2}\mu ∕3}$

Remark. The Chernoff bounds are not asymptotic. In particular, they also hold when $𝜖=𝜖\left(n\right)$ and $p=p\left(n\right)$ are functions of $n$.
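A numerical check of the multiplicative upper-tail bound against an exact binomial sum (a sketch, not in the original; the parameter values are arbitrary):

```python
from math import comb, exp

def upper_tail_ge(n, p, x):
    """Exact P[S_n >= x] for S_n ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k >= x)

n, p, eps = 200, 0.3, 0.5
mu = n * p  # identical success probabilities, so mu = np
bound = exp(-eps**2 * mu / (2 + eps))
assert upper_tail_ge(n, p, (1 + eps) * mu) <= bound
```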

#### Sources

This section is adapted from: Heads or Tails, by Emmanuel Lesigne, Student Mathematical Library Volume 28, American Mathematical Society, Providence, 2005, Chapter 6, [1].

The alternative theorems and proofs are adapted from class notes by Xavier Perez.

Some history and deﬁnitions are also adapted from the Wikipedia article on Large Deviations Theory.

_______________________________________________________________________________________________

### Problems to Work for Understanding

__________________________________________________________________________

### References

[1]   Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, volume 28 of Student Mathematical Library. American Mathematical Society, 2005.

__________________________________________________________________________
