Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

The Moderate Deviations Result

_______________________________________________________________________


Rating

Mathematicians Only: prolonged scenes of intense rigor.

_______________________________________________________________________________________________

Section Starter Question

__________________________________________________________________________

Key Concepts

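  1. For any sequence $a_n$ with $\sqrt{n} \ll a_n \ll n$ we have
    $$\mathbb{P}_n\left[ S_n - pn \ge a_n \right] \to 0$$

    but neither the Central Limit Theorem nor the Large Deviations Principle tells us how fast the convergence is, nor what the precise rate of growth is for $a_n$. Making this precise is the domain of the Moderate Deviations Theorem.

  2. Precisely, if $a_n \to \infty$ and $\lim_{n \to \infty} a_n / n^{1/6} = 0$, then
    $$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{p(1-p)}\, \frac{a_n}{\sqrt{n}} \right] \sim \frac{1}{a_n \sqrt{2\pi}} e^{-a_n^2/2}.$$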

__________________________________________________________________________

Vocabulary

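  1. Moderate deviations results are refinements of the Central Limit Theorem
    $$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{p(1-p)}\, \frac{a_n}{\sqrt{n}} \right] \sim \frac{1}{a_n \sqrt{2\pi}} e^{-a_n^2/2}$$

    when $a_n = o(n^{1/6})$.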

__________________________________________________________________________

Mathematical Ideas

Recall that $X_k$ is a Bernoulli random variable taking on the value 1 or 0 with probability $p$ or $1 - p$ respectively. Then

$$S_n = \sum_{k=1}^{n} X_k$$

is a binomial random variable indicating the number of successes in a composite experiment.

The Large Deviations Estimate shows that the probability of large deviation events of the type

$$\mathbb{P}_n\left[ \frac{S_n}{n} - p > x \right],$$

i.e. that the sample mean exceeds the mean by more than $x$, decays exponentially in $n$ for each fixed $x > 0$. Equivalently, the probability $\mathbb{P}_n[ S_n - np \ge nx ]$ that the partial sum $S_n$ exceeds its mean by at least $nx$ is exponentially small in $n$. The de Moivre-Laplace Central Limit Theorem tells us the probability that the partial sum exceeds its mean by an amount of order $\sqrt{n}$. Precisely, for fixed $x > 0$,

$$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \frac{x}{\sqrt{n}} \right] \to 1 - \Phi\left( \frac{x}{\sqrt{p(1-p)}} \right) > 0,$$

where $\Phi$ is the standard normal cumulative distribution function. Equivalently, the probability $\mathbb{P}_n[ S_n - np \ge \sqrt{n}\, x ]$ for partial sums approaches a positive tail probability of a normal distribution. This implies that for any sequence $a_n$ with $\sqrt{n} \ll a_n \ll n$ we still have

$$\mathbb{P}_n\left[ S_n - pn \ge a_n \right] \to 0$$

and neither the Central Limit Theorem nor the Large Deviations Estimate tells us how fast the convergence is, nor what the precise rate of growth is for $a_n$. Making this precise is the domain of the Moderate Deviations Theorem.
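
The gap between the two classical theorems can be explored numerically. The following is a minimal sketch, not part of the original notes: it compares the exact binomial tail probability with the estimate $\frac{1}{a_n\sqrt{2\pi}} e^{-a_n^2/2}$ for the assumed choices $p = 1/2$ and $a_n = n^{1/8}$. The function names are hypothetical, and the point masses are computed through `math.lgamma` so that only the standard library is needed.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def upper_tail(n, k0, p):
    """P[S_n >= k0], summing the point masses from k0 up to n."""
    return math.fsum(math.exp(log_binom_pmf(n, k, p)) for k in range(k0, n + 1))

def moderate_deviation_ratio(n, p=0.5, exponent=1 / 8):
    """Exact tail probability divided by the moderate deviations estimate.

    The sequence a_n = n**exponent is an assumption made for illustration;
    any a_n -> infinity with a_n = o(n**(1/6)) satisfies the hypotheses.
    """
    a_n = n ** exponent
    # S_n is an integer, so S_n >= np + sqrt(np(1-p)) a_n starts at the ceiling.
    threshold = math.ceil(n * p + math.sqrt(n * p * (1 - p)) * a_n)
    estimate = math.exp(-a_n ** 2 / 2) / (a_n * math.sqrt(2 * math.pi))
    return upper_tail(n, threshold, p) / estimate

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, round(moderate_deviation_ratio(n), 4))
```

The printed ratios should stay near 1. Convergence is slow and need not be monotone, partly because rounding the threshold up to an integer matters at these sample sizes.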

The Moderate Deviations Theorem is due to Harald Cramér in 1938.

First we have to prove two supplementary results, each of which is interesting in its own right.

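Proposition 1 (“Optimization” extension of de Moivre-Laplace Central Limit Theorem). Assume

  1. For $0 \le k \le n$, define $\delta_n(k)$ by
    $$\binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi p(1-p) n}}\, e^{-\frac{(k - np)^2}{2np(1-p)}} \left( 1 + \delta_n(k) \right).$$

  2. Let $c_n$ be a positive real sequence with $\lim_{n \to \infty} c_n = 0$.
  3. Let $I_n = \{ k : |k - np| < c_n n^{2/3} \}$.

Then

$$\lim_{n \to \infty} \max_{k \in I_n} |\delta_n(k)| = 0.$$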


Figure 1: Comparison of the binomial distribution with $n = 12$, $p = 4/10$ with the normal distribution with mean $np$ and variance $np(1-p)$.

Remark. In Figure 1 the amount δn(k) is the small relative error between the height of the normal distribution curve and the height of the binomial distribution histogram over the integer k.

Remark. Compare the statement of this proposition to the statement of the de Moivre-Laplace Binomial Point Mass Limit, Lemma 9 in de Moivre Laplace Central Limit Theorem. Here the domain of the maximum is $I_n = \{ k : |k - np| < c_n n^{2/3} \}$, which is slightly larger than the domain in the de Moivre-Laplace Binomial Point Mass Limit, $I_n = \{ k : |k - np| < a\sqrt{n} \}$.
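
To make the relative error $\delta_n(k)$ concrete, here is a small sketch that computes it directly from the defining equation in Proposition 1 and takes the maximum over $I_n$. The choice $c_n = 1/\ln n$ and the function names are assumptions made only for this illustration; $p = 4/10$ matches Figure 1.

```python
import math

def delta(n, k, p):
    """Relative error delta_n(k): binomial point mass over normal density, minus 1."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    var = n * p * (1 - p)
    log_normal = -0.5 * math.log(2 * math.pi * var) - (k - n * p) ** 2 / (2 * var)
    # pmf = normal * (1 + delta), so delta = exp(log_pmf - log_normal) - 1.
    return math.expm1(log_pmf - log_normal)

def max_abs_delta(n, p=0.4):
    """Maximum of |delta_n(k)| over I_n, with the assumed sequence c_n = 1 / ln(n)."""
    half_width = n ** (2 / 3) / math.log(n)
    ks = [k for k in range(n + 1) if abs(k - n * p) < half_width]
    return max(abs(delta(n, k, p)) for k in ks)

for n in (10**3, 10**4, 10**5):
    print(n, max_abs_delta(n))
```

Under these assumptions the maximum relative error should shrink as $n$ grows, even though the window $I_n$ is wider than the $\sqrt{n}$ scale of the Central Limit Theorem.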

Proof.

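  1. Recall from the de Moivre-Laplace Theorem (see step 2 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem) that from Stirling’s Formula
    $$\binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n-k)}} \left( \frac{np}{k} \right)^{k} \left( \frac{n(1-p)}{n-k} \right)^{n-k} \frac{1 + \epsilon_n}{(1 + \epsilon_k)(1 + \epsilon_{n-k})},$$

    where $\epsilon_n < A/n$, $\epsilon_k < A/k$, $\epsilon_{n-k} < A/(n-k)$ for some constant $A$.

  2. For $k \in I_n$,
    $$\frac{n}{(np + c_n n^{2/3})(n(1-p) + c_n n^{2/3})} \le \frac{n}{k(n-k)} \le \frac{n}{(np - c_n n^{2/3})(n(1-p) - c_n n^{2/3})},$$

    $$\frac{1}{n} \cdot \frac{1}{(p + c_n n^{-1/3})((1-p) + c_n n^{-1/3})} \le \frac{n}{k(n-k)} \le \frac{1}{n} \cdot \frac{1}{(p - c_n n^{-1/3})((1-p) - c_n n^{-1/3})},$$

    $$\frac{1}{np(1-p)} \cdot \frac{1}{\left( 1 + \frac{c_n n^{-1/3}}{p} \right)\left( 1 + \frac{c_n n^{-1/3}}{1-p} \right)} \le \frac{n}{k(n-k)} \le \frac{1}{np(1-p)} \cdot \frac{1}{\left( 1 - \frac{c_n n^{-1/3}}{p} \right)\left( 1 - \frac{c_n n^{-1/3}}{1-p} \right)}.$$

    Compare this to step 3 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem.

  3. Therefore, for $k \in I_n$,
    $$\frac{n}{k(n-k)} = \frac{1}{np(1-p)} \left( 1 + O_u(c_n n^{-1/3}) \right) \quad \text{and} \quad \sqrt{\frac{n}{k(n-k)}} = \frac{1}{\sqrt{np(1-p)}} \left( 1 + O_u(c_n n^{-1/3}) \right). \qquad (1)$$

    This follows from the One-Term Geometric Series Expansion and the Square-Root Expansion Proposition in the section Big-Oh Algebra.

    Compare this to steps 4, 5 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem.

  4. Since $k \in I_n$, $\frac{k - np}{k} = O_u(c_n n^{-1/3})$ and $\frac{k - np}{n - k} = O_u(c_n n^{-1/3})$. Compare this to step 7 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem.
  5. Using the Taylor series expansion for the logarithm,
    $$\ln\left[ \left( \frac{np}{k} \right)^{k} \left( \frac{n(1-p)}{n-k} \right)^{n-k} \right] = -\frac{1}{2}(k - np)^2 \left( \frac{1}{k} + \frac{1}{n-k} \right) + k\, O_u(c_n^3 n^{-1}) + (n-k)\, O_u(c_n^3 n^{-1}) = -\frac{1}{2}(k - np)^2\, \frac{1}{np(1-p)} + O_u(c_n^3).$$

    Compare this to step 8 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem.

  6. Thus
    $$\left( \frac{np}{k} \right)^{k} \left( \frac{n(1-p)}{n-k} \right)^{n-k} = \exp\left( -\frac{(k - np)^2}{2np(1-p)} \right) \left( 1 + O_u(c_n^3) \right). \qquad (2)$$

    See the Exponential Expansion Proposition in the section Big-Oh Algebra.

  7. Step 10 of the proof of Lemma 9 in de Moivre Laplace Central Limit Theorem showed why
    $$\frac{1 + \epsilon_n}{(1 + \epsilon_k)(1 + \epsilon_{n-k})} = 1 + O_u\left( \frac{1}{n} \right). \qquad (3)$$
  8. Now combining equations (1), (2) and (3) above into step 1, we get
    $$\binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi p(1-p) n}}\, e^{-\frac{(k - np)^2}{2np(1-p)}} \left( 1 + O_u(\tilde{c}_n) \right),$$

    where $\tilde{c}_n = \max\left( c_n n^{-1/3},\, c_n^3,\, n^{-1} \right)$. Since $\tilde{c}_n \to 0$, comparing with the definition of $\delta_n(k)$ shows that $\max_{k \in I_n} |\delta_n(k)| \to 0$.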

Proposition 2. Assume

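  1. $k_n$ and $\ell_n$ are two sequences with $k_n < \ell_n$ for all $n$.
  2. $k_n = np + o(n^{2/3})$ and $\ell_n = np + o(n^{2/3})$; i.e., $k_n = np + c_n n^{2/3}$ where $c_n \to 0$ as $n \to \infty$ and $\ell_n = np + c_n' n^{2/3}$ where $c_n' \to 0$ as $n \to \infty$.
  3. Let $a_n = \frac{k_n - np}{\sqrt{np(1-p)}}$ and $b_n = \frac{\ell_n - np}{\sqrt{np(1-p)}}$.

Then

$$\mathbb{P}_n\left[ k_n \le S_n < \ell_n \right] \sim \frac{1}{\sqrt{2\pi}} \int_{a_n}^{b_n} e^{-x^2/2} \, dx,$$

as $n \to \infty$.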

Remark. If (an) and (bn) converge respectively to a and b such that a < b then this proposition becomes the de Moivre-Laplace Central Limit Theorem.
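
Proposition 2 can also be checked numerically when the endpoints drift away from the mean. The sketch below is an illustration under assumed choices, not part of the original notes: it takes $p = 1/2$, $k_n = \lceil np + n^{0.6} \rceil$ and $\ell_n = \lceil np + 2n^{0.6} \rceil$ (both are $np + o(n^{2/3})$) and compares the exact probability of $k_n \le S_n < \ell_n$ with the normal integral. The function names are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def interval_probability(n, k_lo, k_hi, p):
    """P[k_lo <= S_n < k_hi]."""
    return math.fsum(math.exp(log_binom_pmf(n, j, p)) for j in range(k_lo, k_hi))

def normal_integral(a, b):
    """(1/sqrt(2 pi)) * integral of exp(-x^2/2) over [a, b], using erfc for tail accuracy."""
    return 0.5 * (math.erfc(a / math.sqrt(2)) - math.erfc(b / math.sqrt(2)))

def proposition_ratio(n, p=0.5, power=0.6):
    """Exact interval probability divided by the normal integral from a_n to b_n."""
    sigma = math.sqrt(n * p * (1 - p))
    k_n = math.ceil(n * p + n ** power)        # assumed endpoint, np + o(n^(2/3))
    l_n = math.ceil(n * p + 2 * n ** power)    # assumed endpoint, np + o(n^(2/3))
    a_n = (k_n - n * p) / sigma
    b_n = (l_n - n * p) / sigma
    return interval_probability(n, k_n, l_n, p) / normal_integral(a_n, b_n)

for n in (10**4, 10**5, 10**6):
    print(n, proposition_ratio(n))
```

Here $a_n$ and $b_n$ both tend to infinity, so the de Moivre-Laplace theorem with fixed limits does not apply directly, yet the ratio should still approach 1.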

Proof.

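  1. Take $n$ so large that $0 \le k_n < \ell_n \le n$.
  2. Let $h(n) = \frac{1}{\sqrt{np(1-p)}}$. Then by the Optimization Proposition 1
    $$\mathbb{P}_n\left[ S_n = j \right] = \frac{h(n)}{\sqrt{2\pi}} \exp\left( -\frac{(j - np)^2}{2np(1-p)} \right) \left( 1 + \delta_n(j) \right)$$

    and

    $$\mathbb{P}_n\left[ k_n \le S_n < \ell_n \right] = \frac{h(n)}{\sqrt{2\pi}} \sum_{j=k_n}^{\ell_n - 1} \exp\left( -\frac{(j - np)^2}{2np(1-p)} \right) \left( 1 + \delta_n(j) \right). \qquad (4)$$

    The hypotheses on the sequences $(k_n)$ and $(\ell_n)$ along with Proposition 1 imply that the sequence $(\delta_n(j))$ converges uniformly to zero when $k_n \le j \le \ell_n$.

  3. Therefore it suffices to show that
    $$h(n) \sum_{j=k_n}^{\ell_n - 1} \exp\left( -\frac{(j - np)^2}{2np(1-p)} \right) \sim \int_{a_n}^{b_n} e^{-x^2/2} \, dx.$$

  4. Set
    $$x(j) = \frac{j - np}{\sqrt{np(1-p)}}.$$

    Then $a_n = x(k_n)$ and $b_n = x(\ell_n)$.

  5. The claim is:
    $$h(n) \sum_{j=k_n}^{\ell_n - 1} \exp\left( -\frac{(j - np)^2}{2np(1-p)} \right) - \int_{a_n}^{b_n} e^{-x^2/2} \, dx = o\left( \int_{a_n}^{b_n} e^{-x^2/2} \, dx \right). \qquad (5)$$

    This claim will follow by considering the Riemann sums for the integral of $e^{-x^2/2}$.

  6. In the case $a_n > 0$, since $x(j+1) - x(j) = h(n)$ and the integrand is decreasing on the positive axis,
    $$h(n) \exp\left( -\frac{x(j+1)^2}{2} \right) < \int_{x(j)}^{x(j+1)} \exp(-x^2/2) \, dx < h(n) \exp\left( -\frac{x(j)^2}{2} \right).$$

    Summing over $j$ from $k_n$ to $\ell_n - 1$, where the sums telescope, obtain

    $$0 \le h(n) \sum_{j=k_n}^{\ell_n - 1} \exp\left( -\frac{x(j)^2}{2} \right) - \int_{a_n}^{b_n} \exp(-x^2/2) \, dx \le h(n) \left( \exp(-a_n^2/2) - \exp(-b_n^2/2) \right). \qquad (6)$$

  7. Also
    $$\int_{a_n}^{b_n} \exp(-x^2/2) \, dx \ge \frac{1}{b_n} \int_{a_n}^{b_n} x \exp(-x^2/2) \, dx = \frac{1}{b_n} \left( \exp(-a_n^2/2) - \exp(-b_n^2/2) \right). \qquad (7)$$

  8. Note $h(n) = o(b_n^{-1})$ since $b_n = o(n^{1/6})$. Then combining (6) and (7) yields (5).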
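
The Riemann sum bounds (6) and (7) are easy to test on a concrete case. The following sketch evaluates both sides for the assumed values $n = 10^4$, $p = 1/2$, $k_n = 5100$, $\ell_n = 5300$, so that $a_n = 2$ and $b_n = 6$; the function name and these inputs are hypothetical choices for illustration only.

```python
import math

def riemann_gap(n, p, k_n, l_n):
    """Return (gap, bound_6, integral, lower_7) for the inequalities (6) and (7)."""
    sigma = math.sqrt(n * p * (1 - p))
    h = 1 / sigma                                  # h(n), the spacing of the x(j)

    def x(j):
        return (j - n * p) / sigma

    a_n, b_n = x(k_n), x(l_n)
    # Left-endpoint Riemann sum of exp(-x^2/2) over [a_n, b_n].
    riemann_sum = h * math.fsum(math.exp(-x(j) ** 2 / 2) for j in range(k_n, l_n))
    # Integral of exp(-x^2/2) from a_n to b_n, written with erfc.
    integral = math.sqrt(math.pi / 2) * (
        math.erfc(a_n / math.sqrt(2)) - math.erfc(b_n / math.sqrt(2)))
    end_difference = math.exp(-a_n ** 2 / 2) - math.exp(-b_n ** 2 / 2)
    gap = riemann_sum - integral        # middle expression of (6)
    bound_6 = h * end_difference        # right side of (6)
    lower_7 = end_difference / b_n      # right side of (7)
    return gap, bound_6, integral, lower_7

gap, bound_6, integral, lower_7 = riemann_gap(10**4, 0.5, 5100, 5300)
print("gap:", gap, "bound from (6):", bound_6)
print("integral:", integral, "lower bound from (7):", lower_7)
print("relative gap:", gap / integral)
```

Dividing (6) by (7) bounds the relative gap by $h(n)\, b_n$, which is the quantity that step 8 shows tends to zero.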

Theorem 3 (Moderate Deviations Theorem). Suppose

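  1. $(a_n)$ is a sequence of real numbers,
  2. $a_n \to \infty$ as $n \to \infty$, and
  3. $\lim_{n \to \infty} a_n / n^{1/6} = 0$.

Then

$$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{p(1-p)}\, \frac{a_n}{\sqrt{n}} \right] \sim \frac{1}{a_n \sqrt{2\pi}} e^{-a_n^2/2}.$$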

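Remark. Step 6 of the proof of the Moderate Deviations Theorem shows that

$$\frac{1}{\sqrt{2\pi}} \int_{a_n}^{b_n} e^{-x^2/2} \, dx \sim \frac{1}{a_n \sqrt{2\pi}} e^{-a_n^2/2},$$

where $b_n$ is the sequence defined in step 5 of the proof, so that an equivalent result is that

$$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{p(1-p)}\, \frac{a_n}{\sqrt{n}} \right] \sim \frac{1}{\sqrt{2\pi}} \int_{a_n}^{b_n} e^{-x^2/2} \, dx.$$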

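Remark. The de Moivre-Laplace Central Limit Theorem tells us that for fixed $a$, as $n \to \infty$,

$$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{p(1-p)}\, \frac{a}{\sqrt{n}} \right] \to 1 - \Phi(a) = \frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{-x^2/2} \, dx.$$

The moderate deviations result tells us that this estimate remains true when $a$ is allowed to approach $\infty$ at a slow enough rate.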
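
The link between the normal tail integral and the closed-form estimate in the theorem is the standard fact that $\frac{1}{\sqrt{2\pi}} \int_a^\infty e^{-x^2/2}\,dx \sim \frac{1}{a\sqrt{2\pi}} e^{-a^2/2}$ as $a \to \infty$. A minimal sketch, with hypothetical function names, that tabulates the ratio of the two sides:

```python
import math

def normal_upper_tail(a):
    """(1/sqrt(2 pi)) * integral of exp(-x^2/2) from a to infinity."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def tail_estimate(a):
    """The closed-form estimate exp(-a^2/2) / (a sqrt(2 pi))."""
    return math.exp(-a * a / 2) / (a * math.sqrt(2 * math.pi))

for a in (1, 2, 4, 8):
    print(a, normal_upper_tail(a) / tail_estimate(a))
```

The ratio stays below 1 and increases toward 1, which is why the estimate is useful only once $a_n$ is reasonably large.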

Proof.

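  1. Hypothesis 3 says that $a_n \to \infty$ less quickly than $n^{1/6}$.
  2. Let $d_n = \sqrt{a_n}$. Then $\lim_{n \to \infty} d_n / a_n = 0$ and so $d_n = o(a_n)$.
  3. Let $k_n = \lceil np + \sqrt{np(1-p)}\, a_n \rceil$ and $\ell_n = \lceil np + \sqrt{np(1-p)}\, (a_n + d_n) \rceil$. A schematic diagram of where all the sequences sit relative to each other is below:

    [Figure: schematic diagram of the relative positions of the sequences.]

  4. Because $S_n$ is integer-valued and $k_n$ is a ceiling, the events
    $$\left[ S_n \ge k_n \right] = \left[ \frac{S_n}{n} - p \ge \frac{k_n}{n} - p \right] = \left[ \frac{S_n}{n} - p \ge \sqrt{\frac{p(1-p)}{n}}\, a_n \right]$$

    coincide. Thus,

    $$\mathbb{P}_n\left[ \frac{S_n}{n} - p \ge \sqrt{\frac{p(1-p)}{n}}\, a_n \right] = \mathbb{P}_n\left[ S_n \ge k_n \right] = \mathbb{P}_n\left[ k_n \le S_n < \ell_n \right] + \mathbb{P}_n\left[ S_n \ge \ell_n \right].$$

    Steps 5 through 7 below will take care of the first summand. Step 8 below will take care of the second summand.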

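  5. By hypothesis 3, $a_n = o(n^{1/6})$ and so $k_n, \ell_n = np + o(n^{2/3})$. From Proposition 2, set
    $$a_n' = \frac{k_n - np}{\sqrt{np(1-p)}}, \qquad b_n = \frac{\ell_n - np}{\sqrt{np(1-p)}},$$

    and so

    $$\mathbb{P}_n\left[ k_n \le S_n < \ell_n \right] \sim \frac{1}{\sqrt{2\pi}} \int_{a_n'}^{b_n} e^{-x^2/2} \, dx.$$

    This allows us to say that

    $$\mathbb{P}_n\left[ k_n \le S_n < \ell_n \right] \sim \frac{1}{\sqrt{2\pi}} \int_{a_n}^{b_n} e^{-x^2/2} \, dx - \frac{1}{\sqrt{2\pi}} \int_{a_n}^{a_n'} e^{-x^2/2} \, dx.$$

    Step 6 below will take care of the first summand. Step 7 below will take care of the second summand.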

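  6. The claim is that
    $$\int_{a_n}^{b_n} e^{-x^2/2} \, dx \sim \frac{1}{a_n} e^{-a_n^2/2}.$$

    1. Notice that
      $$\int_{a_n}^{b_n} e^{-x^2/2} \, dx \le \frac{1}{a_n} \int_{a_n}^{\infty} x e^{-x^2/2} \, dx = \frac{1}{a_n} e^{-a_n^2/2}.$$

    2. We also have that $b_n \ge a_n + d_n$, since normalizing the ceiling is at least as big as normalizing the argument of the ceiling. Thus,
      $$\int_{a_n}^{b_n} e^{-x^2/2} \, dx \ge \int_{a_n}^{a_n + d_n} e^{-x^2/2} \, dx \ge \frac{1}{a_n + d_n} \int_{a_n}^{a_n + d_n} x e^{-x^2/2} \, dx = \frac{1}{a_n + d_n} \left[ \exp\left( -\frac{a_n^2}{2} \right) - \exp\left( -\frac{(a_n + d_n)^2}{2} \right) \right].$$

      Divide by $\frac{1}{a_n} \exp\left( -\frac{a_n^2}{2} \right)$ to get on the right hand side:

      $$\frac{a_n}{a_n + d_n} - \frac{a_n}{a_n + d_n} \cdot \frac{\exp\left( -\frac{(a_n + d_n)^2}{2} \right)}{\exp\left( -\frac{a_n^2}{2} \right)} \to 1 - 0 = 1.$$

    Now combine steps 6a and 6b to get the claim of step 6.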

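  7. The claim is that
    $$\int_{a_n}^{a_n'} e^{-x^2/2} \, dx = o\left( \frac{1}{a_n} e^{-a_n^2/2} \right).$$

    The fact that $0 \le a_n' - a_n \le (np(1-p))^{-1/2}$ directly implies that

    $$\int_{a_n}^{a_n'} e^{-x^2/2} \, dx \le \frac{1}{\sqrt{np(1-p)}} \exp\left( -\frac{a_n^2}{2} \right)$$

    by bounding the integral with a one-box left-endpoint Riemann sum, which is an upper sum because the integrand is decreasing. Divide through by $\frac{1}{a_n} \exp\left( -\frac{a_n^2}{2} \right)$:

    $$\frac{\int_{a_n}^{a_n'} e^{-x^2/2} \, dx}{\frac{1}{a_n} \exp\left( -\frac{a_n^2}{2} \right)} < \frac{a_n}{\sqrt{np(1-p)}} \to 0,$$

    since $a_n = o(n^{1/6})$.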

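  8. The claim is that
    $$\mathbb{P}_n\left[ S_n \ge \ell_n \right] = o\left( \frac{1}{a_n} e^{-a_n^2/2} \right).$$

    1. By the Large Deviations Theorem we have
      $$\mathbb{P}_n\left[ S_n \ge \ell_n \right] \le \exp\left( -n\, h_+\left( \sqrt{p(1-p)}\, \frac{b_n}{\sqrt{n}} \right) \right),$$

      where $h_+(\epsilon) = \frac{\epsilon^2}{2p(1-p)} + O(\epsilon^3)$ for $\epsilon \to 0$. Thus,

      $$\mathbb{P}_n\left[ S_n \ge \ell_n \right] \le \exp\left( -\frac{b_n^2}{2} + O\left( \frac{b_n^3}{\sqrt{n}} \right) \right) \sim \exp\left( -\frac{b_n^2}{2} \right),$$

      since $b_n = o(n^{1/6})$, so that $b_n^3 / \sqrt{n} \to 0$.

    2. Notice that $\exp\left( -\frac{b_n^2}{2} \right) \le \exp\left( -\frac{(a_n + d_n)^2}{2} \right) = o\left( \exp\left( -\frac{d_n^2}{2} \right) \exp\left( -\frac{a_n^2}{2} \right) \right)$.
    3. We can see that $\exp\left( -\frac{d_n^2}{2} \right) < \frac{1}{a_n}$ by our choice of $d_n$. (Note that $d_n > \sqrt{2 \ln a_n}$.)

    This concludes step 8. Combining steps 4 through 8 proves the theorem.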
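
The decomposition used in the proof can be examined numerically. The sketch below is illustrative only: it assumes $p = 1/2$, $a_n = n^{1/8}$ and $d_n = \sqrt{a_n}$ (any $d_n = o(a_n)$ with $d_n > \sqrt{2 \ln a_n}$ would serve the argument), forms the ceilings $k_n$ and $\ell_n$, and reports the two summands of step 4 relative to the estimate $\frac{1}{a_n\sqrt{2\pi}} e^{-a_n^2/2}$. The function names are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_range(n, lo, hi, p):
    """P[lo <= S_n < hi]."""
    return math.fsum(math.exp(log_binom_pmf(n, j, p)) for j in range(lo, hi))

def decomposition(n, p=0.5):
    """Return (middle / estimate, far / estimate) for the two summands in step 4."""
    sigma = math.sqrt(n * p * (1 - p))
    a_n = n ** (1 / 8)          # assumed sequence, satisfies a_n = o(n^(1/6))
    d_n = math.sqrt(a_n)        # assumed auxiliary sequence with d_n = o(a_n)
    k_n = math.ceil(n * p + sigma * a_n)
    l_n = math.ceil(n * p + sigma * (a_n + d_n))
    estimate = math.exp(-a_n ** 2 / 2) / (a_n * math.sqrt(2 * math.pi))
    middle = prob_range(n, k_n, l_n, p)       # P[k_n <= S_n < l_n]
    far = prob_range(n, l_n, n + 1, p)        # P[S_n >= l_n]
    return middle / estimate, far / estimate

for n in (10**3, 10**4, 10**5):
    print(n, decomposition(n))
```

The first ratio should be of order 1 while the second is much smaller, in line with step 8, which shows that the far tail is negligible compared with the estimate.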

Example. Take $a_n = n^{1/8}$, so that $a_n \to \infty$ and $\lim_{n \to \infty} a_n / n^{1/6} = \lim_{n \to \infty} n^{-1/24} = 0$. Take $p = 1/2$. Take $n = 10^4$, so $a_{10^4} = \sqrt{10}$ and

$$\mathbb{P}_n\left[ \frac{S_{10^4}}{10^4} - \frac{1}{2} \ge \frac{1}{2} \cdot \frac{\sqrt{10}}{\sqrt{10^4}} \right] = \mathbb{P}_n\left[ S_{10^4} \ge 5000 + 50\sqrt{10} \right].$$

R
    1-pbinom(5000+50*sqrt(10)-1, 10^4, 0.5)     gives 0.0008156979
    (1/(sqrt(10*2*pi)))*exp(-(sqrt(10))^2/2)    gives 0.0008500367
Octave
    1-binocdf(5000+50*sqrt(10)-1, 10^4, 0.5)    gives 8.1570e-04
    (1/sqrt(10*2*pi))*exp(-sqrt(10)^2/2)        gives 8.5004e-04

Using the de Moivre-Laplace Central Limit Theorem in R: 1-pnorm(5000+50*sqrt(10), mean=5000, sd=50) gives 0.0007827011.
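
The figures in this example can be reproduced without R or Octave. The following sketch uses only the Python standard library; the function names are hypothetical. It also separates two readings of the cutoff: the expressions quoted in the example evaluate the cumulative distribution at $5000 + 50\sqrt{10} - 1 \approx 5157.11$, which counts outcomes $S_n \ge 5158$, whereas the event $S_n \ge 5000 + 50\sqrt{10}$ for an integer-valued $S_n$ begins at 5159.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def upper_tail(n, k0, p):
    """P[S_n >= k0]."""
    return math.fsum(math.exp(log_binom_pmf(n, k, p)) for k in range(k0, n + 1))

n, p = 10**4, 0.5
a_n = n ** (1 / 8)                                   # sqrt(10)
cutoff = n * p + math.sqrt(n * p * (1 - p)) * a_n    # 5000 + 50*sqrt(10)

# Same event as 1 - cdf(cutoff - 1): outcomes strictly above floor(cutoff - 1).
as_in_example = upper_tail(n, math.floor(cutoff - 1) + 1, p)
# The event S_n >= cutoff for integer S_n starts at ceil(cutoff).
from_ceiling = upper_tail(n, math.ceil(cutoff), p)

estimate = math.exp(-a_n ** 2 / 2) / (a_n * math.sqrt(2 * math.pi))
clt_tail = 0.5 * math.erfc(a_n / math.sqrt(2))

print("tail as computed in the example:", as_in_example)
print("tail starting at ceil(cutoff):  ", from_ceiling)
print("moderate deviations estimate:   ", estimate)
print("Central Limit Theorem tail:     ", clt_tail)
```

The first, third, and fourth numbers should agree with the values quoted in the example. The second is somewhat smaller, which shows how sensitive tail probabilities at this scale are to a shift of a single integer in the cutoff.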

Sources

The explanatory remarks at the beginning comparing the Moderate Deviations Theorem to the Large Deviations Theorem and the Central Limit Theorem are from the survey article by Mörters [2]. This section is adapted from: Heads or Tails, by Emmanuel Lesigne, Student Mathematical Library Volume 28, American Mathematical Society, Providence, 2005, Chapter 8 [1].

_______________________________________________________________________________________________

Algorithms, Scripts, Simulations

Algorithm

The experiment is to flip a coin $n$ times and to repeat the experiment $k$ times. Then compare the empirical probability of a moderate deviation with the estimate from the Moderate Deviations Theorem.

Scripts

R

R script for Moderate Deviations.

    p <- 0.5
    n <- 10000
    k <- 1000
    coinFlips <- array( 0+(runif(n*k) <= p), dim=c(n,k))
         # 0+ coerces Boolean to numeric
    headsTotal <- colSums(coinFlips)   # 0..n binomial rv sample, size k

    an <- n^(1/8)
    mu <- p*n
    stddev <- sqrt(p*(1-p)*n)
    moddev <- mu + stddev*an           # threshold np + sqrt(np(1-p)) a_n
    prob <- sum( 0+(headsTotal > moddev) )/k
    theoretical <- ( 1/(sqrt(2*pi)*an) )*exp( -(an^2)/2 )
    cat(sprintf("Empirical probability: %f \n", prob ))
    cat(sprintf("Moderate Deviations Theorem estimate: %f \n", theoretical))
Octave

Octave script for Moderate Deviations.

p = 0.5; 
n = 10000; 
k = 1000; 
 
coinFlips = rand(n,k) <= p; 
headsTotal = sum(coinFlips);  # 0..n binomial rv sample, size k 
 
an = n^(1/8);
mu = p*n;
stddev = sqrt(p*(1-p)*n);
moddev = mu + stddev*an;      # threshold np + sqrt(np(1-p)) a_n
prob = sum( headsTotal > moddev)/k;
theoretical = ( 1/(sqrt(2*pi)*an) )*exp( -(an^2)/2 );
disp("Empirical probability:"), disp( prob )
disp("Moderate Deviations Theorem estimate:"), disp( theoretical )
Perl

Perl PDL script for Moderate Deviations.

use PDL;                 # core PDL functions such as random and sumover
use PDL::NiceSlice;
use PDL::Constants qw(PI); 
 
$p = 0.5; 
$n = 10000; 
$k = 1000; 
 
$coinFlips = random( $k, $n ) <= $p;    #note order of dims!! 
$headsTotal =
    $coinFlips->transpose->sumover;     # 0..n binomial r.v. sample, size k

#note transpose, PDL likes x (row) direction for implicitly threaded operations

$an     = $n**( 1 / 8 );
$mu     = $p * $n;
$stddev = sqrt( $p * ( 1 - $p ) * $n );
$moddev = $mu + $stddev * $an;          # threshold np + sqrt(np(1-p)) a_n

$prob = ( ( $headsTotal > $moddev )->sumover ) / $k;
$theoretical = ( 1 / ( sqrt( 2 * PI ) * $an ) ) * exp( -( $an**2 ) / 2 );

print "Empirical probability: ",               $prob,        "\n";
print "Moderate Deviations Theorem estimate: ", $theoretical, "\n";
SciPy

Scientific Python script for Moderate Deviations.

import numpy as np
# NumPy supplies the random, sum, sqrt and pi names that older SciPy
# releases re-exported at the top level.

p = 0.5
n = 10000
k = 1000

rng = np.random.default_rng()
coinFlips = rng.random((n, k)) <= p
# Note Booleans True for Heads and False for Tails
headsTotal = np.sum(coinFlips, axis=0)  # 0..n binomial r.v. sample, size k
# Note how Booleans act as 0 (False) and 1 (True)

an = n ** (1.0 / 8.0)
mu = p * n
stddev = np.sqrt(p * (1 - p) * n)
moddev = mu + stddev * an  # threshold np + sqrt(np(1-p)) a_n

prob = np.sum(headsTotal > moddev) / k
# In Python 3 the division of an integer count by k already gives a float
theoretical = (1 / (np.sqrt(2 * np.pi) * an)) * np.exp(-(an ** 2) / 2)

print("Empirical probability: ", prob)
print("Moderate Deviations Theorem estimate: ", theoretical)

__________________________________________________________________________

Problems to Work

Problems to Work for Understanding

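  1. Show that if $(a_n)$ and $(b_n)$ converge respectively to $a$ and $b$ such that $a < b$, then Proposition 2 becomes the de Moivre-Laplace Central Limit Theorem.
  2. Explain why
    $$\mathbb{P}_n\left[ S_n \ge \ell_n \right] \le \exp\left( -\frac{b_n^2}{2} + O\left( \frac{b_n^3}{\sqrt{n}} \right) \right) \sim \exp\left( -\frac{b_n^2}{2} \right),$$

    if $b_n = o(n^{1/6})$.

  3. Explain why for $k \in I_n$, $\frac{k - np}{k} = O_u(c_n n^{-1/3})$ and $\frac{k - np}{n - k} = O_u(c_n n^{-1/3})$.
  4. Explain why
    $$\frac{\exp\left( -\frac{(a_n + d_n)^2}{2} \right)}{\exp\left( -\frac{a_n^2}{2} \right)} \to 0.$$

  5. Explain why
    $$\exp\left( -\frac{(a_n + d_n)^2}{2} \right) = o\left( \exp\left( -\frac{d_n^2}{2} \right) \exp\left( -\frac{a_n^2}{2} \right) \right).$$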

__________________________________________________________________________

Books

Reading Suggestion:

References

[1]   Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, volume 28 of Student Mathematical Library. American Mathematical Society, 2005.

[2]   Peter Mörters. Large deviation theory and applications. http://people.bath.ac.uk/maspm/LDP.pdf, November 2008. Cramér’s theorem, large deviations, moderate deviations.

__________________________________________________________________________

Links

Outside Readings and Links:

__________________________________________________________________________


Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LATEX source on November 29, 2012