Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Strong Law of Large Numbers

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

Study Tip


__________________________________________________________________________

Rating


Mathematicians Only: prolonged scenes of intense rigor.

_______________________________________________________________________________________________

Section Starter Question


Explain what is meant by the “law of averages” and how it applies to an infinite sequence of coin flips.

_______________________________________________________________________________________________

Key Concepts


  1. Almost surely,
    \[ \lim_{n \to \infty} \frac{S_n(\omega)}{n} = p. \]

  2. Almost surely, the asymptotic proportion of any outcome in an infinite sequence of trials is the probability of that outcome for a single trial.

__________________________________________________________________________

Vocabulary


  1. Borel’s strong law of large numbers is that almost surely
    \[ \lim_{n \to \infty} \frac{S_n(\omega)}{n} = p. \]

  2. If \(b = (b_1, b_2, \ldots, b_j)\) is a word constructed from the alphabet \(\{0, 1\}\) and \(s = \sum_{i=1}^{j} b_i\), we say that \(p^s q^{j-s}\) is the probability of the word \(b\).
  3. The event “\(X_n\) is in \(A_n\) infinitely often”, denoted \([X_n \in A_n \text{ i.o.}]\), is defined as
    \[ [X_n \in A_n \text{ i.o.}] = \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} [X_j \in A_j]. \]

__________________________________________________________________________

Mathematical Ideas


Borel’s Strong Law

This section presents three different proofs of Borel’s Strong Law of Large Numbers. The first uses the set theory of negligible sets and the Large Deviations estimate to show that the pointwise convergence fails on a negligible set. The second proof uses a carefully chosen subsequence and elementary estimates. The third proof uses the Borel-Cantelli Lemma and a fourth-moment condition.

Proof with the Large Deviations Estimate

Theorem 1 (Borel’s Strong Law of Large Numbers). Almost surely,

\[ \lim_{n \to \infty} \frac{S_n(\omega)}{n} = p. \]

Proof. Let \(R_n(\omega) = \frac{S_n(\omega)}{n} - p\). The sequence \((R_n(\omega))_{n=1}^{\infty}\) fails to approach 0 if and only if \(\limsup_{n\to\infty} |R_n(\omega)| > 0\). That is, \((R_n(\omega))_{n=1}^{\infty}\) fails to approach 0 if and only if there is an \(m \ge 1\), depending on \(\omega\), such that for each \(n \ge 1\) there exists a \(k \ge n\) satisfying \(|R_k(\omega)| > \frac{1}{m}\). Therefore, the set of points \(\omega\) in the sample space not satisfying \(\lim_{n\to\infty} \frac{S_n(\omega)}{n} = p\) is contained in

\[ \bigcup_{m \ge 1} \bigcap_{n \ge 1} \bigcup_{k \ge n} \left\{ \omega \in \Omega : |R_k(\omega)| > \frac{1}{m} \right\}. \]

The goal is to show that this set is a negligible event. Since a countable union of negligible sets is negligible, it suffices to show that

\[ N_m = \bigcap_{n \ge 1} \bigcup_{k \ge n} \left\{ \omega \in \Omega : |R_k(\omega)| > \frac{1}{m} \right\} \]

is negligible for each \(m \ge 1\).

For each \(k \ge 1\), let \(A_{m,k} = \left\{ \omega \in \Omega : |R_k(\omega)| > \frac{1}{m} \right\}\). By the Large Deviations Estimate, there is a constant \(c = c(p,m) > 0\) such that \(\mathbb{P}[A_{m,k}] \le e^{-ck}\). Since the series \(\sum_{k \ge 1} e^{-ck}\) converges, for every \(\epsilon > 0\) there is an \(n \ge 1\) such that \(\sum_{k \ge n} e^{-ck} < \epsilon\). Because \(N_m \subseteq \bigcup_{k \ge n} A_{m,k}\),

\[ \mathbb{P}[N_m] \le \mathbb{P}\left[ \bigcup_{k \ge n} A_{m,k} \right] \le \sum_{k \ge n} \mathbb{P}[A_{m,k}] \le \epsilon. \]

This proves that each Nm is a negligible set. □
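For reference, one standard exponential bound of this type is Hoeffding’s inequality, which may differ in its constants from the Large Deviations Estimate quoted above but serves the same purpose. Applied to the Bernoulli sum \(S_k\) with deviation \(\frac{1}{m}\), it gives

\[ \mathbb{P}\left[ \left| \frac{S_k}{k} - p \right| > \frac{1}{m} \right] \le 2 e^{-2k/m^2}, \]

so the argument goes through with \(c(p,m) = \frac{2}{m^2}\); the extra factor of 2 does not affect the convergence of the tail sums.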

Proof with subsequences

Here is another proof of Borel’s Strong Law of Large Numbers, adapted from Theorem 1.21 in [1, page 11].

Before the proof, here is a quick definition.

Definition. Suppose \(A_1 \subseteq A_2 \subseteq \cdots\). Then \(\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n\). Also, if \(A_1 \supseteq A_2 \supseteq \cdots\), then \(\lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n\).
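For example, the increasing sequence of intervals \(A_n = \left(0, 1 - \frac{1}{n}\right]\) has \(\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n = (0,1)\), while the decreasing sequence \(B_n = \left(0, \frac{1}{n}\right)\) has \(\lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty} B_n = \emptyset\).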

Theorem 2 (Strong Law of Large Numbers). Almost surely,

\[ \lim_{n\to\infty} \left( \frac{S_n}{n} - p \right) = 0. \]

Proof. Break the proof into a sequence of simple steps.

  1. The first step is the claim that \(\lim_{n\to\infty} \frac{S_n}{n} = p\) if and only if \(\lim_{m\to\infty} \frac{S_{m^2}}{m^2} = p\). (A short simulation sketch of this reduction to the subsequence of squares appears after this list.)

    To see this, first note that the implication from the limit of the full sequence to the subsequential limit along the squares is immediate. For the other direction, for any \(n\), find \(m\) so that \(m^2 \le n < (m+1)^2\), so \(0 \le n - m^2 \le 2m\). Then

    \[ \left| \frac{S_n}{n} - \frac{S_{m^2}}{m^2} \right| = \left| \frac{S_n - S_{m^2}}{m^2} + \left( \frac{1}{n} - \frac{1}{m^2} \right) S_n \right| \le \frac{|n - m^2|}{m^2} + \left| \frac{1}{n} - \frac{1}{m^2} \right| n \le \frac{2m}{m^2} + \frac{2}{m} = \frac{4}{m}. \]

    The term \(\frac{|n - m^2|}{m^2}\) in the middle comes from estimating \(\frac{|S_n - S_{m^2}|}{m^2}\): since \(n \ge m^2\), the largest the difference could be is when each \(\omega_i = 1\) for \(i > m^2\), so the difference is at most \(n - m^2\). Then the claim that \(\lim_{m\to\infty} \frac{S_{m^2}}{m^2} = p\) implies \(\lim_{n\to\infty} \frac{S_n}{n} = p\) follows by the triangle inequality.

  2. For \(m_0 < m_1\) define
    \[ E_{m_0,m_1} = \bigcup_{m=m_0}^{m_1} \left\{ \omega : \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \right\}. \]

    Note that each of these sets depends only on the coordinates \(\omega_1, \ldots, \omega_{m_1^2}\), so \(E_{m_0,m_1}\) is of finite type.

  3. Use the Chebyshev inequality on each set in the union to obtain
    \[ \mathbb{P}\left[ \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \right] \le \frac{1}{\epsilon^2} \mathbb{E}\left[ \left( \frac{S_{m^2}}{m^2} - p \right)^2 \right] = \frac{1}{\epsilon^2} \cdot \frac{p(1-p)}{m^2}. \]

  4. Thus
    \[ \mathbb{P}[E_{m_0,m_1}] \le \sum_{m=m_0}^{m_1} \mathbb{P}\left[ \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \right] \le \frac{p(1-p)}{\epsilon^2} \sum_{m=m_0}^{m_1} \frac{1}{m^2}. \]
  5. Define
    \[ E_{m_0} = \lim_{m_1\to\infty} E_{m_0,m_1} = \bigcup_{m=m_0}^{\infty} \left\{ \omega : \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \right\}. \]

    Note that \(E_{m_0}\) is the set of \(\omega\)’s for which \(\left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon\) at least once for \(m \ge m_0\).

  6. The claim is that
    \[ \mathbb{P}[E_{m_0}] = \lim_{m_1\to\infty} \mathbb{P}[E_{m_0,m_1}] \le \frac{p(1-p)}{\epsilon^2} \sum_{m=m_0}^{\infty} \frac{1}{m^2}. \]

  7. Note that
    \[ \limsup_{m\to\infty} \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \]

    if and only if for any \(m_0\) there exists \(m \ge m_0\) so that \(\left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon\). That is,

    \[ \left\{ \omega : \limsup_{m\to\infty} \left| \frac{S_{m^2}}{m^2} - p \right| > \epsilon \right\} \subseteq E_{m_0} \]

    for all \(m_0\). Notice that \(E_{m_0} \supseteq E_{m_0+1}\), and so consider \(E_{\epsilon} = \lim_{m_0\to\infty} E_{m_0}\). Again the claim is that

    \[ \mathbb{P}[E_{\epsilon}] = \lim_{m_0\to\infty} \mathbb{P}[E_{m_0}]. \]

  8. Then
    \[ \mathbb{P}[E_{\epsilon}] \le \lim_{m_0\to\infty} \frac{p(1-p)}{\epsilon^2} \sum_{m=m_0}^{\infty} \frac{1}{m^2} = 0. \]

  9. Let \(E_{1/k}\) be the special case of \(E_{\epsilon}\) for \(\epsilon = \frac{1}{k}\), \(k \in \mathbb{N}\). Note that
    \[ E_{1/k} = \left\{ \omega : \limsup_{m\to\infty} \left| \frac{S_{m^2}}{m^2} - p \right| > \frac{1}{k} \right\}. \]

    Let \(E = \lim_{k\to\infty} E_{1/k}\) and then note that \(\mathbb{P}[E] = \lim_{k\to\infty} \mathbb{P}[E_{1/k}] = \lim_{k\to\infty} 0 = 0\). Off the negligible event \(E\), \(\lim_{m\to\infty} \frac{S_{m^2}}{m^2} = p\), so by Step 1, almost surely \(\lim_{n\to\infty} \frac{S_n}{n} = p\). □
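Here is the promised simulation sketch, a minimal R illustration (assuming the same Bernoulli coin-flip setup as the scripts at the end of this section) that samples the running average of heads only along the square indices \(m^2\):

p <- 0.5
n <- 1e+6
coinFlips <- (runif(n) <= p)
S <- cumsum(coinFlips)

m <- 1:floor(sqrt(n))                  # all m with m^2 <= n
squares <- m^2
avgAtSquares <- S[squares] / squares   # running averages sampled at the squares
cat(tail(avgAtSquares, 5), "\n")

The sampled averages settle near \(p\), consistent with Step 1, which says that convergence along the squares already controls convergence of the full sequence.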

Remark. At Steps 6, 7, and 9 a limit is pulled through the probability measure. This is valid if, given a sequence of finite probability measures \(\mathbb{P}_n\) defined on a field \(\Sigma_0\) of finite type events, there exists a well-defined probability measure \(\mathbb{P}\) defined on \(\Sigma\), the smallest \(\sigma\)-field containing \(\Sigma_0\), agreeing with \(\mathbb{P}_n\) on \(\Sigma_0\). That is true from the following theorem, with proof deferred to another section.

Theorem 3 (Kolmogorov Consistency Theorem). Given a consistent family of finite-dimensional distributions \(\mathbb{P}_n\), there exists a unique probability \(\mathbb{P}\) on \((\Omega, \Sigma)\) such that for every \(n\), under the natural projection \(\pi_n(\omega) = (x_1, x_2, \ldots, x_n)\), the induced measure \(\mathbb{P}\pi_n^{-1}\) is identical to \(\mathbb{P}_n\) on \(\mathbb{R}^n\).

Proof with the Borel-Cantelli Lemma

Lemma 4 (Borel-Cantelli).

The direct half:
For a sequence of events \(A_n\), if
\[ \sum_{n=1}^{\infty} \mathbb{P}[A_n] < \infty \]

then

\[ \mathbb{P}\left[ \left\{ \omega : \lim_{n\to\infty} \mathbf{1}_{A_n}(\omega) = 0 \right\} \right] = 1. \]

The converse half:
If the events An are mutually independent, the converse is also true.

Definition. Let \(X_1, X_2, X_3, \ldots\) be a sequence of random variables and \(A_1, A_2, A_3, \ldots\) a sequence of Borel sets in \(\mathbb{R}\). The event “\(X_n\) is in \(A_n\) infinitely often”, denoted \([X_n \in A_n \text{ i.o.}]\), is defined as

\[ [X_n \in A_n \text{ i.o.}] = \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} [X_j \in A_j]. \]

Remark. Note that the complementary event to the event in the Borel-Cantelli Lemma is

\[ \left\{ \omega : \limsup_{n\to\infty} \mathbf{1}_{A_n}(\omega) = 1 \right\}. \]

This is exactly the event \(\bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j\), that is, the event that infinitely many of the events \(A_n\) occur. Thus the direct half of the Borel-Cantelli Lemma can be expressed as

 If \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] < \infty\), then \(\mathbb{P}[A_n \text{ i.o.}] = 0\).

The converse half of the Borel-Cantelli Lemma can be expressed as

 If \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] = \infty\), then \(\mathbb{P}[A_n \text{ i.o.}] = 1\).

Proof.

The direct half:
\[ \mathbb{P}[A_n \text{ i.o.}] = \mathbb{P}\left[ \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j \right] = \lim_{n\to\infty} \mathbb{P}\left[ \bigcup_{j=n}^{\infty} A_j \right] \le \lim_{n\to\infty} \sum_{j=n}^{\infty} \mathbb{P}[A_j]. \]

But obviously, \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] < \infty\) implies \(\lim_{n\to\infty} \sum_{j=n}^{\infty} \mathbb{P}[A_j] = 0\).

The converse half:
This will be proved in the form
 If \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] = \infty\), then \(\mathbb{P}[A_n \text{ i.o.}] = 1\).

The complement of the set \(\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j\) is \(\bigcup_{n=1}^{\infty} \bigcap_{j=n}^{\infty} A_j^C\). Then the goal is to show this is a negligible set. It is enough to show that each set \(\bigcap_{j=n}^{\infty} A_j^C\) is negligible. Temporarily fix \(n\) and for each \(m \ge n\), let \(B_m\) be the event \(B_m = \bigcap_{j=n}^{m} A_j^C\), so that \(\bigcap_{j=n}^{\infty} A_j^C \subseteq B_m\).

Because the events \((A_n)_{n=1}^{\infty}\) are independent,

\[ \mathbb{P}[B_m] = \mathbb{P}\left[ \bigcap_{j=n}^{m} A_j^C \right] = \prod_{j=n}^{m} \mathbb{P}[A_j^C] = \prod_{j=n}^{m} \left( 1 - \mathbb{P}[A_j] \right) \le \prod_{j=n}^{m} \exp\left( -\mathbb{P}[A_j] \right) = \exp\left( -\sum_{j=n}^{m} \mathbb{P}[A_j] \right). \]

Since \(\sum_{j=n}^{\infty} \mathbb{P}[A_j] = \infty\), this can be made arbitrarily small by choosing \(m\) sufficiently large, so by definition \(\bigcap_{j=n}^{\infty} A_j^C\) is negligible. □
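For a quick concrete illustration: if the events \(A_n\) are independent with \(\mathbb{P}[A_n] = \frac{1}{n^2}\), then \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] = \frac{\pi^2}{6} < \infty\), so the direct half gives \(\mathbb{P}[A_n \text{ i.o.}] = 0\) and almost surely only finitely many of the \(A_n\) occur. If instead \(\mathbb{P}[A_n] = \frac{1}{n}\), then \(\sum_{n=1}^{\infty} \mathbb{P}[A_n] = \infty\), and the converse half gives \(\mathbb{P}[A_n \text{ i.o.}] = 1\).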

Theorem 5 (Strong Law of Large Numbers). If \(X_1, X_2, \ldots\) is a sequence of independent, identically distributed random variables with \(\mathbb{E}[|X_i|^4] = C < \infty\), then

\[ \lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} \frac{X_1 + X_2 + \cdots + X_n}{n} = \mathbb{E}[X_1] \]

with probability 1.

Proof. We can assume without loss of generality that \(\mathbb{E}[X_i] = 0\); otherwise just consider \(Y_i = X_i - \mathbb{E}[X_i]\). Expansion of \((X_1 + \cdots + X_n)^4\), then using the independence and identical distribution assumptions, shows

\[ \mathbb{E}[S_n^4] = n\,\mathbb{E}[X_1^4] + 3n(n-1)\left( \mathbb{E}[X_1^2] \right)^2 \le nC + 3n^2\sigma^4. \]
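In more detail, the expansion step works as follows. Writing \(S_n^4 = \sum_{i,j,k,l} X_i X_j X_k X_l\), any term in which some index appears exactly once has expectation zero by independence and \(\mathbb{E}[X_i] = 0\). The surviving terms are the \(n\) terms with all four indices equal and the mixed terms with two pairs of equal indices:

\[ \mathbb{E}[S_n^4] = \sum_{i=1}^{n} \mathbb{E}[X_i^4] + 3\sum_{i \ne j} \mathbb{E}[X_i^2]\,\mathbb{E}[X_j^2] = n\,\mathbb{E}[X_1^4] + 3n(n-1)\left( \mathbb{E}[X_1^2] \right)^2, \]

since there are \(\binom{n}{2}\) unordered pairs of distinct indices and \(\binom{4}{2} = 6\) arrangements of the pattern \(i,i,j,j\), giving \(6\binom{n}{2} = 3n(n-1)\) mixed terms.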

Then adapting the proof of the Markov and Chebyshev inequalities with fourth moments,

\[ \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \delta \right] = \mathbb{P}\left[ \frac{|S_n|^4}{n^4\delta^4} \ge 1 \right] = \int_{\left\{ \frac{|S_n|^4}{n^4\delta^4} \ge 1 \right\}} 1 \, d\mathbb{P} \le \int_{\Omega} \frac{|S_n|^4}{n^4\delta^4} \, d\mathbb{P} = \frac{\mathbb{E}[|S_n|^4]}{n^4\delta^4} \le \frac{nC + 3n^2\sigma^4}{n^4\delta^4}. \]

Then from the comparison theorem for series convergence

\[ \sum_{n=1}^{\infty} \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \delta \right] < \infty, \]

and the direct half of the Borel-Cantelli Lemma applies to show that

\[ \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \delta \text{ i.o.} \right] = 0 \]

or equivalently

\[ \lim_{n\to\infty} \frac{S_n}{n} = 0 \]

with probability 1. □

The Strong Law for Pairwise Independent Random Variables

Example. Bernstein’s example of random variables that are pairwise independent but not mutually independent.

Consider flipping a fair quarter and a fair dime independently. Let \(X_1 = 1\) if the quarter is heads, \(X_2 = 1\) if the dime is heads, and \(X_3 = 1\) if the quarter is the same as the dime, with each \(X_i = 0\) otherwise. First, \(\mathbb{P}[X_i = 1] = \frac{1}{2}\) for \(i = 1, 2, 3\). Next note the following probabilities:

\[ \mathbb{P}[X_1 = 1 \cap X_2 = 1] = \mathbb{P}[X_1 = 1]\,\mathbb{P}[X_2 = 1] = \frac{1}{4}, \qquad \mathbb{P}[X_1 = 1 \cap X_3 = 1] = \mathbb{P}[X_1 = 1]\,\mathbb{P}[X_3 = 1] = \frac{1}{4}, \qquad \mathbb{P}[X_2 = 1 \cap X_3 = 1] = \mathbb{P}[X_2 = 1]\,\mathbb{P}[X_3 = 1] = \frac{1}{4}, \]

but

\[ \frac{1}{4} = \mathbb{P}[X_1 = 1 \cap X_2 = 1 \cap X_3 = 1] \ne \mathbb{P}[X_1 = 1]\,\mathbb{P}[X_2 = 1]\,\mathbb{P}[X_3 = 1] = \frac{1}{8}. \]
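A minimal R sketch (added for illustration; the variable names are ad hoc) that estimates these probabilities by simulation:

nTrials <- 1e+5
quarter <- (runif(nTrials) <= 0.5)   # X1 = 1 when the quarter shows heads
dime    <- (runif(nTrials) <= 0.5)   # X2 = 1 when the dime shows heads
same    <- (quarter == dime)         # X3 = 1 when the quarter matches the dime

mean(quarter & dime)                 # near 1/4
mean(quarter & same)                 # near 1/4
mean(dime & same)                    # near 1/4
mean(quarter & dime & same)          # near 1/4, not 1/8

The first three estimates agree with the products of the individual probabilities, while the last does not, confirming pairwise but not mutual independence.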

Lemma 6. Let \((X_n)_{n\ge 1}\) be a sequence of random variables. If \(\sum_{n=1}^{\infty} \mathbb{P}[\,|X_n| > \epsilon\,]\) converges for all \(\epsilon > 0\), then \(\lim_{n\to\infty} X_n = 0\) almost surely.

Proof.

  1. Set \(A_n = [\,|X_n| > \epsilon\,]\).
  2. The direct half of the Borel-Cantelli Lemma implies that for every \(\epsilon > 0\) there exists almost surely an \(n_0(\omega,\epsilon)\) such that \(|X_n(\omega)| \le \epsilon\) for every \(n \ge n_0(\omega,\epsilon)\).
  3. By considering a countable union of negligible events, we conclude that for every positive integer \(m\) there exists almost surely \(n_0(\omega, 1/m)\) such that \(|X_n(\omega)| \le 1/m\) for all \(n \ge n_0(\omega, 1/m)\).
  4. This implies that the sequence (Xn)n≥1 almost surely converges to 0.

Corollary 1. Let \((X_n)_{n\ge 1}\) be a sequence of random variables. If \(\sum_{n=1}^{\infty} \mathbb{E}[|X_n|]\) converges, then the sequence \((X_n)_{n\ge 1}\) converges almost surely to 0.

Theorem 7 (Pairwise Independent Strong Law). Let \((X_n)_{n\ge 1}\) be a sequence of pairwise independent random variables. Suppose that \(\mathbb{E}[X_n] = 0\) and \(\sup_{n\ge 1} \mathbb{E}[X_n^2] < \infty\). Then for \(S_n = \sum_{i=1}^{n} X_i\), we have

\[ \lim_{n\to\infty} \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \epsilon \right] = 0 \]

for all \(\epsilon > 0\). Moreover, \(\lim_{n\to\infty} \frac{S_n}{n} = 0\) almost surely.

Proof. Let \(M = \sup_{n\ge 1} \mathbb{E}[X_n^2]\) and note that pairwise independence says

\[ \mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\,\mathbb{E}[X_j] = 0, \quad i \ne j. \]

Note that we have

\[ \mathbb{E}\left[ \left( \frac{S_n}{n} \right)^2 \right] = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}[X_i^2] \le \frac{M}{n}. \]

The Chebyshev inequality gives us that

\[ \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \epsilon \right] \le \frac{\mathbb{E}\left[ \left( \frac{S_n}{n} \right)^2 \right]}{\epsilon^2} \le \frac{1}{\epsilon^2} \cdot \frac{M}{n}, \]

and so \(\lim_{n\to\infty} \mathbb{P}\left[ \left| \frac{S_n}{n} \right| \ge \epsilon \right] = 0\). Since \(\sum_{n\ge 1} \mathbb{E}\left[ \left( \frac{S_{n^2}}{n^2} \right)^2 \right] \le \sum_{n\ge 1} \frac{M}{n^2} < \infty\), by Corollary 1, \(\lim_{n\to\infty} \left( \frac{S_{n^2}}{n^2} \right)^2 = 0\) almost surely, and hence \(\lim_{n\to\infty} \frac{S_{n^2}}{n^2} = 0\) almost surely. Now let \(m = \lfloor \sqrt{n} \rfloor\); i.e., \(m^2 \le n < (m+1)^2\). Then \(\lim_{n\to\infty} \frac{S_{m^2}}{n} = 0\) almost surely. This gives

\[ \mathbb{E}\left[ \left( \frac{S_n}{n} - \frac{S_{m^2}}{n} \right)^2 \right] = \frac{1}{n^2} \mathbb{E}\left[ \left( \sum_{i=m^2+1}^{n} X_i \right)^2 \right] = \frac{1}{n^2} \sum_{i=m^2+1}^{n} \mathbb{E}[X_i^2] \le \frac{(n - m^2)M}{n^2} \le \frac{2m + 1}{n^2} M = \frac{2\lfloor \sqrt{n} \rfloor + 1}{n^2} M = O\left( n^{-3/2} \right). \]

Using Corollary 1 again,

\[ \lim_{n\to\infty} \left( \frac{S_n}{n} - \frac{S_{m^2}}{n} \right)^2 = 0 \ \text{a.s.}, \quad \text{so} \quad \lim_{n\to\infty} \left( \frac{S_n}{n} - \frac{S_{m^2}}{n} \right) = 0 \ \text{a.s.}, \quad \text{and therefore} \quad \lim_{n\to\infty} \frac{S_n}{n} = 0 \ \text{a.s.} \] □

Applications of the Strong Law

The following proposition says that the asymptotic proportion of any outcome in an infinite sequence of trials is “almost surely” the probability of that outcome for a single trial. This is sometimes referred to as the “Law of Averages”.

Proposition 8 (The Law of Averages). Let \((A_n)_{n=1}^{\infty}\) be a sequence of equiprobable independent random events with probability \(p\). The asymptotic empirical probability that these events will occur is almost surely \(p\); that is,

\[ \lim_{n\to\infty} \frac{1}{n} \#\{ k : 1 \le k \le n,\ \omega \in A_k \} = p \]

almost surely.

Proof. For each \(\omega\) create a sequence \(\rho = (\rho_n)_{n=1}^{\infty}\) of 0’s and 1’s by setting \(\rho_n = 1\) if \(\omega \in A_n\) and \(\rho_n = 0\) otherwise. The proof is to show

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \rho_k = \mathbb{P}[\rho_i = 1] = p \tag{1} \]

almost surely.

For each \(n\) and each \((\epsilon_1, \epsilon_2, \ldots, \epsilon_n) \in \{0, 1\}^n\), let \(s = \sum_{k=1}^{n} \epsilon_k\); then

\[ \mathbb{P}[\rho_k = \epsilon_k,\ 1 \le k \le n] = p^s (1-p)^{n-s}. \]

Now the same proof used to show that \(\frac{1}{n} \sum_{k=1}^{n} \omega_k\) almost surely converges to \(p\) can be applied to complete the proof that the convergence in (1) is almost sure. □

Strong Law for Finite Type Events

Corollary 2. Let \(A\) be a finite type event. For each integer \(n \ge 1\) and each \(\omega \in \Omega\), let \(S(A,n,\omega)\) be the number of integers \(k\) between 1 and \(n\) such that \((\omega_k, \omega_{k+1}, \omega_{k+2}, \ldots) \in A\). Then

\[ \lim_{n\to\infty} \frac{S(A,n,\omega)}{n} = \mathbb{P}[A] \]

almost surely.

Proof. Since the event \(A\) is of finite type, \(A\) depends only on the coordinates with index less than or equal to \(m\) for some positive integer \(m\). Equivalently, there exists an \(A' \subseteq \Omega_m\) such that

\[ A = \{ \omega : (\omega_1, \omega_2, \ldots, \omega_m) \in A' \}. \]

For each integer \(j\) from 1 to \(m\) consider the sequence \((A_{j,n})_{n\ge 0}\) of events defined by

\[ A_{j,n} = \{ \omega : (\omega_{j+nm}, \omega_{j+nm+1}, \omega_{j+nm+2}, \ldots) \in A \} = \{ \omega : (\omega_{j+nm}, \omega_{j+nm+1}, \ldots, \omega_{j+(n+1)m-1}) \in A' \}. \]

For fixed \(j\) and varying \(n\), the events \(A_{j,n}\) are independent and each has probability \(\mathbb{P}[A]\). Let \(S(A,j,n,\omega)\) be the number of integers \(k\) between 0 and \(n-1\) such that \(\omega \in A_{j,k}\). The Proposition implies that

\[ \lim_{n\to\infty} \frac{S(A,j,n,\omega)}{n} = \mathbb{P}[A] \]

almost surely. Also,

\[ \frac{1}{nm} S(A, nm, \omega) = \frac{1}{m} \sum_{j=1}^{m} \frac{S(A,j,n,\omega)}{n}. \]

The corollary then follows from the inequality

\[ \frac{1}{(n+1)m} S(A, nm, \omega) \le \frac{1}{nm+k} S(A, nm+k, \omega) \le \frac{1}{nm} S(A, (n+1)m, \omega) \]

which holds for every k from 0 to m. □

Definition. If \(b = (b_1, b_2, \ldots, b_j)\) is a word constructed from the alphabet \(\{0, 1\}\) and \(s = \sum_{i=1}^{j} b_i\), we say that \(p^s q^{j-s}\) is the probability of the word \(b\).

Corollary 3. In the sequence ω, every word b almost surely occurs with asymptotic frequency equal to its probability.

Proof. This is a special case of Corollary 2. □
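As a quick check of the corollary, here is a minimal R sketch (the word 101 and the parameters are arbitrary illustrative choices) that compares the empirical frequency of a word in a simulated coin-flip sequence with its probability \(p^s q^{j-s}\):

p <- 0.5
n <- 1e+6
flips <- as.integer(runif(n) <= p)     # the coin-flip sequence

word <- c(1, 0, 1)                     # the word b, with j = 3 and s = 2
j <- length(word)
s <- sum(word)

occurs <- rep(TRUE, n - j + 1)         # occurs[k]: does the word start at position k?
for (i in 1:j) {
  occurs <- occurs & (flips[i:(n - j + i)] == word[i])
}
cat("empirical frequency:", mean(occurs), "\n")
cat("word probability:   ", p^s * (1 - p)^(j - s), "\n")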

Sources

This proof of the Strong Law for sums of Bernoulli random variables is adapted from Heads or Tails, by Emmanuel Lesigne, Student Mathematical Library Volume 28, American Mathematical Society, Providence, 2005, Chapter 11.2, pages 82-85 [2]. The proof based on subsequential limits and the Borel-Cantelli Lemma, and the following remarks, are adapted from Probability by Leo Breiman, Addison-Wesley, Reading MA, 1968, Chapters 1 and 3 [1]. The proof using the fourth moment and the Borel-Cantelli Lemma is adapted from Varadhan [3].

_______________________________________________________________________________________________

Algorithms, Scripts, Simulations


Algorithm

Comment Post: Observation that any coin flip sequence selected by a pseudo-random-number generator has the average number of heads converge to the probability of heads, suggesting the Strong Law of Large Numbers.
Comment Post: The average number of heads sampled at an exponential sequence of indices, to suggest the Strong Law of Large Numbers.
1. Set probability of success p.
2. Set length of coin-flip sequence n.
3. Initialize and fill length n array of coin flips.
4. Use vectorization to sum the cumulative sequence Sn of heads.
5. Use vectorization to compute an exponential sequence of indices for sampling.
6. Use vectorization to compute the average number of heads along the exponential sequence of indices.
7. Print the averages.

Scripts


R

R script for Strong Law.

p <- 0.5                        # probability of heads
n <- 1e+6                       # length of coin-flip sequence
coinFlips <- (runif(n) <= p)    # simulate n Bernoulli(p) coin flips
S <- cumsum(coinFlips)          # cumulative number of heads

tenpows <- 10^(1:6)             # sampling indices 10, 100, ..., 10^6
cat(S[tenpows]/tenpows, "\n")   # running averages at those indices
Octave

Octave script for Strong Law.

p = 0.5;                        % probability of heads
n = 1e+6;                       % length of coin-flip sequence
coinFlips = rand(1, n) <= p;    % simulate n Bernoulli(p) coin flips
S = cumsum(coinFlips);          % cumulative number of heads

tenpows = 10 .^ (1:6);          % sampling indices 10, 100, ..., 10^6
S(tenpows) ./ tenpows           % running averages at those indices
Perl

Perl PDL script for Strong Law.

use PDL;
use PDL::NiceSlice;

$p         = 0.5;                      # probability of heads
$n         = 1e+6;                     # length of coin-flip sequence
$coinFlips = random($n) <= $p;         # simulate n Bernoulli(p) coin flips
$S         = cumusumover($coinFlips);  # cumulative number of heads

$pows    = zeros(6)->xlinvals( 1, 6 ); # exponents 1, 2, ..., 6
$tenpows = 10**$pows;                  # sampling indices 10, 100, ..., 10^6

print $S( $tenpows - 1 ) / $tenpows, "\n";   # running averages at those indices
SciPy

Scientific Python script for Strong Law.

import scipy

p = 0.5                                   # probability of heads
n = 10**6                                 # length of coin-flip sequence
coinFlips = scipy.random.random(n) <= p   # simulate n Bernoulli(p) coin flips
S = scipy.cumsum(coinFlips, axis=0)       # cumulative number of heads

pows = scipy.arange(1, 6+1)               # exponents 1, 2, ..., 6
tenpows = 10**pows                        # sampling indices 10, 100, ..., 10^6

print( scipy.take(S,tenpows-1)/scipy.array(tenpows,dtype=float) )   # running averages

__________________________________________________________________________


Problems to Work for Understanding

__________________________________________________________________________


Reading Suggestion:

References

[1]   Leo Breiman. Probability. Addison Wesley, 1968.

[2]   Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, volume 28 of Student Mathematical Library. American Mathematical Society, 2005.

[3]   S. R. S. Varadhan. Probability Theory. Number 7 in Courant Lecture Notes. American Mathematical Society, 2001.

__________________________________________________________________________


Outside Readings and Links:

__________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LaTeX source on October 20, 2017