Steven R. Dunbar
Department of Mathematics
203 Avery Hall
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Stochastic Processes and

__________________________________________________________________________

Laws of Large Numbers

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

### Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

### Section Starter Question

Consider a fair ($p = 1/2 = q$) coin tossing game carried out for 1000 tosses. Explain in a sentence what the “law of averages” says about the outcomes of this game.

_______________________________________________________________________________________________

### Key Concepts

1. The precise statement, meaning and proof of the Weak Law of Large Numbers.
2. The precise statement and meaning of the Strong Law of Large Numbers.

__________________________________________________________________________

### Vocabulary

1. The Weak Law of Large Numbers is a precise mathematical statement of what is usually loosely referred to as the “law of averages”. Precisely, let $X_1, \dots, X_n$ be independent, identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$ and consider the sample mean, or more loosely, the “average” $S_n/n$. Then the Weak Law of Large Numbers says that the sample mean $S_n/n$ converges in probability to the population mean $\mu$. That is:
$\lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0.$

In words, the proportion of those samples whose sample mean diﬀers signiﬁcantly from the population mean diminishes to zero as the sample size increases.

2. The Strong Law of Large Numbers says that $S_n/n$ converges to $\mu$ with probability $1$. That is:
$\mathbb{P}\left[ \lim_{n \to \infty} S_n/n = \mu \right] = 1.$

In words, the Strong Law of Large Numbers says that for “almost every” sample, the sample mean approaches the population mean as the sample size increases.

__________________________________________________________________________

### Mathematical Ideas

#### The Weak Law of Large Numbers

Lemma 1 (Markov’s Inequality). If $X$ is a random variable that takes only nonnegative values, then for any $a > 0$:

$\mathbb{P}\left[ X \ge a \right] \le \mathbb{E}\left[ X \right]/a.$

Proof. Here is a proof for the case where $X$ is a continuous random variable with probability density $f$:

$\begin{aligned}
\mathbb{E}[X] &= \int_0^\infty x f(x)\,dx \\
&= \int_0^a x f(x)\,dx + \int_a^\infty x f(x)\,dx \\
&\ge \int_a^\infty x f(x)\,dx \\
&\ge \int_a^\infty a f(x)\,dx \\
&= a \int_a^\infty f(x)\,dx \\
&= a\,\mathbb{P}[X \ge a].
\end{aligned}$

(The proof for the case where $X$ is a purely discrete random variable is similar, with summations replacing integrals. The proof for the general case is exactly as given, with $dF(x)$ replacing $f(x)\,dx$ and interpreting the integrals as Riemann-Stieltjes integrals.) □
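
The inequality is easy to check numerically. The following is a minimal sketch (an addition, not part of the original notes, assuming NumPy is available) that compares the empirical tail probability of an exponential sample, which is nonnegative, with the Markov bound $\mathbb{E}[X]/a$; compare Problem 2 below.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for reproducibility
x = rng.exponential(scale=1.0, size=100_000)  # nonnegative sample, E[X] = 1

for a in (0.5, 1.0, 2.0, 4.0):
    empirical = np.mean(x >= a)   # estimate of P[X >= a]
    bound = np.mean(x) / a        # Markov bound E[X]/a
    print(f"a = {a}: P[X >= a] is about {empirical:.4f}, bound is {bound:.4f}")
```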

Lemma 2 (Chebyshev’s Inequality). If $X$ is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$:

$\mathbb{P}\left[ |X - \mu| \ge k \right] \le \sigma^2/k^2.$

Proof. Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s Inequality (with $a = k^2$) to obtain

$\mathbb{P}\left[ (X - \mu)^2 \ge k^2 \right] \le \mathbb{E}\left[ (X - \mu)^2 \right]/k^2.$

But $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, and $\mathbb{E}\left[ (X - \mu)^2 \right] = \sigma^2$ by definition, so the inequality above is equivalent to:

$\mathbb{P}\left[ |X - \mu| \ge k \right] \le \sigma^2/k^2$

and the proof is complete. □
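
Again, a quick numerical sanity check is possible. Here is a brief sketch (an added illustration, assuming NumPy) comparing the empirical probability $\mathbb{P}[|X - \mu| \ge k]$ for a standard normal sample with the Chebyshev bound $\sigma^2/k^2$:

```python
import numpy as np

rng = np.random.default_rng(seed=11)  # arbitrary seed
x = rng.standard_normal(100_000)      # mu = 0, sigma^2 = 1

for k in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x) >= k)  # estimate of P[|X - mu| >= k]
    bound = 1.0 / k**2                   # Chebyshev bound sigma^2 / k^2
    print(f"k = {k}: P[|X - mu| >= k] is about {empirical:.4f}, bound is {bound:.4f}")
```

The bound is loose (for $k = 2$ the true probability is about $0.05$ against a bound of $0.25$), but it requires nothing beyond the existence of the variance.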

Theorem 3 (Weak Law of Large Numbers). Let $X_1, X_2, X_3, \dots$ be independent, identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges in probability to $\mu$. That is, for any $\epsilon > 0$:

$\lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0.$

Proof. Since the mean of a sum of random variables is the sum of the means, and scalars factor out of expectations:

$\mathbb{E}\left[ S_n/n \right] = (1/n) \sum_{i=1}^{n} \mathbb{E}\left[ X_i \right] = (1/n)(n\mu) = \mu.$

Since the variance of a sum of independent random variables is the sum of the variances, and scalars factor out of variances as squares:

$\mathrm{Var}\left[ S_n/n \right] = (1/n^2) \sum_{i=1}^{n} \mathrm{Var}\left[ X_i \right] = (1/n^2)(n\sigma^2) = \sigma^2/n.$

Fix a value $\epsilon > 0$. Then using elementary properties of the probability measure and Chebyshev’s Inequality applied to $S_n/n$:

$0 \le \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] \le \mathbb{P}_n\left[ |S_n/n - \mu| \ge \epsilon \right] \le \sigma^2/(n\epsilon^2).$

Then by the squeeze theorem for limits

$\lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0.$ □
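
As a concrete instance (an added illustration, using the same parameter values as the simulation scripts below): for fair coin tossing, record each toss as $X_i = 1$ for Heads and $X_i = 0$ for Tails, so $\mu = p = 1/2$ and $\sigma^2 = pq = 1/4$. With $n = 10000$ tosses and $\epsilon = 0.01$, the bound in the proof gives

$\mathbb{P}_n\left[ |S_n/n - 1/2| > 0.01 \right] \le \frac{1/4}{10000 \cdot (0.01)^2} = 0.25.$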

Jacob Bernoulli originally proved the Weak Law of Large Numbers in 1713 for the special case when the $X_i$ are binomial random variables. Bernoulli had to create an ingenious proof to establish the result, since Chebyshev’s Inequality was not known at the time. The theorem then became known as Bernoulli’s Theorem. Siméon Poisson proved a generalization of Bernoulli’s binomial Weak Law and first called it the Law of Large Numbers. In 1929 the Russian mathematician Aleksandr Khinchin proved the general form of the Weak Law of Large Numbers presented here. Many other versions of the Weak Law are known, with hypotheses that do not require such stringent conditions as identical distribution and finite variance.

#### The Strong Law of Large Numbers

Theorem 4 (Strong Law of Large Numbers). Let $X_1, X_2, X_3, \dots$ be independent, identically distributed random variables, each with finite mean $\mu$ and finite second moment $\mathbb{E}\left[ X_j^2 \right] < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges with probability $1$ to $\mu$,

$\mathbb{P}\left[ \lim_{n \to \infty} \frac{S_n}{n} = \mu \right] = 1.$

The proof of this theorem is beautiful and deep, but proving it would take us too far afield. The Russian mathematician Andrey Kolmogorov proved the Strong Law in the generality stated here, culminating a long series of investigations through the first half of the 20th century.

#### Discussion of the Weak and Strong Laws of Large Numbers

In probability theory, a theorem that tells us how a sequence of probabilities converges is called a weak law. For coin tossing, the sequence of probabilities is the sequence of binomial probabilities associated with the first $n$ tosses. The Weak Law of Large Numbers says that if we take $n$ large enough, then the binomial probability of the mean over the first $n$ tosses differing “much” from the theoretical mean should be small. This is what is usually popularly referred to as the law of averages. However, this is a limit statement, and the Weak Law of Large Numbers above does not indicate the rate of convergence, nor the dependence of the rate of convergence on the difference $\epsilon$. Note furthermore that the Weak Law of Large Numbers in no way justifies the false notion called the “Gambler’s Fallacy”, namely that a long string of successive Heads indicates that a Tail “is due to occur soon”. The independence of the random variables completely eliminates that sort of prescience.

A strong law tells how the sequence of random variables, as a sample path, behaves in the limit. That is, among the infinitely many sequences (or paths) of coin tosses we select one “at random” and then evaluate the sequence of means along that path. The Strong Law of Large Numbers says that, with probability $1$, the sequence of means along that path will converge to the theoretical mean. The formulation of the notion of probability on an infinite (in fact an uncountably infinite) sample space requires mathematics beyond the scope of the course, partially accounting for the lack of a proof for the Strong Law here.
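
The path-by-path character of the Strong Law is easy to see with a short script. Here is a minimal sketch (an addition, not from the original notes, assuming NumPy) that follows the running sample mean along a single randomly selected sequence of fair coin tosses:

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # one "randomly selected" path
tosses = (rng.random(10_000) < 0.5).astype(float)  # 1 = Heads, 0 = Tails

# running sample mean S_n / n along this one path
running_mean = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

for n in (10, 100, 1000, 10_000):
    print(f"S_{n}/{n} = {running_mean[n - 1]:.4f}")  # should approach 0.5
```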

Note carefully the difference between the Weak Law of Large Numbers and the Strong Law. We do not simply move the limit inside the probability; these two results express different limits. The Weak Law is a statement that the group of finite-length experiments whose sample mean is close to the population mean approaches all of the possible experiments as the length increases. The Strong Law is an experiment-by-experiment statement: it says that (almost) every sequence has a sample mean that approaches the population mean. This is reflected in the subtle difference in notation here. In the Weak Law the probabilities are written with a subscript: $\mathbb{P}_n\left[ \cdot \right]$, indicating this is a binomial probability distribution with parameter $n$ (and $p$). In the Strong Law, the probability is written without a subscript, indicating this is a probability measure on a sample space. Weak laws are usually much easier to prove than strong laws.

#### Sources

This section is adapted from Chapter 8, “Limit Theorems”, of A First Course in Probability, by Sheldon Ross, Macmillan, 1976.

__________________________________________________________________________

### Algorithms, Scripts, Simulations

#### Algorithm

The experiment is flipping a coin $n$ times, with the whole experiment repeated $k$ times. Then compute the proportion of the $k$ experiments for which the sample mean deviates from $p$ by more than $\epsilon$.

#### Scripts

R
```r
p <- 0.5
n <- 10000
k <- 1000

coinFlips <- array( 0+(runif(n*k) <= p), dim=c(n,k) )
# 0+ coerces Boolean to numeric
headsTotal <- colSums(coinFlips)
# column sums count the heads in each of the k experiments
# 0..n binomial rv sample, size k

epsilon <- 0.01
mu <- p
prob <- sum( 0+(abs( headsTotal/n - mu ) > epsilon) )/k
cat(sprintf("Empirical probability: %f \n", prob))
```
Octave
```octave
p = 0.5;
n = 10000;
k = 1000;

coinFlips = rand(n,k) <= p;
# sum down each column counts the heads in each of the k experiments
headsTotal = sum(coinFlips);
# 0..n binomial rv sample, size k

epsilon = 0.01;
mu = p;
prob = sum( abs( headsTotal/n - mu ) > epsilon )/k;
disp("Empirical probability:"), disp( prob )
```
Perl
```perl
use PDL;
use PDL::NiceSlice;

$p = 0.5;
$n = 10000;
$k = 1000;

$coinFlips = random( $k, $n ) <= $p;
# note order of dims!!
$headsTotal = $coinFlips->transpose->sumover;
# 0..n binomial r.v. sample, size k
# note transpose, PDL likes x (row) direction for
# implicitly threaded operations

$epsilon = 0.01;
$mu = $p;

$prob = ( ( abs( ( $headsTotal / $n ) - $mu ) > $epsilon )->sumover ) / $k;

print "Empirical probability: ", $prob, "\n";
```
SciPy
```python
import numpy as np
# (the original script used scipy.random and scipy.sum, which current
# versions of SciPy no longer provide; numpy supplies the same functions)

p = 0.5
n = 10000
k = 1000

coinFlips = np.random.random((n,k)) <= p
# Note Booleans True for Heads and False for Tails
headsTotal = np.sum(coinFlips, axis=0)
# 0..n binomial r.v. sample, size k
# Note how Booleans act as 0 (False) and 1 (True)

epsilon = 0.01
mu = p

prob = np.sum( np.abs( headsTotal/n - mu ) > epsilon ) / k
# true division converts the integer counts to floats

print("Empirical probability: ", prob)
```
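
For the parameter values used in these scripts ($p = 1/2$, $n = 10000$, $\epsilon = 0.01$), Chebyshev’s Inequality gives the bound $\sigma^2/(n\epsilon^2) = 0.25$ computed above, while the empirical probability the scripts report is typically near $0.045$, the value predicted by the normal approximation to the binomial distribution. The Chebyshev bound is sufficient for the proof of the Weak Law but is generally far from sharp.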

__________________________________________________________________________

### Problems to Work for Understanding

1. Suppose $X$ is a continuous random variable with mean and variance both equal to $20$. What can be said about $\mathbb{P}\left[ 0 \le X \le 40 \right]$?
2. Suppose $X$ is an exponentially distributed random variable with mean $\mathbb{E}\left[ X \right] = 1$. For $x = 0.5$, $1$, and $2$, compare $\mathbb{P}\left[ X \ge x \right]$ with the Markov Inequality bound.
3. Suppose $X$ is a Bernoulli random variable with $\mathbb{P}\left[ X = 1 \right] = p$ and $\mathbb{P}\left[ X = 0 \right] = 1 - p = q$. Compare $\mathbb{P}\left[ X \ge 1 \right]$ with the Markov Inequality bound.
4. Make a sequence of $100$ coin tosses and keep a record as in the Experiment. How “typical” was your coin flip sequence? All $2^{100}$ coin flip sequences are equally likely of course, so yours is neither more nor less typical than any other in that way. However, some sets or events of coin flip sequences are more or less “typical” when measured by the probability of a corresponding event. What is your value of $S_{100}$, and the number of heads and the number of tails in your record? Using your value of $S_{100}$, let $\epsilon = |S_{100}|$ and use Chebyshev’s Inequality, as in the proof of the Weak Law of Large Numbers, to provide an upper bound on the probability that $|S_{100}| > \epsilon$ over all possible records.
5. Let $X_1, X_2, \dots, X_{10}$ be independent Poisson random variables with mean $1$. First use the Markov Inequality to get a bound on
$\mathbb{P}\left[ X_1 + \cdots + X_{10} > 15 \right].$
Next, find the exact value of $\mathbb{P}\left[ X_1 + \cdots + X_{10} > 15 \right]$ using the fact that the sum of independent Poisson random variables with parameters $\lambda_1$, $\lambda_2$ is again Poisson with parameter $\lambda_1 + \lambda_2$.

6. Write a proof of Markov’s Inequality for a random variable taking positive integer values.
7. Modify the scripts to compute the proportion of sample means which deviate from $p$ by more than $\epsilon$ for increasing values of $n$. Do the proportions decrease with increasing values of $n$? If so, at what rate do the proportions decrease?

__________________________________________________________________________

### References

   Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, volume 28 of Student Mathematical Library. American Mathematical Society, 2005.

   Sheldon Ross. A First Course in Probability. Macmillan, 1976.

   Sheldon M. Ross. Introduction to Probability Models. Elsevier, 8th edition, 2003.

__________________________________________________________________________

1. Virtual Laboratories in Probability and Statistics. Search the page for “Simulation Exercises” and then run the Binomial Coin Experiment and the Matching Experiment.

__________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable eﬀort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reﬂects the thoughts, interests and opinions of its author. They do not explicitly represent oﬃcial positions or policies of my employer.

Information on this website is subject to change without notice.