Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Voice: 402-472-3731
Fax: 402-472-8466

Stochastic Processes and
Advanced Mathematical Finance


Laws of Large Numbers






Mathematically Mature: may contain mathematics beyond calculus with proofs.


Section Starter Question


Consider a fair ($p = 1/2 = q$) coin tossing game carried out for 1000 tosses. Explain in a sentence what the “law of averages” says about the outcomes of this game.


Key Concepts


  1. The precise statement, meaning and proof of the Weak Law of Large Numbers.
  2. The precise statement and meaning of the Strong Law of Large Numbers.




  1. The Weak Law of Large Numbers is a precise mathematical statement of what is usually loosely referred to as the “law of averages”. Precisely, let $X_1, \dots, X_n$ be independent, identically distributed random variables each with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$ and consider the sample mean or, more loosely, the “average” $S_n/n$. Then the Weak Law of Large Numbers says that the sample mean $S_n/n$ converges in probability to the population mean $\mu$. That is, for every $\epsilon > 0$:
    \[ \lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0. \]

    In words, the proportion of those samples whose sample mean differs significantly from the population mean diminishes to zero as the sample size increases.

  2. The Strong Law of Large Numbers says that $S_n/n$ converges to $\mu$ with probability 1. That is:
    \[ \mathbb{P}\left[ \lim_{n \to \infty} S_n/n = \mu \right] = 1. \]

    In words, the Strong Law of Large Numbers says that “almost every” sample mean approaches the population mean as the sample size increases.


Mathematical Ideas


The Weak Law of Large Numbers

Lemma 1 (Markov’s Inequality). If $X$ is a random variable that takes only nonnegative values, then for any $a > 0$:

\[ \mathbb{P}[X \ge a] \le \frac{\mathbb{E}[X]}{a}. \]

Proof. Here is a proof for the case where X is a continuous random variable with probability density f:

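\[
\begin{aligned}
\mathbb{E}[X] &= \int_0^\infty x f(x) \, dx \\
&= \int_0^a x f(x) \, dx + \int_a^\infty x f(x) \, dx \\
&\ge \int_a^\infty x f(x) \, dx \\
&\ge \int_a^\infty a f(x) \, dx \\
&= a \int_a^\infty f(x) \, dx \\
&= a \, \mathbb{P}[X \ge a].
\end{aligned}
\]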

(The proof for the case where X is a purely discrete random variable is similar with summations replacing integrals. The proof for the general case is exactly as given with dF(x) replacing f(x)dx and interpreting the integrals as Riemann-Stieltjes integrals.) □
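As a small sanity check of Markov’s Inequality in the discrete case, the following sketch compares the exact tail probability with the bound $\mathbb{E}[X]/a$ for one roll of a fair six-sided die. The die and the function name `markov_check` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

def markov_check(values, a):
    """Return (P[X >= a], E[X]/a) for X uniform on the nonnegative `values`."""
    n = len(values)
    mean = Fraction(sum(values), n)                        # E[X]
    tail = Fraction(sum(1 for v in values if v >= a), n)   # exact P[X >= a]
    return tail, mean / a                                  # Markov bound E[X]/a

die = [1, 2, 3, 4, 5, 6]  # hypothetical example: one roll of a fair die
for a in (2, 4, 6):
    tail, bound = markov_check(die, a)
    print(f"a = {a}: P[X >= a] = {tail}, Markov bound = {bound}")
```

The bound always holds but is often far from the exact tail; for small $a$ it can even exceed 1, in which case it gives no information.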

Lemma 2 (Chebyshev’s Inequality). If $X$ is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$:

\[ \mathbb{P}[|X - \mu| \ge k] \le \frac{\sigma^2}{k^2}. \]

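Proof. Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s inequality (with $a = k^2$) to obtain

\[ \mathbb{P}[(X - \mu)^2 \ge k^2] \le \frac{\mathbb{E}[(X - \mu)^2]}{k^2}. \]

But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, and $\mathbb{E}[(X - \mu)^2] = \sigma^2$, the inequality above is equivalent to:

\[ \mathbb{P}[|X - \mu| \ge k] \le \frac{\sigma^2}{k^2} \]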

and the proof is complete. □
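The same kind of check can be made for Chebyshev’s Inequality. This sketch, again assuming a fair six-sided die purely for illustration (the name `chebyshev_check` is hypothetical), computes the exact probability $\mathbb{P}[|X - \mu| \ge k]$ and the bound $\sigma^2/k^2$ with exact rational arithmetic.

```python
from fractions import Fraction

def chebyshev_check(values, k):
    """Return (P[|X - mu| >= k], sigma^2 / k^2) for X uniform on `values`."""
    n = len(values)
    mean = Fraction(sum(values), n)                         # mu
    var = sum((v - mean) ** 2 for v in values) / n          # sigma^2
    tail = Fraction(sum(1 for v in values if abs(v - mean) >= k), n)
    return tail, var / (k * k)                              # Chebyshev bound

die = [1, 2, 3, 4, 5, 6]  # hypothetical example: one roll of a fair die
for k in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    tail, bound = chebyshev_check(die, k)
    print(f"k = {k}: P[|X - mu| >= k] = {tail}, Chebyshev bound = {bound}")
```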

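Theorem 3 (Weak Law of Large Numbers). Let $X_1, X_2, X_3, \dots$ be independent, identically distributed random variables each with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges in probability to $\mu$. That is, for every $\epsilon > 0$:

\[ \lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0. \]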

Proof. Since the mean of a sum of random variables is the sum of the means, and scalars factor out of expectations:

\[ \mathbb{E}[S_n/n] = (1/n) \sum_{i=1}^n \mathbb{E}[X_i] = (1/n)(n\mu) = \mu. \]

Since the variance of a sum of independent random variables is the sum of the variances, and scalars factor out of variances as squares:

\[ \operatorname{Var}[S_n/n] = (1/n^2) \sum_{i=1}^n \operatorname{Var}[X_i] = (1/n^2)(n\sigma^2) = \sigma^2/n. \]

Fix a value 𝜖 > 0. Then using elementary definitions for probability measure and Chebyshev’s Inequality:

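\[ 0 \le \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] \le \mathbb{P}_n\left[ |S_n/n - \mu| \ge \epsilon \right] \le \frac{\sigma^2}{n\epsilon^2}. \]

Then by the squeeze theorem for limits

\[ \lim_{n \to \infty} \mathbb{P}_n\left[ |S_n/n - \mu| > \epsilon \right] = 0. \]

□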
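To see how the Chebyshev bound $\sigma^2/(n\epsilon^2)$ used in the proof compares with the true probability, the following sketch works the fair-coin case exactly. It assumes $S_n$ counts Heads, so that $\mu = 1/2$ and $\sigma^2 = 1/4$. The tolerance $\epsilon = 1/20$, the values of $n$, and the function names are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import comb

def chebyshev_bound(n, eps, var=Fraction(1, 4)):
    """Bound sigma^2 / (n * eps^2) from the proof; var = 1/4 is a fair coin."""
    return var / (n * eps ** 2)

def exact_deviation_probability(n, eps):
    """Exact P[|S_n/n - 1/2| > eps] when S_n counts Heads in n fair tosses."""
    half = Fraction(1, 2)
    # Count the equally likely toss sequences whose sample mean is off by more than eps.
    favorable = sum(comb(n, h) for h in range(n + 1)
                    if abs(Fraction(h, n) - half) > eps)
    return Fraction(favorable, 2 ** n)

eps = Fraction(1, 20)  # illustrative tolerance, epsilon = 0.05
for n in (100, 400, 1600):
    exact = exact_deviation_probability(n, eps)
    bound = chebyshev_bound(n, eps)
    print(f"n = {n}: exact = {float(exact):.6f}, Chebyshev bound = {float(bound):.6f}")
```

In this example the bound shrinks like $1/n$ while the exact probability falls much faster, which is consistent with the proof: it establishes convergence to zero but does not claim the bound is sharp.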

Jacob Bernoulli originally proved the Weak Law of Large Numbers in 1713 for the special case when the $X_i$ are binomial random variables. Bernoulli had to create an ingenious proof to establish the result, since Chebyshev’s inequality was not known at the time. The theorem then became known as Bernoulli’s Theorem. Simeon Poisson proved a generalization of Bernoulli’s binomial Weak Law and first called it the Law of Large Numbers. In 1929 the Russian mathematician Aleksandr Khinchin proved the general form of the Weak Law of Large Numbers presented here. Many other versions of the Weak Law are known, with weaker hypotheses that do not require the random variables to be identically distributed or to have finite variance.

The Strong Law of Large Numbers

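Theorem 4 (Strong Law of Large Numbers). Let $X_1, X_2, X_3, \dots$ be independent, identically distributed random variables each with mean $\mu$ and finite second moment $\mathbb{E}[X_j^2] < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges with probability 1 to $\mu$,

\[ \mathbb{P}\left[ \lim_{n \to \infty} \frac{S_n}{n} = \mu \right] = 1. \]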

The proof of this theorem is beautiful and deep, but giving it here would take us too far afield. The Russian mathematician Andrey Kolmogorov proved the Strong Law in the generality stated here, culminating a long series of investigations through the first half of the 20th century.

Discussion of the Weak and Strong Laws of Large Numbers

In probability theory a theorem that tells us how a sequence of probabilities converges is called a weak law. For coin tossing, the sequence of probabilities is the sequence of binomial probabilities associated with the first $n$ tosses. The Weak Law of Large Numbers says that if we take $n$ large enough, then the binomial probability of the mean over the first $n$ tosses differing “much” from the theoretical mean should be small. This is what is popularly referred to as the law of averages. However, this is a limit statement, and the Weak Law of Large Numbers above indicates neither the rate of convergence nor the dependence of the rate of convergence on the difference $\epsilon$. Note furthermore that the Weak Law of Large Numbers in no way justifies the false notion called the “Gambler’s Fallacy”, namely that a long string of successive Heads indicates a Tail “is due to occur soon”. The independence of the random variables completely eliminates that sort of prescience.
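The remark about the Gambler’s Fallacy can be illustrated with a short simulation. This is a sketch under assumed parameters: the streak length of 5, the number of tosses, the seed, and the function name `heads_after_streak` are all illustrative choices. It estimates the frequency of Heads on tosses that immediately follow a run of at least 5 Heads.

```python
import random

def heads_after_streak(num_tosses, streak, seed=0):
    """Frequency of Heads on tosses that follow a run of at least `streak` Heads."""
    rng = random.Random(seed)   # seeded generator so the run is reproducible
    run = 0                     # length of the current run of consecutive Heads
    followers = 0               # tosses that come right after a long run
    heads = 0                   # how many of those tosses are Heads
    for _ in range(num_tosses):
        toss = rng.random() < 0.5          # True for Heads, fair coin
        if run >= streak:
            followers += 1
            heads += toss
        run = run + 1 if toss else 0
    return heads / followers if followers else float("nan")

freq = heads_after_streak(200_000, streak=5)
print(f"Frequency of Heads right after 5 or more Heads: {freq:.4f}")
```

Because the tosses are independent, the estimated frequency should stay near $1/2$ rather than dropping as the fallacy would predict.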

A strong law tells how the sequence of random variables as a sample path behaves in the limit. That is, among the infinitely many sequences (or paths) of coin tosses we select one “at random” and then evaluate the sequence of means along that path. The Strong Law of Large Numbers says that with probability 1 that sequence of means along that path will converge to the theoretical mean. The formulation of the notion of probability on an infinite (in fact an uncountably infinite) sample space requires mathematics beyond the scope of the course, partially accounting for the lack of a proof for the Strong Law here.
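The path-by-path viewpoint can be sketched by following the running mean along a single simulated sequence of fair coin tosses. A finite simulation can only suggest, not demonstrate, convergence with probability 1. The checkpoints, seed, and function name `running_means` below are illustrative assumptions.

```python
import random

def running_means(num_tosses, checkpoints, seed=1):
    """Follow one path of fair coin tosses; return {n: S_n / n} at the checkpoints."""
    rng = random.Random(seed)   # one fixed seed corresponds to one sample path
    marks = set(checkpoints)
    heads = 0                   # S_n, the number of Heads in the first n tosses
    means = {}
    for n in range(1, num_tosses + 1):
        heads += rng.random() < 0.5
        if n in marks:
            means[n] = heads / n
    return means

path = running_means(100_000, [10, 100, 1_000, 10_000, 100_000])
for n, mean in path.items():
    print(f"n = {n:>6}: running mean = {mean:.4f}")
```

Changing the seed selects a different path; the Strong Law asserts that, apart from a set of paths of probability zero, every such sequence of running means converges to $1/2$.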

Note carefully the difference between the Weak Law of Large Numbers and the Strong Law. We do not simply move the limit inside the probability. These two results express different limits. The Weak Law is a statement that the group of finite-length experiments whose sample mean is close to the population mean approaches all of the possible experiments as the length increases. The Strong Law is an experiment-by-experiment statement: it says (almost every) sequence has a sample mean that approaches the population mean. This is reflected in the subtle difference in notation here. In the Weak Law the probabilities are written with a subscript, $\mathbb{P}_n$, indicating this is a binomial probability distribution with parameter $n$ (and $p$). In the Strong Law, the probability is written without a subscript, $\mathbb{P}$, indicating this is a probability measure on a sample space. Weak laws are usually much easier to prove than strong laws.


This section is adapted from Chapter 8, “Limit Theorems”, A First Course in Probability, by Sheldon Ross, Macmillan, 1976.

Algorithms, Scripts, Simulations



The experiment is to flip a coin $n$ times, and the experiment is repeated $k$ times. The scripts then compute the proportion of the $k$ experiments in which the sample mean deviates from $p$ by more than $\epsilon$.





R script for the Law of Large Numbers.

p <- 0.5
n <- 10000
k <- 1000
coinFlips <- array( 0+(runif(n*k) <= p), dim=c(n,k))
# 0+ coerces Boolean to numeric
headsTotal <- colSums(coinFlips)
# 0..n binomial rv sample, size k

epsilon <- 0.01
mu <- p
prob <- sum( 0+(abs( headsTotal/n - mu ) > epsilon) )/k
cat(sprintf("Empirical probability: %f \n", prob ))

Octave script for the Law of Large Numbers.

p = 0.5;
n = 10000;
k = 1000;

coinFlips = rand(n,k) <= p;
headsTotal = sum(coinFlips);
# 0..n binomial rv sample, size k

epsilon = 0.01;
mu = p;
prob = sum( abs( headsTotal/n - mu ) > epsilon)/k;
disp("Empirical probability:"), disp( prob )

Perl PDL script for the Law of Large Numbers.

use PDL;
use PDL::NiceSlice;

$p = 0.5;
$n = 10000;
$k = 1000;

$coinFlips = random( $k, $n ) <= $p;

#note order of dims!!
$headsTotal = $coinFlips->transpose->sumover;

# 0..n binomial r.v. sample, size k
# note transpose, PDL likes x (row) direction for
# implicitly threaded operations

$epsilon = 0.01;
$mu      = $p;

$prob = ( ( abs( ( $headsTotal / $n ) - $mu ) > $epsilon )->sumover ) / $k;

print "Empirical probability: ", $prob, "\n";

Scientific Python script for the Law of Large Numbers.

import numpy as np

p = 0.5
n = 10000
k = 1000

coinFlips = np.random.random((n, k)) <= p
# Note Booleans True for Heads and False for Tails
headsTotal = np.sum(coinFlips, axis=0)
# 0..n binomial r.v. sample, size k
# Note how Booleans act as 0 (False) and 1 (True)

epsilon = 0.01
mu = p

prob = np.sum(np.abs(headsTotal / n - mu) > epsilon) / k
# In Python 3, / is true division, so no explicit cast to float is needed

print("Empirical probability: ", prob)


Problems to Work

Problems to Work for Understanding

  1. Suppose $X$ is a continuous random variable with mean and variance both equal to 20. What can be said about $\mathbb{P}[0 \le X \le 40]$?
  2. Suppose $X$ is an exponentially distributed random variable with mean $\mathbb{E}[X] = 1$. For $x = 0.5$, $1$, and $2$, compare $\mathbb{P}[X \ge x]$ with the Markov Inequality bound.
  3. Suppose $X$ is a Bernoulli random variable with $\mathbb{P}[X = 1] = p$ and $\mathbb{P}[X = 0] = 1 - p = q$. Compare $\mathbb{P}[X \ge 1]$ with the Markov Inequality bound.
  4. Make a sequence of 100 coin tosses and keep a record as in Experiment. How “typical” was your coin flip sequence? All $2^{100}$ coin flip sequences are equally likely of course, so yours is neither more nor less typical than any other in that way. However, some sets or events of coin flip sequences are more or less “typical” when measured by the probability of a corresponding event. What is your value of $S_{100}$, and the number of heads and the number of tails in your record? Using your value of $S_{100}$, let $\epsilon = |S_{100}|$ and use Chebyshev’s Inequality as in the proof of the Weak Law of Large Numbers to provide an upper bound on the probability that $|S_{100}| > \epsilon$ over all possible records.
  5. Let $X_1, X_2, \dots, X_{10}$ be independent Poisson random variables with mean 1. First use the Markov Inequality to get a bound on
    \[ \mathbb{P}[X_1 + \cdots + X_{10} > 15]. \]

    Next find the exact probability that $X_1 + \cdots + X_{10} > 15$ using the fact that the sum of independent Poisson random variables with parameters $\lambda_1$, $\lambda_2$ is again Poisson with parameter $\lambda_1 + \lambda_2$.

  6. Write a proof of Markov’s Inequality for a random variable taking positive integer values.
  7. Modify the scripts to compute the proportion of sample means which deviate from p by more than 𝜖 for increasing values of n. Do the proportions decrease with increasing values of n? If so, at what rate do the proportions decrease?



Reading Suggestion:


[1]   Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, volume 28 of Student Mathematical Library. American Mathematical Society, 2005.

[2]   Sheldon Ross. A First Course in Probability. Macmillan, 1976.

[3]   Sheldon M. Ross. Introduction to Probability Models. Elsevier, 8th edition, 2003.



Outside Readings and Links:

  1. Virtual Laboratories in Probability and Statistics. Search the page for “Simulation Exercises” and then run the Binomial Coin Experiment and the Matching Experiment.



Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LaTeX source on July 21, 2016