

The Central Limit Theorem


Question of the Day

What is the most important probability distribution? Why do you choose that distribution as most important?


Key Concepts

  1. The statement, meaning and proof of the Central Limit Theorem.


Vocabulary

  1. The Central Limit Theorem: Suppose that $X_1, X_2, \ldots$ is a sequence of independent, identically distributed random variables, each with mean $\mu$ and finite variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$ and
     \[ Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sigma}\left(\frac{S_n}{n} - \mu\right)\sqrt{n}, \]
     and let $Z$ be the “standard” normally distributed random variable with mean 0 and variance 1. Then $Z_n$ converges in distribution to $Z$, that is:
     \[ \lim_{n \to \infty} P[Z_n \le a] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} \exp(-u^2/2)\, du. \]
     Roughly, a shifted and rescaled sample distribution is approximately standard normal.
  2. We expect the normal distribution to arise whenever the outcome of a situation results from numerous small additive effects, with no single effect or small group of effects dominant.


Mathematical Ideas

The proofs in this section are drawn from Chapter 8, “Limit Theorems”, A First Course in Probability, by Sheldon Ross, Macmillan, 1976. Further examples and considerations come from Heads or Tails: An Introduction to Limit Theorems in Probability, by Emmanuel Lesigne, American Mathematical Society, Chapter 7, pages 29-74; An Introduction to Probability Theory and Its Applications, Volume I, second edition, William Feller, J. Wiley and Sons, 1957, Chapter VII; and Dicing with Death: Chance, Health, and Risk by Stephen Senn, Cambridge University Press, Cambridge, 2003.

Convergence in Distribution

Lemma 1 Let $X_1, X_2, \ldots$ be a sequence of random variables having cumulative distribution functions $F_{X_n}$ and moment generating functions $\phi_{X_n}$. Let $X$ be a random variable having cumulative distribution function $F_X$ and moment generating function $\phi_X$. If $\phi_{X_n}(t) \to \phi_X(t)$ for all $t$, then $F_{X_n}(t) \to F_X(t)$ for all $t$ at which $F_X$ is continuous.

We say that the sequence $X_n$ converges in distribution to $X$ and we write

\[ X_n \xrightarrow{\mathrm{dist}} X. \]

Notice that $P[a < X_n \le b] = F_{X_n}(b) - F_{X_n}(a) \to F_X(b) - F_X(a) = P[a < X \le b]$, so convergence in distribution implies convergence of probabilities of events. Likewise, convergence of probabilities of events implies convergence in distribution.

This lemma is useful because it is fairly routine to determine the pointwise limit of a sequence of functions using ideas from calculus. It is usually much easier to check the pointwise convergence of the moment generating functions than it is to check the convergence in distribution of the corresponding sequence of random variables.

We won’t prove this lemma, since it would take us too far afield into the theory of moment generating functions and corresponding distribution theorems. However, the proof is a fairly routine application of ideas from the mathematical theory of real analysis.
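
Although we omit the proof, the lemma is easy to test numerically. As an illustrative addition (not part of the original notes): if the $X_i$ are independent Exponential(1) random variables, then $S_n$ has a Gamma$(n,1)$ distribution, so the m.g.f. of $Z_n = (S_n - n)/\sqrt{n}$ has the closed form $e^{-t\sqrt{n}}(1 - t/\sqrt{n})^{-n}$ for $t < \sqrt{n}$, and a few lines of Python confirm its pointwise convergence to $\exp(t^2/2)$, the m.g.f. of the standard normal:

    import numpy as np

    # If the X_i are Exponential(1), then S_n ~ Gamma(n, 1), whose m.g.f. is
    # (1 - t)^(-n) for t < 1.  So the m.g.f. of Z_n = (S_n - n)/sqrt(n) is
    # exp(-t sqrt(n)) * (1 - t/sqrt(n))^(-n), valid for t < sqrt(n).
    def mgf_Zn(t, n):
        return np.exp(-t * np.sqrt(n)) * (1 - t / np.sqrt(n)) ** (-n)

    t = 1.0
    for n in [10, 100, 1000, 10000]:
        print(n, mgf_Zn(t, n))              # approaches exp(t^2/2) ~ 1.6487
    print("limit:", np.exp(t ** 2 / 2))     # m.g.f. of the standard normal at t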

Application: Weak Law of Large Numbers.

Here is a simple representative example of using the convergence of moment generating functions to prove a useful result. We will prove a version of the Weak Law of Large Numbers that does not require the independent, identically distributed random variables to have a finite variance.

Theorem 2 (Weak Law of Large Numbers) Let $X_1, \ldots, X_n$ be independent, identically distributed random variables, each with mean $\mu$ and with $E[|X_i|]$ finite. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges in probability to $\mu$. That is, for every $\epsilon > 0$:

\[ \lim_{n \to \infty} P[\,|S_n/n - \mu| > \epsilon\,] = 0. \]

Proof: If we denote the moment generating function of $X$ by $\phi(t)$, then the moment generating function of

\[ \frac{S_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n} \]

is $(\phi(t/n))^n$. The existence of the first moment assures us that $\phi(t)$ is differentiable at 0 with derivative $\phi'(0) = \mu$. Therefore, by Taylor expansion with remainder,

\[ \phi\left(\frac{t}{n}\right) = 1 + \frac{\mu t}{n} + r(t/n), \]

where r(t/n) is a remainder function such that

\[ \lim_{n \to \infty} \frac{r(t/n)}{1/n} = 0. \]

Then we need to consider

\[ \left(\phi\left(\frac{t}{n}\right)\right)^n = \left(1 + \frac{\mu t}{n} + r(t/n)\right)^n. \]

Taking the logarithm of $\left(1 + \frac{\mu t}{n} + r(t/n)\right)^n$ and using L'Hospital's Rule, we see that

\[ \left(\phi(t/n)\right)^n \to \exp(\mu t). \]

But this last expression is the moment generating function of the (degenerate) point mass distribution concentrated at $\mu$. Hence, by the lemma, $S_n/n$ converges in distribution to $\mu$, and since the limit is a constant, the convergence also holds in probability:

\[ \lim_{n \to \infty} P[\,|S_n/n - \mu| > \epsilon\,] = 0. \]

Q.E.D.
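
As a quick empirical companion to the theorem (an addition to these notes, with arbitrarily chosen parameters), a short Python simulation shows the sample mean settling down to $\mu$:

    import numpy as np

    rng = np.random.default_rng(7)
    mu = 0.25                              # mean of a Bernoulli(1/4) trial
    for n in [100, 10_000, 1_000_000]:
        x = rng.binomial(1, mu, size=n)    # X_1, ..., X_n i.i.d. Bernoulli(1/4)
        print(n, abs(x.mean() - mu))       # |S_n/n - mu| tends to shrink as n grows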

The Central Limit Theorem

Theorem 3 (Central Limit Theorem) Let $X_1, \ldots, X_n$ be independent, identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. Consider $S_n = \sum_{i=1}^{n} X_i$ and let

\[ Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sigma}\left(\frac{S_n}{n} - \mu\right)\sqrt{n}, \]

and let $Z$ be the “standard” normally distributed random variable with mean 0 and variance 1. Then $Z_n$ converges in distribution to $Z$, that is:

\[ \lim_{n \to \infty} P[Z_n \le a] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} \exp(-u^2/2)\, du. \]

Proof: Assume at first that $\mu = 0$ and $\sigma^2 = 1$. Assume also that the moment generating function of the $X_i$ (which are identically distributed, so there is only one m.g.f.), say $\phi_X(t)$, exists and is everywhere finite. Then the m.g.f. of $X_i/\sqrt{n}$ is

\[ \phi_{X/\sqrt{n}}(t) = E[\exp(t X_i/\sqrt{n})] = \phi_X(t/\sqrt{n}). \]

Recall that the m.g.f. of a sum of independent random variables is the product of the m.g.f.s. Thus the m.g.f. of $S_n/\sqrt{n}$ is (note that here we use $\mu = 0$ and $\sigma^2 = 1$, so that $Z_n = S_n/\sqrt{n}$)

\[ \phi_{S_n/\sqrt{n}}(t) = [\phi_X(t/\sqrt{n})]^n. \]
The Taylor series expansion of $\phi_X(t)$ about 0 is:

\[ \phi_X(t) = \phi_X(0) + \phi_X'(0)\,t + \frac{\phi_X''(0)}{2}\,t^2 + o(t^2) = 1 + t^2/2 + o(t^2), \]

again since $E[X] = \phi_X'(0)$ is assumed to be 0 and $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \phi_X''(0) - (\phi_X'(0))^2 = \phi_X''(0)$ is assumed to be 1. Thus,

\[ \phi_X(t/\sqrt{n}) = 1 + t^2/(2n) + o(t^2/n), \]
implying that
\[ \phi_{S_n/\sqrt{n}}(t) = \left[1 + t^2/(2n) + o(t^2/n)\right]^n. \]
Now by some standard results from calculus,
\[ \left[1 + t^2/(2n) + o(t^2/n)\right]^n \to \exp(t^2/2) \]
as $n \to \infty$. (If the reader needs convincing, it is computationally easier to show that
\[ \log\left(\left[1 + t^2/(2n) + o(t^2/n)\right]^n\right) \to t^2/2, \]
in order to account for the $o(\cdot)$ terms.)

To handle the general case, consider the standardized random variables $(X_i - \mu)/\sigma$, each of which has mean 0 and variance 1, and apply the result. Q.E.D.
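
The convergence in distribution is also easy to see numerically. The following sketch (an addition to these notes; it assumes NumPy and SciPy are available and uses Exponential(1) summands, for which $\mu = \sigma = 1$) compares a simulated estimate of $P[Z_n \le a]$ with the standard normal c.d.f.:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(11)
    mu, sigma, a = 1.0, 1.0, 0.5            # Exponential(1): mean 1, variance 1
    for n in [5, 50, 500]:
        s = rng.exponential(mu, size=(100_000, n)).sum(axis=1)  # samples of S_n
        z = (s - n * mu) / (sigma * np.sqrt(n))                 # standardized Z_n
        print(n, (z <= a).mean(), "vs", norm.cdf(a))            # Phi(0.5) ~ 0.6915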

The first version of the central limit theorem was proved by de Moivre around 1733 for the special case when the $X_i$ are binomial random variables with $p = 1/2 = q$. This proof was subsequently extended by Laplace to the case of arbitrary $p \ne q$. Laplace also discovered the more general form of the Central Limit Theorem presented here. His proof, however, was not completely rigorous, and in fact cannot be made completely rigorous. A truly rigorous proof of the Central Limit Theorem was first presented by the Russian mathematician Liapunov in 1901-1902. As a result, the Central Limit Theorem (or a slightly stronger version of it) is occasionally referred to as Liapunov's theorem. A theorem with weaker hypotheses but an equally strong conclusion is Lindeberg's Theorem of 1922. It says that the random variables in the sequence need not be identically distributed; instead they need only have zero means and individual variances small compared to their sum.

Accuracy of the Approximation by the Central Limit Theorem

The statement of the Central Limit Theorem does not say how good the approximation is. In general, the approximation given by the Central Limit Theorem applied to a sequence of Bernoulli random trials, or equivalently to a binomial random variable, is acceptable when $np(1-p) > 18$. The normal approximation to a binomial deteriorates as the interval $(a,b)$ over which the probability is computed moves away from the binomial's mean value $np$.
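
One way to get a feel for the quality of the approximation (an illustrative addition; the values of $n$, $p$, and the interval below are hypothetical) is to compare the exact binomial probability of an interval with its normal approximation:

    import numpy as np
    from scipy.stats import binom, norm

    n, p = 100, 0.25                # np(1-p) = 18.75, just above the threshold
    mean, sd = n * p, np.sqrt(n * p * (1 - p))
    a, b = 20, 30                   # interval containing the mean np = 25
    exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)     # P[a <= S_n <= b]
    approx = norm.cdf(b, mean, sd) - norm.cdf(a, mean, sd)  # rough, no continuity correction
    print(exact, approx, abs(exact - approx))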

The Berry-Esséen Theorem gives an explicit bound: for independent, identically distributed random variables $X_i$ with $\mu = E[X_i] = 0$, $\sigma^2 = E[X_i^2]$, and $\rho = E[|X_i|^3]$, then

\[ \left| P[S_n/(\sigma\sqrt{n}) \le a] - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-u^2/2}\, du \right| \le \frac{33}{4} \cdot \frac{\rho}{\sigma^3} \cdot \frac{1}{\sqrt{n}}. \]
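
For a concrete feel for the bound (an added sketch, not in the original), take centered Bernoulli trials, where $X_i$ equals $1-p$ with probability $p$ and $-p$ with probability $1-p$, so that $\mu = 0$, $\sigma^2 = p(1-p)$, and $\rho = p(1-p)[(1-p)^2 + p^2]$:

    import numpy as np

    def berry_esseen_bound(p, n):
        sigma2 = p * (1 - p)                         # variance of centered Bernoulli
        rho = p * (1 - p) * ((1 - p) ** 2 + p ** 2)  # third absolute moment E|X|^3
        return (33 / 4) * rho / (sigma2 ** 1.5 * np.sqrt(n))

    for n in [100, 10_000, 1_000_000]:
        print(n, berry_esseen_bound(0.5, n))         # bound decreases like 1/sqrt(n)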

Illustration 1

We expect the normal distribution to arise whenever the outcome of a situation results from numerous small additive effects, with no single effect or small group of effects dominant. Here is an illustration of that principle.

This illustration is adapted from Dicing with Death: Chance, Health, and Risk by Stephen Senn, Cambridge University Press, Cambridge, 2003.

Consider the following data from an American study called the National Longitudinal Survey of Youth (NLSY). This study originally obtained a sample of over 12,000 respondents aged 14-21 years in 1979. By 1994, the respondents were aged 29-36 years and had 15,000 children among them. Of the respondents, 2,444 had exactly two children. In these 2,444 families, the distribution of children was boy-boy: 582; girl-girl: 530; boy-girl: 666; and girl-boy: 666. The count of girl-girl families appears low compared to the other combinations; our intuition tells us that all combinations are equally likely and should appear in roughly equal proportions. We will assess this intuition with the Central Limit Theorem.

Consider a sequence of 2,444 trials, one for each of the two-child families. Let $X_i = 1$ (success) if the $i$th two-child family is girl-girl, and $X_i = 0$ (failure) otherwise. We are interested in the probability distribution of

\[ S_{2444} = \sum_{i=1}^{2444} X_i. \]

In particular, we are interested in the probability $P[S_{2444} \le 530]$; that is, what is the probability of seeing 530 or fewer girl-girl families in a sample of 2,444 families? We can use the Central Limit Theorem to estimate this probability.

We are assuming the family “success” variables $X_i$ are independent and identically distributed, a reasonable but arguable assumption. Nevertheless, without this assumption we cannot justify the use of the Central Limit Theorem, so we adopt it. Then $\mu = E[X_i] = (1/4) \cdot 1 + (3/4) \cdot 0 = 1/4$ and $\mathrm{Var}[X_i] = (1/4)(3/4) = 3/16$, so $\sigma = \sqrt{3}/4$. Hence

\[ P[S_{2444} \le 530] = P\left[ \frac{S_{2444} - 2444 \cdot (1/4)}{(\sqrt{3}/4)\sqrt{2444}} \le \frac{530 - 2444 \cdot (1/4)}{(\sqrt{3}/4)\sqrt{2444}} \right] \approx P[Z \le -3.7838] \approx 0.0000772. \]
Therefore, we are justified in thinking that under our assumptions, the proportion of girl-girl families is low. It is highly unlikely that under our assumptions such a proportion would have occurred. We then begin to suspect our assumptions, one of which was the implicit assumption that the appearance of girls was equally likely as boys, leading to equal proportions of the four types of families. In fact, there is ample evidence that the birth of boys is more likely than the birth of girls.
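
The computation is easy to reproduce (an added check, assuming SciPy is available); the exact binomial probability is shown alongside the normal approximation:

    import numpy as np
    from scipy.stats import binom, norm

    n, p, k = 2444, 0.25, 530
    z = (k - n * p) / np.sqrt(n * p * (1 - p))
    print(z)                      # about -3.7838
    print(norm.cdf(z))            # about 0.0000772, as computed above
    print(binom.cdf(k, n, p))     # exact binomial probability, for comparison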

Illustration 2

We expect the normal distribution to arise whenever the outcome of a situation results from numerous small additive effects, with no single effect or small group of effects dominant. Here is another illustration of that principle.

The following is adapted from An Introduction to Probability Theory and Its Applications, Volume I, second edition, William Feller, J. Wiley and Sons, 1957, Chapter VII.3(e), page 175.

The Central Limit Theorem can be used to assess risk. Two large banks compete for customers to take out loans. The banks have comparable offerings. Assume that each bank has a certain amount of funds available for loans to customers. Any customer seeking a loan beyond the available funds costs the bank, either as a lost opportunity, or because the bank itself has to borrow to secure the funds to lend to the customer. If too few customers take out loans, that also costs the bank, since it is then left with unused funds.

We create a simple mathematical model of this situation. We suppose that the loans are all of equal size and, for definiteness, that each bank has funds available for a certain number (to be determined) of these loans. Then suppose $n$ customers select a bank independently and at random. Let $X_i = 1$ if customer $i$ selects bank H, with probability 1/2, and $X_i = 0$ if customer $i$ selects bank T, also with probability 1/2. Then $S_n = \sum_{i=1}^{n} X_i$ is the number of customers requesting loans from bank H. Now there is some positive probability that more customers will turn up than can be accommodated. We can approximate this probability with the Central Limit Theorem:

\[ P[S_n > s] = P\left[ \frac{S_n - n/2}{(1/2)\sqrt{n}} > \frac{s - n/2}{(1/2)\sqrt{n}} \right] \approx P\left[ Z > \frac{s - n/2}{(1/2)\sqrt{n}} \right] = P\left[ Z > \frac{2s - n}{\sqrt{n}} \right]. \]
Now if $s$ is large enough that this probability is less than (say) 0.01, then the available funds will be sufficient in 99 of 100 cases. Looking up the value in a normal probability table, we need

\[ \frac{2s - n}{\sqrt{n}} \ge 2.33, \]

so if $n = 1000$, then $s = 537$ will suffice. If both banks assume the same risk of sellout at 0.01, then each will have funds for 537 loans, for a total of 1074, of which 74 will go unused. In the same way, if a bank is willing to assume a risk of 0.20, i.e. to have enough funds in 80 of 100 cases, then it would need funds for 514 loans; and to have sufficient funds in 999 out of 1000 cases, it should have 549 loans available.
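
These figures are straightforward to verify (an added sketch using SciPy's normal quantile function):

    import numpy as np
    from scipy.stats import norm

    def loans_needed(n, risk):
        # smallest s with P[Z > (2s - n)/sqrt(n)] <= risk
        z = norm.ppf(1 - risk)                 # e.g. about 2.33 when risk = 0.01
        return int(np.ceil((n + z * np.sqrt(n)) / 2))

    for risk in [0.01, 0.20, 0.001]:
        print(risk, loans_needed(1000, risk))  # 537, 514, 549 as in the text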

Now the possibilities for generalization and extension are apparent. A first generalization would be to allow the loan amounts to be random with some distribution; we could still apply the Central Limit Theorem to approximate the demand on available funds (see the sketch below). Second, the cost of either unused funds or lost business could be multiplied by its probability of occurring. The total of the products would be an expected cost, which could then be minimized.
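As a sketch of that first generalization (an addition to these notes, with a hypothetical lognormal loan-size distribution), Monte Carlo simulation estimates the funds bank H needs so that demand exceeds them in at most 1% of cases:

    import numpy as np

    rng = np.random.default_rng(42)
    n, trials = 1000, 100_000
    picks_H = rng.random((trials, n)) < 0.5              # each customer picks H or T
    amounts = rng.lognormal(0.0, 0.5, size=(trials, n))  # hypothetical loan sizes
    demand = (picks_H * amounts).sum(axis=1)             # total demand at bank H
    print(np.quantile(demand, 0.99))                     # funds covering 99% of cases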


Problems to Work for Understanding

  1. A first simple assumption is that the daily change of a company's stock on the stock market is a random variable with mean 0 and variance $\sigma^2$. That is, if $S_n$ represents the price of the stock on day $n$, with $S_0$ given, then
     \[ S_n = S_{n-1} + X_n, \quad n \ge 1, \]
     where $X_1, X_2, \ldots$ are independent, identically distributed continuous random variables with mean 0 and variance $\sigma^2$. (Note that this is an additive assumption about the change in a stock price. In the binomial tree models, we assumed that a stock's price changes by a multiplicative factor up or down. We will have more to say about these two distinct models later.) Suppose that a stock's price today is 100. If $\sigma^2 = 1$, what can you say about the probability that the stock's price will be between 95 and 105 on the tenth day?
  2. Let $X_1, X_2, \ldots, X_{10}$ be independent Poisson random variables, each with mean 1. First use the Markov Inequality to get a bound on $P[X_1 + \cdots + X_{10} > 15]$. Next use the Central Limit Theorem to get an estimate of $P[X_1 + \cdots + X_{10} > 15]$.
  3. Find the moment generating function $\phi_X(t) = E[\exp(tX)]$ of the random variable $X$ which takes the value 1 with probability 1/2 and the value -1 with probability 1/2. Show directly (that is, without using Taylor polynomial approximations) that $\phi_X(t/\sqrt{n})^n \to \exp(t^2/2)$. (Hint: Take logarithms of both sides, then use L'Hospital's Rule to evaluate the limit.)


Reading Suggestion:

  1. Chapter 8, “Limit Theorems”, A First Course in Probability, by Sheldon Ross, Macmillan, 1976.
  2. Dicing with Death: Chance, Health, and Risk by Stephen Senn, Cambridge University Press, Cambridge, 2003.
  3. Heads or Tails: An Introduction to Limit Theorems in Probability, by Emmanuel Lesigne, American Mathematical Society, 2005, Chapter 7, pages 29-74.
  4. An Introduction to Probability Theory and Its Applications, Volume I, second edition, William Feller, J. Wiley and Sons, 1957, Chapter VII.


Outside Readings and Links:

  1. Virtual Laboratories in Probability and Statistics. Search the page for Binomial approximation and then run the Binomial Timeline Experiment.


