Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Probability Theory and Stochastic Processes
Mathematicians Only: prolonged scenes of intense rigor.
From the proof of the Weak Law of Large Numbers, we know
$$P\left[\left|\frac{S_n}{n} - p\right| > \epsilon\right] \le \frac{p(1-p)}{n\epsilon^2},$$
or more roughly, the deviation of the sample mean of Bernoulli trials from $p$ is of order $1/\sqrt{n}$. Is it possible to provide a more refined estimate of the probability of such a deviation? Why do you think so? What would be the possible statement of a better result?
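The $1/\sqrt{n}$ scale of typical deviations is easy to see numerically. The following sketch (not part of the original notes; it assumes the standard Chebyshev form of the bound from the Weak Law proof) simulates Bernoulli sample means and checks that the average deviation from $p$, rescaled by $\sqrt{n}$, stays roughly constant as $n$ grows.

```python
import math
import random

# Sketch (an illustration, not from the text): Chebyshev's inequality, as used
# in the proof of the Weak Law of Large Numbers, gives
#     P(|S_n/n - p| >= eps) <= p*(1-p) / (n * eps**2),
# suggesting deviations of the sample mean from p of order 1/sqrt(n).
random.seed(1)
p, trials = 0.5, 2000
scaled = []
for n in (100, 400, 1600):
    mean_dev = sum(
        abs(sum(random.random() < p for _ in range(n)) / n - p)
        for _ in range(trials)
    ) / trials
    scaled.append(mean_dev * math.sqrt(n))  # roughly constant across n
print([round(s, 3) for s in scaled])
```

If deviations really shrink like $1/\sqrt{n}$, the printed values should all be near the same constant rather than growing or shrinking with $n$.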
Large Deviations Theory is about the remote tail of a probability distribution. Roughly speaking, large deviations theory estimates the exponential decay of the probability of extreme events as the number of observations grows arbitrarily large. Large deviations theory was perhaps first developed by the Swedish mathematician Harald Cramér, who applied it to model the insurance business. More general results were later obtained by H. Chernoff, among others. An incomplete list of mathematicians who have made important advances would include S. R. S. Varadhan, D. Ruelle, and O. E. Lanford.
Given parameter value $p$, $0 < p < 1$, define for $0 \le x \le 1$ as a function of $x$
$$h(x) = x \ln\frac{x}{p} + (1-x) \ln\frac{1-x}{1-p}.$$
See Figure 1.
Note that $-x \ln x - (1-x)\ln(1-x)$ is the entropy of the Bernoulli distribution with probability $x$ and complementary probability $1-x$. Therefore $h(x)$ is the difference between the cross-entropy $-x \ln p - (1-x)\ln(1-p)$ and the entropy of the distribution with probability $x$.
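The entropy interpretation can be checked directly. The sketch below (an illustration; it assumes the standard relative-entropy form $h(x) = x\ln(x/p) + (1-x)\ln((1-x)/(1-p))$, consistent with the definition above) verifies that $h$ is cross-entropy minus entropy, vanishes at $x = p$, and is positive elsewhere.

```python
import math

# Sketch, assuming the relative-entropy form of the rate function:
#   h(x) = x*ln(x/p) + (1-x)*ln((1-x)/(1-p)),  0 < x < 1.
def entropy(x):
    # Entropy of the Bernoulli distribution with probability x.
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def cross_entropy(x, p):
    # Cross-entropy of Bernoulli(x) against Bernoulli(p).
    return -x * math.log(p) - (1 - x) * math.log(1 - p)

def h(x, p):
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p = 0.3
for x in (0.1, 0.3, 0.7):
    # h is cross-entropy minus entropy, term by term.
    assert abs(h(x, p) - (cross_entropy(x, p) - entropy(x))) < 1e-12
assert abs(h(p, p)) < 1e-12   # h vanishes at x = p
assert h(0.7, p) > 0          # and is strictly positive away from p
```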
The first two lemmas are Taylor expansions of $h(p+\epsilon)$ and $h(p-\epsilon)$ in $\epsilon$ to give a sense of the asymptotics of these functions.
See also Figure 1.
See also Figure 2, which displays $h$ as a function of $\epsilon$ for the indicated parameter values.
Proof. Fix $t > 0$; then
$$P\left[\frac{S_n}{n} \ge p+\epsilon\right] = P\left[e^{t S_n} \ge e^{t n (p+\epsilon)}\right].$$
Now apply Markov's Inequality to this positive random variable:
$$P\left[e^{t S_n} \ge e^{t n (p+\epsilon)}\right] \le e^{-t n (p+\epsilon)}\, E\left[e^{t S_n}\right] = \left(e^{-t(p+\epsilon)}\left(p e^t + (1-p)\right)\right)^n.$$
Now to complete the estimation, we only need to find
$$\sup_{t > 0}\left[t(p+\epsilon) - \ln\left(p e^t + (1-p)\right)\right].$$
For notation, let $f(t) = t(p+\epsilon) - \ln\left(p e^t + (1-p)\right)$ for parameters $p$ and $\epsilon$. Then $f(0) = 0$ and $f'(0) = \epsilon > 0$, so $f$ is increasing at $0$. Furthermore, $f(t) \to -\infty$ as $t \to \infty$. Thus the supremum is attained at some strictly positive value. The derivative is zero only at
$$t^* = \ln\frac{(p+\epsilon)(1-p)}{p(1-p-\epsilon)}.$$
Therefore, $f$ has a maximum value of $h(p+\epsilon)$ since by plugging in the critical point, we obtain:
$$f(t^*) = (p+\epsilon)\ln\frac{p+\epsilon}{p} + (1-p-\epsilon)\ln\frac{1-p-\epsilon}{1-p} = h(p+\epsilon). \qquad \Box$$
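The optimization step in the proof can be checked numerically. The sketch below (an illustration under stated assumptions: that the objective is $f(t) = t(p+\epsilon) - \ln(pe^t + 1-p)$, the critical point is $t^* = \ln\frac{(p+\epsilon)(1-p)}{p(1-p-\epsilon)}$, and the maximum value is the relative-entropy quantity $h(p+\epsilon)$) confirms that $f(t^*)$ matches $h(p+\epsilon)$ and that nearby $t$ values do worse.

```python
import math

# Numerical check of the Chernoff-style optimization (a sketch):
#   f(t) = t*(p+eps) - ln(p*e^t + 1 - p)
# with claimed maximizer t* = ln( (p+eps)*(1-p) / (p*(1-p-eps)) )
# and claimed maximum h(p+eps).
p, eps = 0.4, 0.15
a = p + eps

def f(t):
    return t * a - math.log(p * math.exp(t) + 1 - p)

t_star = math.log(a * (1 - p) / (p * (1 - a)))
h_val = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

assert abs(f(t_star) - h_val) < 1e-12
# t* is a genuine maximum: nearby values of t give smaller f.
assert f(t_star) > f(t_star - 0.1) and f(t_star) > f(t_star + 0.1)
```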
Define
$$h_-(\epsilon) = (p-\epsilon)\ln\frac{p-\epsilon}{p} + (1-p+\epsilon)\ln\frac{1-p+\epsilon}{1-p} = h(p-\epsilon)$$
for $0 < \epsilon < p$.
Proof. Interchange the designation of success and failure, so now “success” occurs with probability $1-p$ and “failure” occurs with probability $p$. Consider the probability that the proportion of “successes” in this complementary process exceeds the mean $1-p$ by $\epsilon$:
$$P\left[\frac{n - S_n}{n} \ge (1-p) + \epsilon\right] = P\left[\frac{S_n}{n} \le p - \epsilon\right],$$
where $n - S_n$ counts the “successes” of the complementary process. The inequality is then
$$P\left[\frac{S_n}{n} \le p - \epsilon\right] \le e^{-n h(p-\epsilon)}.$$
See Figure 3. Note that this estimate is correct but empty for $\epsilon = 0$ and a neighborhood of $0$, where the bound is close to $1$.
We will frequently use the following lemma to estimate the value of a binomial probability $P[S_n = k]$ for large values of $n$ and proportionally large values of $k$.
Proof. We know that
$$P[S_n = k] = \binom{n}{k} p^k (1-p)^{n-k}.$$
Use Stirling's approximation in the form $n! \sim \sqrt{2\pi n}\,(n/e)^n$. Find the asymptotics of the binomial probability by replacing each factorial with the Stirling approximation. That is,
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi k}\,(k/e)^k \, \sqrt{2\pi (n-k)}\,\left((n-k)/e\right)^{n-k}}.$$
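The quality of the Stirling replacement is easy to test. The sketch below (an illustration, not from the text) compares the exact log binomial probability, computed via the log-gamma function, against the version with every factorial replaced by Stirling's approximation, for $k$ proportional to $n$.

```python
import math

# Sketch: Stirling's approximation n! ~ sqrt(2*pi*n) * (n/e)^n, applied to the
# binomial probability P[S_n = k] = C(n,k) p^k (1-p)^(n-k).
def log_factorial_exact(n):
    return math.lgamma(n + 1)          # ln(n!) exactly, via the gamma function

def log_factorial_stirling(n):
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

def log_binom(n, k, p, log_fact):
    return (log_fact(n) - log_fact(k) - log_fact(n - k)
            + k * math.log(p) + (n - k) * math.log(1 - p))

# The error in the log probability shrinks as n grows with k proportional to n.
errs = []
for n in (100, 1000, 10000):
    k = int(0.6 * n)
    exact = log_binom(n, k, 0.5, log_factorial_exact)
    approx = log_binom(n, k, 0.5, log_factorial_stirling)
    errs.append(abs(exact - approx))
print([round(e, 6) for e in errs])
```

The printed errors should decrease roughly like $1/n$, consistent with the next-order term in Stirling's series.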
The estimate given by the Large Deviations result is as good as it can be in the following sense:
First, the Large Deviations Estimate can be written as
$$\frac{1}{n} \ln P\left[\frac{S_n}{n} \ge p+\epsilon\right] \le -h(p+\epsilon).$$
Let $k_n$ be the least integer greater than or equal to $n(p+\epsilon)$. Then
$$P\left[\frac{S_n}{n} \ge p+\epsilon\right] = P[S_n \ge k_n].$$
Therefore, $k_n/n \to p+\epsilon$ as $n \to \infty$. Also $k_n \sim n(p+\epsilon)$.
We will show that
$$\frac{1}{n} \ln P\left[\frac{S_n}{n} \ge p+\epsilon\right] \to -h(p+\epsilon) \quad \text{as } n \to \infty,$$
and we will have established the theorem.
We already know from the binomial probability
Note that the probability on the left side goes to $0$ as $n \to \infty$ by the Weak Law of Large Numbers. By the Zero Asymptotic Limits Lemma in Asymptotic Limits, the right side must also go to $0$ as $n \to \infty$. Since the two sequences are asymptotic, by the Logarithm Lemma in Asymptotic Limits, their logarithms must be asymptotic.
Since and , then
Now recall that the probability is bounded by 1, so
Using the same basic calculations, we can show
Say that a sequence $a_n = O(b_n)$ if there are constants $C > 0$ and $N$ such that $|a_n| \le C\,|b_n|$ for all $n \ge N$.
Remark. Note the alternative definition of the entropy-style function, including the base-2 logarithm.
Remark. By symmetry, , so that this provides an inequality similar to the Large Deviations result above.
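The sharpness claim can also be illustrated numerically. The sketch below (an illustration under stated assumptions: the relative-entropy rate function $h(x) = x\ln(x/p) + (1-x)\ln((1-x)/(1-p))$ and the ceiling choice $k_n = \lceil n(p+\epsilon) \rceil$) computes the exact binomial tail with log-gamma arithmetic and checks that $\frac{1}{n}\ln P[S_n \ge k_n]$ approaches $-h(p+\epsilon)$ as $n$ grows.

```python
import math

# Sketch of the sharpness claim: (1/n) * ln P[S_n >= ceil(n*(p+eps))] should
# approach -h(p+eps) as n grows (h is assumed to be the relative-entropy
# rate function, consistent with the estimates in this section).
def log_pmf(n, k, p):
    # ln of the binomial probability P[S_n = k], via lgamma for stability.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_tail(n, kmin, p):
    # ln P[S_n >= kmin] by a numerically stable log-sum-exp over the tail.
    logs = [log_pmf(n, k, p) for k in range(kmin, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

p, eps = 0.5, 0.1
a = p + eps
h_val = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
rates = []
for n in (100, 400, 1600):
    kn = math.ceil(n * a)
    rates.append(log_tail(n, kn, p) / n)
print([round(r, 4) for r in rates], round(-h_val, 4))
```

The printed rates should creep up toward $-h(p+\epsilon)$; the residual gap is the $O(\ln n / n)$ correction coming from the polynomial prefactor in the binomial asymptotics.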
The ﬁrst inequality is very crude, essentially
The proof principle here is that the terms are relatively dominated by the greatest term .
Again the proof principle is that the probability is essentially dominated by .
Remark. Let $X$ be an arbitrary random variable. Then for every $t > 0$,
$$P[X \ge a] \le e^{-ta}\, E\left[e^{tX}\right] = e^{-ta} M_X(t),$$
where $M_X(t)$ is the moment-generating function of $X$. The inequality follows from Markov's Inequality. Likewise, for every $t > 0$,
$$P[X \le a] \le e^{ta}\, E\left[e^{-tX}\right] = e^{ta} M_X(-t).$$
Remark. If moreover $X = X_1 + \cdots + X_n$ with the $X_i$ independent,
$$P[X \ge a] \le e^{-ta} \prod_{i=1}^{n} M_{X_i}(t),$$
where $M_{X_i}$ is the moment-generating function of $X_i$. Similarly for $P[X \le a]$,
by Jensen’s Inequality.
since . Then since .
The last inequality follows from some analysis.
Remark. The Chernoff bounds are not asymptotic. In particular, they also hold when the parameters are functions of $n$.
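The non-asymptotic character of the Chernoff bound is worth seeing concretely. The sketch below (an illustration; the choices of $n$, $p$, $a$, and $t$ are arbitrary) verifies the bound $P[X \ge a] \le e^{-ta}\,E[e^{tX}]$ exactly for a small binomial random variable, with no limit taken.

```python
import math

# Sketch: the Chernoff bound P[X >= a] <= exp(-t*a) * E[exp(t*X)] holds for
# every fixed n and every t > 0 -- no limit in n is required.  Check it
# exactly for a small binomial X = S_n.
def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p, a, t = 20, 0.3, 10, 0.5            # illustrative values, not from the text
exact_tail = sum(binom_pmf(n, k, p) for k in range(a, n + 1))
mgf = (p * math.exp(t) + 1 - p) ** n      # E[exp(t*S_n)] for binomial S_n
bound = math.exp(-t * a) * mgf
assert exact_tail <= bound                # the bound holds at this finite n
print(round(exact_tail, 4), "<=", round(bound, 4))
```

Optimizing the bound over $t$, as in the proof of the Large Deviations Estimate, would tighten it further; any single $t > 0$ already gives a valid finite-$n$ inequality.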
This section is adapted from: Heads or Tails, by Emmanuel Lesigne, Student Mathematical Library Volume 28, American Mathematical Society, Providence, 2005, Chapter 6.
The alternative theorems and proofs are adapted from class notes by Xavier Perez.
Some history and deﬁnitions are also adapted from the Wikipedia article on Large Deviations Theory.
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modified: Processed from LaTeX source on November 9, 2018