Department of Mathematics
203 Avery Hall
University of Nebraska Lincoln
Lincoln, NE 68588-0323
402-472-3731 (voice)
402-472-8466 (fax)

Math 489/Math 889
Stochastic Processes and
Advanced Mathematical Finance
Dunbar, Fall 2007


Conditional Probability and Conditional Distributions


Key Concepts

  1. The definition of conditional probability for events:
     \[ \Pr[E \mid F] = \frac{\Pr[E \cap F]}{\Pr[F]}. \]
  2. Probability by conditioning is
     \[ \Pr[E \cap F] = \Pr[E \mid F] \cdot \Pr[F]. \]
  3. The “Law of Total Probability”: Let $F_1, F_2, \dots, F_n$ be a set of mutually exclusive events which partition the sample space. Then
     \[ \Pr[E] = \sum_{i=1}^{n} \Pr[E \mid F_i] \cdot \Pr[F_i]. \]
  4. The conditional probability mass function of $X$ given $Y = y$ is
     \[ p_{X \mid Y}(x \mid y) = \Pr[X = x \mid Y = y] = \frac{\Pr[\{X = x\} \cap \{Y = y\}]}{\Pr[\{Y = y\}]} = \frac{p(x, y)}{p_Y(y)}. \]


Vocabulary

  1. Conditional probability for events is the probability that event E will happen, given that event F has already occurred.
  2. Law of Total Probability involves applying the probability by conditioning and adding all the probabilities.
  3. The Ballot Problem: In an election, candidate A receives $n$ votes and candidate B receives $m$ votes, where $n > m$, so A is the winner. Assuming that all orderings of the vote count are equally likely, the probability that A is always ahead in the count of votes is $(n - m)/(n + m)$.
  4. The conditional probability mass function of $X$ given $Y = y$ is $p(x, y)/p_Y(y)$, where $p(x, y)$ is the joint probability mass function of $X$ and $Y$ and $p_Y(y)$ is the marginal probability mass function of $Y$.


Mathematical Ideas

This section is adapted from: S. M. Ross, Introduction to Probability Models, Third Edition, Academic Press, 1985, Chapter 3, pages 83-103, and L. Breiman, Probability, Addison-Wesley, 1968.

Conditional Probability

Rating: PG

Recall the definition of conditional probability for events: What is the probability that event $E$ will happen, given that event $F$ has already occurred? For finite sample spaces and discrete probability, the new, restricted sample space is $F$, and the probability of $E$ is proportional to the probability of the part of $E$ lying in $F$:

\[ \Pr[E \mid F] = \frac{\Pr[E \cap F]}{\Pr[F]}. \]

Taking conditional probabilities of various events with respect to a given event F amounts to choosing F as a new sample space; and we have to multiply all probabilities by 1/ Pr[F] in order to reduce the total probability to 1. This shows that all general theorems on probabilities are valid also for conditional probabilities. For example:

\[ \Pr[A \cup B \mid F] = \Pr[A \mid F] + \Pr[B \mid F] - \Pr[A \cap B \mid F]. \]

A frequently useful alternative formulation of conditional probability is sometimes called probability by conditioning (more on this later):

\[ \Pr[E \cap F] = \Pr[E \mid F] \cdot \Pr[F]. \]

To extend and generalize to three events:

\[ \Pr[E \cap F \cap G] = \Pr[E \mid F \cap G] \cdot \Pr[F \mid G] \cdot \Pr[G]. \]

Further generalization to more events is obvious.
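
As a small check of the multiplication rule for three events and of the addition rule for conditional probabilities, the following sketch enumerates the 36 outcomes of two fair dice. The events E, F, and G below are hypothetical, chosen only for illustration, and exact rational arithmetic is used so that both sides can be compared directly.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair six-sided dice.
OMEGA = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def pr_given(event, given):
    """Pr[event | given] = Pr[event and given] / Pr[given]."""
    return pr(lambda w: event(w) and given(w)) / pr(given)

# Hypothetical events, chosen only for illustration.
def E(w): return w[0] % 2 == 0      # first die is even
def F(w): return w[0] + w[1] >= 8   # the sum is at least 8
def G(w): return w[1] >= 4          # second die shows 4, 5, or 6

def F_and_G(w): return F(w) and G(w)

# Multiplication rule: Pr[E and F and G] = Pr[E | F and G] Pr[F | G] Pr[G].
lhs = pr(lambda w: E(w) and F(w) and G(w))
rhs = pr_given(E, F_and_G) * pr_given(F, G) * pr(G)
print(lhs, rhs)

# Addition rule, with every probability conditioned on G.
union = pr_given(lambda w: E(w) or F(w), G)
parts = pr_given(E, G) + pr_given(F, G) - pr_given(lambda w: E(w) and F(w), G)
print(union, parts)
```

Each printed pair should show the same value twice, whatever events are substituted, provided the conditioning events have positive probability.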

Another very useful generalization is sometimes called the “Law of Total Probability”: Let $F_1, F_2, \dots, F_n$ be a set of mutually exclusive events which partition the sample space, that is, $F_i \cap F_j = \emptyset$ for $i \ne j$ and $F_1 \cup \dots \cup F_n = \Omega$. Then necessarily

\[ E = (E \cap F_1) \cup (E \cap F_2) \cup \dots \cup (E \cap F_n), \]

and since the events $E \cap F_i$ are mutually exclusive, applying probability by conditioning to each one and adding the probabilities gives

\[ \Pr[E] = \sum_{i=1}^{n} \Pr[E \mid F_i] \cdot \Pr[F_i]. \]

This law of total probability is very useful because each conditional probability $\Pr[E \mid F_i]$ is sometimes much easier to compute than $\Pr[E]$ directly.
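
A minimal numerical sketch of the law of total probability follows. The scenario is hypothetical: one of three fair dice (4-sided, 6-sided, 8-sided) is chosen with probabilities 1/2, 1/3, and 1/6, and E is the event that the chosen die shows a 1. The helper name `total_probability` is an assumption made for this illustration.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Return the sum over i of Pr[E | F_i] * Pr[F_i].

    priors[i] is Pr[F_i]; conditionals[i] is Pr[E | F_i].
    """
    if sum(priors) != 1:
        raise ValueError("the events F_i must partition the sample space")
    return sum(c * p for c, p in zip(conditionals, priors))

# Hypothetical numbers: Pr[F_i] for choosing the 4-, 6-, or 8-sided die.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
# Pr[E | F_i]: the chance that the chosen fair die shows a 1.
conditionals = [Fraction(1, 4), Fraction(1, 6), Fraction(1, 8)]

pr_E = total_probability(priors, conditionals)
print(pr_E)  # 29/144
```

Here each conditional probability is immediate, while computing Pr[E] without conditioning would require describing the combined experiment as a single sample space.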

Example 1 The following example is a classic in elementary probability theory called the Ballot Problem. In an election, candidate A receives $n$ votes and candidate B receives $m$ votes, where $n > m$, so A is the winner. Assuming that all orderings of the vote count are equally likely, the probability that A is always ahead in the count of votes is $(n - m)/(n + m)$.

It is always advisable to check boundary or extreme cases to determine the reasonableness of a formula such as this. For example, if $m = 0$, then the formula gives probability 1 that A is always ahead in the vote count, as it should. On the other hand, if A wins by only one vote, so $n - 1 = m$, then the probability that A is always ahead is $1/(n + m) = 1/(2n - 1)$, which is fairly small, as we expect. (We will use this case in the next example.)

We can use the Law of Total Probability to compute the probabilities.

Proof: Let $P_{n,m}$ denote the desired probability. By conditioning on which candidate receives the last vote counted we have

\[ P_{n,m} = \Pr[\text{A always ahead} \mid \text{A receives last vote}] \, \frac{n}{n+m} + \Pr[\text{A always ahead} \mid \text{B receives last vote}] \, \frac{m}{n+m}. \]
(It takes a moment’s thought to convince oneself that the probability that A receives the last vote is $n/(n + m)$. Just realize that there are $n \cdot (n + m - 1)!$ orderings where A gets the last vote, out of all $(n + m)!$ orderings. Likewise when B receives the last vote.)

Now given that A received the last vote, one can see that the probability that A is always ahead is the same as if A had received $n - 1$ votes, B received $m$ votes, and A was always ahead, namely $P_{n-1,m}$. Likewise the probability that A was always ahead, given that B received the last vote, is just $P_{n,m-1}$. Hence the probability by conditioning is

\[ P_{n,m} = P_{n-1,m} \, \frac{n}{n+m} + P_{n,m-1} \, \frac{m}{n+m}. \]

Note that we have turned the probability problem into solving a difference equation in two variables. We can then verify directly that $P_{n,m} = (n - m)/(n + m)$ is a solution. (If that is an unsatisfying approach for you, then you can solve the difference equation either by induction starting from the trivial case $P_{1,0} = 1$, or by appealing to a direct difference equation solution method.) □
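
The claim that $(n - m)/(n + m)$ solves the difference equation can also be checked mechanically for small cases. The sketch below is only a sanity check, not part of the proof: it compares the formula with a brute-force count over all arrangements of the votes, and verifies the recurrence with exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def ballot_formula(n, m):
    """Closed form (n - m)/(n + m) for the ballot problem, with n > m."""
    return Fraction(n - m, n + m)

def ballot_brute_force(n, m):
    """Fraction of arrangements of n A-votes and m B-votes in which
    A is strictly ahead after every vote counted."""
    total = n + m
    favorable = 0
    arrangements = 0
    # Choose which positions in the count hold votes for A.
    for a_positions in combinations(range(total), n):
        a_set = set(a_positions)
        lead = 0
        always_ahead = True
        for k in range(total):
            lead += 1 if k in a_set else -1
            if lead <= 0:
                always_ahead = False
                break
        favorable += always_ahead
        arrangements += 1
    return Fraction(favorable, arrangements)

def recurrence_rhs(n, m):
    """Right side of the difference equation, evaluated with the formula."""
    return (ballot_formula(n - 1, m) * Fraction(n, n + m)
            + ballot_formula(n, m - 1) * Fraction(m, n + m))

for n in range(1, 7):
    for m in range(0, n):
        assert ballot_brute_force(n, m) == ballot_formula(n, m)
        if m >= 1:
            assert recurrence_rhs(n, m) == ballot_formula(n, m)
print("formula, brute force, and recurrence agree for n <= 6")
```

Counting arrangements of positions rather than orderings of individual ballots is legitimate here because every arrangement corresponds to the same number, $n! \, m!$, of orderings.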

Example 2 Consider a coin-flipping game, where the probability of a head is $p$ and the probability of a tail is $1 - p$. What is the probability that the total number of heads equals the total number of tails for the first time on flip $2n$? Since the number of heads is the same as the number of tails, the total number of coin flips is double the number of heads, hence an even number. Note that we do not ask that heads be ahead of tails until flip $2n$, or that tails be ahead of heads, only that they first equalize at flip $2n$.

Equivalently, consider a gambler starting from an initial stake X0. The gambler wins a dollar when the coin flip is heads with probability p. The gambler loses a dollar when the coin flip is tails with probability 1 - p. Let the outcome of the ith flip be

\[ X_i = \begin{cases} 1 & \text{with probability } p, \\ -1 & \text{with probability } 1 - p. \end{cases} \]

Then the gambler’s fortune at stage $n$ is $X_0 + \sum_{i=1}^{n} X_i$. We are asking: what is the probability that the gambler’s fortune is again $X_0$ for the first time at step $2n$? We do not care whether the gambler has been ahead or behind, only that the gambler’s fortune first returns to its starting value at flip $2n$.

We compute this by conditioning on the total number of heads in the first 2n flips.

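\[ \Pr[\text{first time equal is } 2n] = \sum_{m=0}^{2n} \Pr[\text{first time equal is } 2n \mid m \text{ heads in } 2n] \binom{2n}{m} p^m (1 - p)^{2n - m}. \]

Now all but one of these conditional probabilities are 0, since heads and tails can be equal at flip $2n$ only if exactly $n$ of the $2n$ flips are heads. So this reduces nicely to

\[ \Pr[\text{first time equal is } 2n] = \Pr[\text{first time equal is } 2n \mid n \text{ heads in } 2n] \binom{2n}{n} p^n (1 - p)^n. \]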

Now given a total of $n$ heads and $n$ tails in $2n$ flips, all possible orderings of the heads and tails are equally likely. Thus the single conditional probability we are considering above is the same as in ballot-counting in which each candidate receives $n$ votes, but one of the candidates is always ahead until the last vote, which ties them. Conditioning on who receives the last vote,

\begin{align*}
\Pr[\text{first time H and T equal is } 2n \mid n \text{ heads in } 2n]
&= \Pr[\text{H has } n, \text{ T has } n - 1, \text{ H always ahead} \mid \text{T gets last}] \cdot \frac{1}{2} \\
&\quad + \Pr[\text{H has } n - 1, \text{ T has } n, \text{ T always ahead} \mid \text{H gets last}] \cdot \frac{1}{2}.
\end{align*}
We see that each of the conditional probabilities is just the probability in the ballot problem when $m = n - 1$, namely $P_{n,n-1} = 1/(2n - 1)$. Hence
\[ \Pr[\text{first time equal is } 2n] = P_{n,n-1} \binom{2n}{n} p^n (1 - p)^n = \frac{\binom{2n}{n} p^n (1 - p)^n}{2n - 1}. \]
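
The final formula can be checked for small $n$ by enumerating every sequence of $2n$ flips and summing the probabilities of those whose running total first returns to zero at flip $2n$. The sketch below does this with exact fractions; the value p = 1/3 and the function names are assumptions made only for the illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

def first_return_formula(n, p):
    """C(2n, n) p^n (1 - p)^n / (2n - 1), the formula derived above."""
    return comb(2 * n, n) * p**n * (1 - p)**n / (2 * n - 1)

def first_return_brute_force(n, p):
    """Sum the probabilities of all flip sequences of length 2n whose
    partial sums are nonzero before flip 2n and zero at flip 2n."""
    total = Fraction(0)
    for flips in product((1, -1), repeat=2 * n):  # +1 = head, -1 = tail
        fortune = 0
        first_zero = None
        for step, x in enumerate(flips, start=1):
            fortune += x
            if fortune == 0:
                first_zero = step
                break
        if first_zero == 2 * n:
            heads = flips.count(1)
            total += p**heads * (1 - p)**(2 * n - heads)
    return total

p = Fraction(1, 3)  # hypothetical probability of a head
for n in range(1, 6):
    assert first_return_brute_force(n, p) == first_return_formula(n, p)
print(first_return_formula(1, p), first_return_formula(2, p))
```

For $n = 1$ the only qualifying sequences are HT and TH, so the probability is $2p(1 - p)$, in agreement with the formula.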

Conditional Densities and Distributions

Rating: PG-13

The definition of conditional probability immediately motivates the following definition when X and Y are discrete random variables:

Definition 1 The conditional probability mass function of X given Y = y is

\[ p_{X \mid Y}(x \mid y) = \Pr[X = x \mid Y = y] = \frac{\Pr[\{X = x\} \cap \{Y = y\}]}{\Pr[\{Y = y\}]} = \frac{p(x, y)}{p_Y(y)} \]
where $p(x, y)$ is the joint probability mass function of $X$ and $Y$ and $p_Y(y)$ is the marginal probability mass function of $Y$. The conditional cumulative distribution function of $X$ given $Y = y$ is defined as
\[ F_{X \mid Y}(x \mid y) = \Pr[X \le x \mid Y = y] = \sum_{a \le x} p_{X \mid Y}(a \mid y). \]

Note that there is no particular difficulty in applying this definition even if the discrete random variable assumes infinitely many values (for instance, a Poisson random variable). However, we take as a convention that conditional probabilities are undefined if the conditioning event, $F$ for events or $\{Y = y\}$ for discrete distributions, has zero probability.
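
As an illustration of the definition, the following sketch computes a conditional probability mass function and conditional cumulative distribution function from a joint table. The joint pmf values are made up for the example, the function names are hypothetical, and the cumulative distribution is taken with the convention $\Pr[X \le x \mid Y = y]$. The zero-probability convention is reflected by raising an error.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) on {0, 1} x {0, 1}; the values are made up.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 4),
}

def marginal_y(joint, y):
    """Marginal pmf p_Y(y): sum of p(x, y) over all x."""
    return sum((p for (_, b), p in joint.items() if b == y), Fraction(0))

def conditional_pmf(joint, y):
    """Return {x: p(x, y) / p_Y(y)}; undefined when p_Y(y) = 0."""
    p_y = marginal_y(joint, y)
    if p_y == 0:
        raise ValueError("conditional pmf is undefined when Pr[Y = y] = 0")
    return {a: p / p_y for (a, b), p in joint.items() if b == y}

def conditional_cdf(joint, x, y):
    """Pr[X <= x | Y = y], the sum of the conditional pmf over a <= x."""
    pmf = conditional_pmf(joint, y)
    return sum((p for a, p in pmf.items() if a <= x), Fraction(0))

print(conditional_pmf(joint, 0))     # the values 1/4 and 3/4, as Fractions
print(conditional_cdf(joint, 0, 1))  # 1/2
```

For each fixed y with positive probability, the conditional pmf sums to 1 over x, so it is a genuine probability mass function.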

If the random variables $X$ and $Y$ are continuous, we could still appeal to the quotient $f_{X,Y}(x,y)/f_Y(y)$ as the definition of the conditional density $f_{X \mid Y}(x \mid y)$ and argue by analogy. However, arguing by analogy is not a legitimate mathematical justification. Nevertheless, we can justify this conclusion by taking a limit of cumulative distributions:

\begin{align*}
\Pr[X \le x \mid Y = y] &= \lim_{h \to 0} \Pr[X \le x \mid y \le Y \le y + h] \\
&= \lim_{h \to 0} \frac{\int_y^{y+h} \int_{-\infty}^{x} f_{X,Y}(t,u) \, dt \, du}{\int_y^{y+h} f_Y(u) \, du} \\
&= \lim_{h \to 0} \frac{\frac{d}{dh} \int_y^{y+h} \int_{-\infty}^{x} f_{X,Y}(t,u) \, dt \, du}{\frac{d}{dh} \int_y^{y+h} f_Y(u) \, du} \\
&= \lim_{h \to 0} \frac{\int_{-\infty}^{x} f_{X,Y}(t, y+h) \, dt}{f_Y(y+h)} \\
&= \frac{\int_{-\infty}^{x} \lim_{h \to 0} f_{X,Y}(t, y+h) \, dt}{\lim_{h \to 0} f_Y(y+h)} \\
&= \frac{\int_{-\infty}^{x} f_{X,Y}(t,y) \, dt}{f_Y(y)} \\
&= \int_{-\infty}^{x} \frac{f_{X,Y}(t,y)}{f_Y(y)} \, dt.
\end{align*}
Note that the step introducing the derivatives with respect to $h$ is a use of l’Hôpital’s rule, and the step that follows it is from the Fundamental Theorem of Calculus. Finally, the interchange of limit and integral is assumed, and the continuity of the joint and marginal pdf’s is assumed. Naturally, we have to assume that $f_Y(y) \ne 0$. Under all these hypotheses, $f_{X,Y}(x,y)/f_Y(y)$ behaves as a conditional density function, justifying our extension of the definition. However, the number and variety of hypotheses is unsatisfying.
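
The limit can be watched numerically for a specific density. The sketch below assumes the hypothetical joint density $f_{X,Y}(t,u) = t + u$ on the unit square (zero elsewhere), for which $f_Y(u) = u + 1/2$, and compares the conditional probability given the window $y \le Y \le y + h$ with the integral of $f_{X,Y}(t,y)/f_Y(y)$. The integrals are approximated with a simple midpoint rule; all function names are illustrative.

```python
def f_xy(t, u):
    """Hypothetical joint density: t + u on the unit square, 0 elsewhere."""
    return t + u if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0 else 0.0

def midpoint(g, a, b, steps=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / steps
    return width * sum(g(a + (k + 0.5) * width) for k in range(steps))

def f_y(u):
    """Marginal density of Y, obtained by integrating out t."""
    return midpoint(lambda t: f_xy(t, u), 0.0, 1.0)

def cond_cdf_window(x, y, h):
    """Pr[X <= x | y <= Y <= y + h] as a ratio of integrals."""
    num = midpoint(lambda u: midpoint(lambda t: f_xy(t, u), 0.0, x), y, y + h)
    den = midpoint(f_y, y, y + h)
    return num / den

def cond_cdf_density(x, y):
    """Integral from 0 to x of f_xy(t, y) / f_y(y) dt."""
    marginal = f_y(y)
    return midpoint(lambda t: f_xy(t, y) / marginal, 0.0, x)

x, y = 0.5, 0.3
for h in (0.1, 0.01, 0.001):
    print(h, cond_cdf_window(x, y, h))
print("limit:", cond_cdf_density(x, y))
```

For this density the limit works out to $(x^2 + 2xy)/(1 + 2y)$, which is 0.34375 at $x = 0.5$, $y = 0.3$, and the windowed values approach it as $h$ shrinks.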

A more general way of defining conditional probabilities

Rating: R

Looking more closely at

\[ \Pr[X \le x \mid Y = y] = \lim_{h \to 0} \Pr[X \le x \mid y \le Y \le y + h] = \lim_{h \to 0} \frac{\int_y^{y+h} \int_{-\infty}^{x} f_{X,Y}(t,u) \, dt \, du}{\int_y^{y+h} f_Y(u) \, du} \]
it appears that we are attempting to define something that looks very much like a derivative. In fact, if we take
\[ Q(B) = \Pr[A \cap [Y \in B]], \qquad P(B) = \Pr[Y \in B], \]
then $0 \le Q(B) \le P(B)$, so the measure $Q$ is absolutely continuous with respect to the probability measure $P$. By the Radon-Nikodym theorem, we can define the derivative of $Q$ with respect to $P$. This Radon-Nikodym derivative then becomes the conditional probability (measure). That is, the conditional probability of $A$ given $Y$, $\Pr[A \mid Y]$, is defined as a random variable on the sample space $\Omega$ satisfying
\[ \Pr[A \cap [Y \in B]] = \int_{[Y \in B]} \Pr[A \mid Y] \, d\Pr. \]

Thus, a measure-theoretic analogue of the law of total probability becomes the definition of conditional probability. A complete discussion of this topic requires a thorough understanding of measure theory and is beyond the scope of this course.


Problems to Work for Understanding

  1. Customers arrive at a bank at a Poisson rate c. Suppose two customers arrived in the first hour. What is the probability that
    1. both arrived in the first 20 minutes?
    2. at least one arrived in the first 20 minutes?

  2. Three dice are rolled. If no two show the same face, what is the probability that one is an “ace” (one spot showing)?

  3. Given that a throw with ten dice produced at least one ace, what is the probability of two or more aces?

  4. In a bolt factory machines A, B, and C manufacture respectively 25, 35, and 40 percent of the total. Of their output 5, 4, and 2 percent are defective bolts. A bolt is drawn at random from the produce, and is found defective. What are the probabilities that it was manufactured by machines A, B, and C respectively?


Reading Suggestion:

  1. S. M. Ross, Introduction to Probability Models, Third Edition, Academic Press, 1985.
  2. L. Breiman, Probability, Addison-Wesley, 1968, Section 4.1, pages 67-68.
  3. W. Feller, An Introduction to Probability Theory and its Applications, Volume I, Second Edition, J. Wiley and Sons, New York, 1957, Chapter V, Section 1, pages 104-106.


Outside Readings and Links:

  1. Shodor Education Foundation, Inc. This applet allows the user to experiment with conditional probability. Submitted by Ravi Alluri, January 28, 2004.
  2. University of Rome “La Sapienza”, Physics Department, Giulio D’Agostini. It gives an interpretation of conditional probability. Submitted by Ravi Alluri, January 28, 2004.
  3. University of Illinois, Urbana-Champaign, Department of Electrical and Computer Engineering. It gives extensive literature on conditional probability. Submitted by Ravi Alluri, January 28, 2004.
  4. R. Webster West, Department of Statistics, University of South Carolina. Conditional Probability Applet: an applet that shows the Venn diagram and corresponding probabilities. Submitted by Haigang Zhou, January 27, 2004.
  5. Shangrong Deng, Southern Polytechnic State University. An applet that demonstrates conditional probability and the multiplication rule. Submitted by Haigang Zhou, January 27, 2004.
  6. The Department of Mathematical Sciences at the University of Alabama in Huntsville. An applet with a graph that demonstrates the ballot experiment. Submitted by Haigang Zhou, February 1, 2004.


Steve Dunbar's Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1@unl.edu
