Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Probability Theory and Stochastic Processes
Steven R. Dunbar
The Hitting Time Theorem
Mathematicians Only: prolonged scenes of intense rigor.
Suppose a simple random walk of steps with has steps and steps . Explicitly enumerate all the possible walks. In how many ways does the walk stay less than throughout the entire walk until step ?
then we say the distribution is invariant under rotations.
The Ballot Theorem can be interpreted as the probability that, during the vote counting in an election between two candidates, the winning candidate is always ahead of the other. Suppose that in an election with votes, candidate A receives votes and candidate B receives votes, where for some positive integer . Assume that all ballot permutations are equally likely during the counting. By dividing by the total number of ballot permutations in the counting of the specialized ballot theorem, the probability is , see The Ballot Theorem and the Reﬂection Principle.
The problem can be restated as the probability that an -step random walk taking independent and identically distributed steps stays positive after time , given that it ends at height , is . If each step has probability , this probability can be derived easily using the reﬂection principle, an argument appearing in many textbooks. In fact, it is possible to prove the same result under weaker hypotheses, [?], [?].
The Hitting Time Theorem says the conditional probability that an -step random walk starting at , given that it ends at , ﬁrst hits at time is .
Remark. Note that:
In 1949 Otter gave the ﬁrst proof of the Hitting Time Theorem for . Kemperman in 1961 and Dwass in 1969 gave proofs for using generating functions and the Lagrange inversion formula. For simple Bernoulli random walks making steps of size , a proof using the Reﬂection Principle is in [?]. See [?] for references and discussion.
Fix . Let be a sequence of random variables taking values in . Note that this is more general than, but includes, the standard random walk where the random variables take values in . Deﬁne the walk as the usual piecewise linear function deﬁned by the values at the integers . Note that because the only positive value assumed by the steps is , for the random walk to gain a height , it has to pass through all intermediate integer values.
Proof. The proof proceeds by induction on . When , both sides are when and , and are equal to when . This is the base case of the induction.
For the induction step, take and consider the case . In this case, the left side requires for all , yet so the event is empty and the probability is . The right side is also equal to so the case is satisﬁed. Thus, we may assume that .
So take and consider the case . Condition on the ﬁrst step to obtain
For a given and , let . Because the steps are independent, identically distributed random variables, by the random walk Markov property, with and conditionally on the random walk has the same distribution as the random walk path .
where in the last equality we have used the induction hypothesis which is allowed since and with so . Because the steps are independent, identically distributed random variables , (see Figure 1) substitute into the summation to obtain
Temporarily ignore the in the denominator and consider the summation
By the properties of conditional expectation
By the assumption of independent, identically distributed steps, the conditional expectation is independent of so that
since when . Recalling the temporarily ignored factor of we arrive at
This completes the proof by induction. □
Example. See Figure 2 for an example.
The following example is adapted from [?] and [?]. The original problem is:
The Kentucky Derby is on Saturday, and a ﬁeld of horses is slated to run “the fastest two minutes in sports” in pursuit of the right to be draped with a blanket of roses. But let’s consider, instead, the Lucky Derby, where things are a little more bizarre:
The bugle sounds, and horses make their way to the starting gate for the ﬁrst annual Lucky Derby. These horses, all trained at the mysterious Riddler Stables, are special. Each second, every Riddler-trained horse takes one step. Each step is exactly one meter long. But what these horses exhibit in precision, they lack in sense of direction. Most of the time, their steps are forward (toward the ﬁnish line) but the rest of the time they are backward (away from the ﬁnish line). As an avid fan of the Lucky Derby, you’ve done exhaustive research on these competitors. You know that Horse One goes forward percent of the time, Horse Two percent of the time, Horse Three percent, and so on, up to the favorite ﬁlly, Horse Twenty, who steps forward percent of the time. The horses’ steps are taken independently of one another, and the ﬁnish line is meters from the starting gate.
Handicap this race and place your bets! In other words, what are the odds (a percentage is ﬁne) that each horse wins?
It is easy to create a simulation for this scenario. A sample script is in the scripts section below. The simulation of races suggests that the probability of horse winning is about and the probability is about .
Consider a particular horse. For each step, suppose it moves forward with probability (in this example ) or backward with probability . Recall that reaching an even-numbered position (such as in the problem statement) can only occur in an even number of steps, say . Then the horse must take forward steps and backward steps. The horse starts at position and we would like to know the probability that it will reach position (where speciﬁcally in the example with ) in exactly steps. For convenience, adopt the following notation:
This probability mass distribution does not seem to have a common name, but it is still possible to compute the statistics of this distribution.
Proof. Starting with
change the variables to rewrite the summation as
This series resembling a binomial series expansion suggests that the series might have a closed form. In fact, using Maple to ﬁnd a closed form solution, obtain
In turn the closed form for the power series suggests that it might be possible to express the moment-generating function in closed form. Again using Maple to ﬁnd a closed form expression
Then, once more using Maple, it is each to check that
The calculation of the variance is left as an exercise. □
For the horse race example, the mean number and standard deviation of steps required for each horse to reach position are in Table 1.
The probability distributions seem to be approximately normal in shape, see Figure 3. This is reasonable since the hitting time to level is the sum of random one-level hitting times, ﬁrst the hitting time from the initial point to ﬁrst reach , then the independent hitting time to ﬁrst reach from , and so on. The one-level hitting times are independent since the individual steps of the random walk are independent. Thus, as the sum of many independent random variables, the distribution of the hitting time to is approximately normal.
Suppose each horse ﬁnishes in steps. Computing the probability that speciﬁc horse wins the race amounts to ﬁnding the probability that for all . Since the probability distributions are approximately normal, approximate this probability with:
The probability calculation uses the usual notation: is the standard normal probability density (pdf), is the cumulative distribution (cdf) of the standard normal distribution, and and are the means and standard deviations computed in Table 1.
Analytic integration of the probability is impractical, and even the numerical integration of
is challenging. Taking as an example, the product term
is approximately for . Since each factor is less than and for , the product decreases to approximately for . The other factor outside the interval and has a maximum value of at . Therefore, is approximately outside the interval and reaches a maximum approximately . Computing the product of so many small magnitude factors could lead to numerical underﬂow, so instead the exponential of the sum and diﬀerence of the respective logarithms is computed.
The low variation with an aspect ratio of makes even adaptive integration challenging. The default integration routine of gives a nonsensical answer of . With the additional R library pracma of advanced numerical analysis routines, the quad function using adaptive Simpson quadrature yields and the quadl function using adaptive Lobatto quadrature yields the very similar . Numerical integration with Scientiﬁc Python using either general purpose integration or Romberg integration gives similar results.
Therefore we estimate horse has about a chance of winning. Repeating the analysis with the necessary changes, horse has about a chance of winning. One of the other horses winning has about an chance.
Fix . Let be a sequence of random variables taking values in . Deﬁne the walk as the usual piecewise linear function deﬁned by the values at the integers . Note also that because the only positive value assumed by the steps is , for the random walk to gain a height , it has to pass through all intermediate integer values. Given with , deﬁne the rotation of by as the walk corresponding to the rotated sequence . Another way to represent the rotation of the sequence is
Note that and the entire walk rotated by is the original walk, . We say that peaks at if for all . See Table 3 for several examples.
Proof. Set . If for some and such that , then , so can only peak at if for some satisfying . That is, can only peak at if there are no indices prior to with values greater than . Since the values of has to pass through all intermediate integer values, must be . The property of passing through all intermediate values implies that . Therefore, for , , and . It follows that peaks at if and only if . □
Example. This example is a speciﬁc illustration of the proof of the lemma with and the random walk given in Table 3.
For the example, . For the example for and such that . Now consider . Therefore, cannot peak at . Table 2 also illustrates that cannot peak at .
Now consider satisfying . Then . Note that peaks at .
The deﬁnition of is . For the example, . The next point in the lemma is that for , . For this example, Table 3 gives all values for comparison, recalling that .
Finally, the last point in the lemma is that for , . For this example, Table 3 gives all values for comparison, recalling that .
Deﬁnition. Call two walks equivalent if one is a rotation of the other. This deﬁnes an equivalence relation on the set of -step walks.
Proof. Consider a single equivalence class, and let be the sequence of increments of one member of the class. Every walk in the class ends at the same height , and the class contains either walks, or walks for some divisor of if happens to be periodic. In either case, the lemma implies that if and the joint probability distribution of is invariant under rotations, then
Choose one sequence of increments from each equivalence class and sum over them to obtain the result. □
Remark. The condition that the joint probability distribution of is invariant under rotations here is weaker than the traditional requirement that be independent and identically distributed.
Remark. This proof is another version of the Cycle Lemma from the proof of the Ballot Theorem.
Deﬁnition. A random vector is exchangeable if the joint probability distribution of is invariant for any permutation of .
Remark. The proof is not given here, see [?]. The idea of the proof is that the joint probability distribution of conditioned on is still exchangeable. This implies the conditional expectation is independent of . Then the conditional expectation step still holds, allowing the completion of the induction argument.
The history and background is adapted from [?]. The proof for i. i. d. random variables is adapted from [?]. The proof under rotation invariance is adapted from the article by Kager, [?]. The comments about exchangeability are adapted from [?].
Set the number of upsteps and the number of downsteps. Then the total number of steps is the length . There are then places to have the downsteps. In a sequence of nested loops systematically place the downsteps for B in an matrix of ones. Then cumulatively sum this matrix down the columns to get all possible walks and save the matrix for later plotting or analysis. Note that this algorithm does not scale well because of the inherent growth of the binomial coeﬃcients.
Set all parameters for the simulations, using a suﬃcient length of the race for most of the fastest horses to complete the race. Simulate all races in a vectorized manner, using the usual technique of creating uniformly distributed random variates, comparing row-wise to the Bernoulli probability, and cumulatively summing. Then using vectorized match functions, extract the index of the winning horse (breaking ties with minimum index), and then counting the number of times each horse wins with the histogram counter.
Using a normal approximation to the ﬁrst hitting time distribution for a Bernoulli random walk, construct the probability density for the th walk winning by multiplying the density and the survival functions, assuming independence. Then using diﬀerent numerical quadrature routines, compute the approximate total probability of the th walker winning.
Octave script for plotting walks.
GnuPlot script for uniform partitions.
R script for simulating the horse race example..
R script for winning probability for the horse race example..
Scientiﬁc Python script for winning probability for the horse race example..
The calculation uses the usual notation: is the standard normal probability density (pdf), is the cumulative distribution (cdf) of the standard normal distribution, and and are the means and standard deviations.
Provide a detailed argument that when , both sides are when and , and are equal to when . This is the base case of the induction.
I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable eﬀort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.
I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reﬂects the thoughts, interests and opinions of its author. They do not explicitly represent oﬃcial positions or policies of my employer.
Information on this website is subject to change without notice.
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modiﬁed: Processed from LATEX source on August 14, 2017