Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar


An Analytic Model for Coin Tossing


Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See for details on supported browsers, accessibility, copy-and-paste, and other features.




Mathematicians Only: prolonged scenes of intense rigor.


Section Starter Question

Section Starter Question

What are the axioms for a probability space? What are the assumptions about coin-flips or successive Bernoulli trials that make the process a probability space?


Key Concepts

Key Concepts

  1. The sequence of products cos(x2), cos(x2) cos(x4), cos(x2) cos(x4) cos(x8),

    converges uniformly on bounded intervals to

    k=1 cos x 2k = sin x x

  2. An unusual formula involving π, due to Vieta, is
    2 π = 2 2 2 + 2 2 2 + 2 + 2 2

  3. Given t with 0 t 1 with unique binary expansion,
    t = 𝜖1(t) 2 + 𝜖2(t) 22 + 𝜖3(t) 23 +

    the Rademacher functions are rk(t) = 1 2𝜖k(t).

  4. For the Rademacher functions
    01 k=1 exp ixrk(t) 2k dt = k=101 exp ixrk(t) 2k dt

  5. The correspondence of H with +1, T with 1, the kth toss with rk(t), an event with the set of t’s in (0, 1), and the probability of an event with the measure of the corresponding set of t’s gives an analytic model for successive Bernoulli trials.
  6. Almost every number t has asymptotically the same number of 0’s and 1’s in it binary expansion. That is, we say that the binary expansion is normal.




  1. The function sin xx is often called the sinc function. The sinc function sinc(x), also called the “sampling function”, arises frequently in signal processing and the theory of Fourier transforms.
  2. Every t with 0 t 1 has a unique binary expansion,
    t = 𝜖1(t) 2 + 𝜖2(t) 22 + 𝜖3(t) 23 +

  3. The Rademacher functions are rk(t) = 1 2𝜖k(t),
  4. If t has asymptotically the same number of 0s and 1s in its binary expansion, the binary expansion is normal.


Mathematical Ideas

Mathematical Ideas

Vieta’s Formula from Trigonometry

The following is due both to Vieta in 1593 and also to Euler.

Theorem 1. The sequence of products

cos(x2), cos(x2) cos(x4), cos(x2) cos(x4) cos(x8),

converges uniformly on bounded intervals to

k=1 cos x 2k = sin x x .

Remark. The function sin xx is often called the sinc function. The cosine product definition shows that sinc(0) = 1 while sin xx is undefined at x = 0. However, lim x0 sin(x)x = 1 so sinc(x) is continuous at 0. In the sequel the implicit assumption is that sin xx is 1 at x = 0 even though technically the expression is undefined. The sinc function sinc(x), also called the “sampling function”, is a function that arises frequently in signal processing and the theory of Fourier transforms. The full name of the function is “sine cardinal”, but it is commonly referred to by its abbreviation, “sinc”.

Remark. The following proof is remarkably elementary for such an unusual identity. The proof combines repeated application of the half-angle identity for the cosine with an elementary limit of the sine function. The proof of uniform convergence uses the Cauchy criterion. The verification of the criterion uses an elementary bound on the cosine function from the series expansion.


  1. Start with repeated application of the half-angle formula from trigonometry: sin(x) = 2 sin x 2 cos x 2 = 22 sin x 4 cos x 4 cos x 2 = 23 sin x 8 cos x 8 cos x 4 cos x 2 = 2n sin x 2n k=1n cos x 2k .
  2. A standard limit from elementary calculus shows that for x0
    1 = lim n sin(x2n) x2n = 1 x lim n2n sin x 2n ,


    x = lim n2n sin x 2n .

  3. Alternatively, use the series expansion for sin x 2n sin x 2n = 2n k=1x2k12n(2k1) (2k 1)! = x + k=2x2k12n(2k1) (2k 1)!

    and then let n . Details are left to the reader.

  4. Putting the limits in step 1 and step 2 together
    sin x x = k=1 cos x 2k .

  5. For the uniform convergence, consider first the bounded interval [π,π]. Then let fn(x) = k=1n cos x 2k and note that fn(x) 0 on [π,π] with fn(π) = fn(π) = 0. See Figure 1.
  6. Each fn(x) is continuous, the sequence converges to the continuous function sin(x)x on the closed bounded (compact) interval [π,π], and fn(x) fn+1(x) on [π,π] since cos(x2n+1) 1. By Theorem 7.13 of [2], the convergence is uniform for the monotone limit of continuous functions to a continuous function on a compact interval.
  7. Note that fn(kπ) = fn((k 1)π) = fn((k 1)π) = fn(kπ) = 0 for n k. Furthermore (1)kf n(x) 0 for n k. See Figure 1 for an example with k = 1 and n = 2.
  8. Then the previous step can be repeated on intervals of the form [kπ, (k + 1)π] or [(k + 1)π,kπ] to establish uniform convergence on those intervals. By symmetry, the threshold K(k,𝜖) for uniform bound 𝜖 on [kπ, (k + 1)π] and [(k + 1)π,kπ] will be the same.
  9. For any bounded interval with |x| M, choose k so large that M kπ and use the previous steps to find a uniform bound for uniform convergence on the finite collection of intervals [(j + 1)π,jπ] [jπ, (j + 1)π] for k = 1,,k together with [π,π].

Convergence of products

Figure 1: Convergence of the products to sin xx. The red curve is cos(x2) and the blue curve is cos(x2) cos(x4).

A special case is the basis for an unusual formula involving π due to Vieta in a book published in 1593. First, a simple trigonometric lemma.

Lemma 2.

cos π 2n+1 = 2 + 2 + 2 + + 2 2  (n roots) 


  1. The proof is by mathematical induction.
  2. The base case for n = 1 is cos π 4 = 2 2 , verifying the base case.
  3. Given that the equality holds for n = k, let
    α = 2 + 2 + 2 + + 2  (k roots)

    so that the assumption is that

    cos π 2k+1 = α 2 .

  4. Then for n = k + 1, cos π 2k+2 = cos π2k+1 2 = 1 + α2 2 = 2 + α 4 = 2 + 2 + 2 + + 2 2  (k + 1 roots)

    establishing the induction.

Corollary 1.

2 π = 2 2 2 + 2 2 2 + 2 + 2 2

Proof. Set x = π2 and apply the lemma to the infinite product formula for sin xx from the theorem. □

A numerical evaluation of successive finite products is in Table 1, illustrating the convergence.

Table 1: Numerical example of Vieta’s formula.
2 π 1 2 3 4 5

0.6366198 0.70710680.65328150.64072890.63764360.6368755

Binary Expansions and Rademacher Functions

Every t with 0 t < 1 has a unique binary expansion,

t = 𝜖1(t) 2 + 𝜖2(t) 22 + 𝜖3(t) 23 + (1)

using the convention that terminating expansions have all digits equal to 0 from a certain point. For example, write

3 4 = 1 2 + 1 22 + 0 23 + 0 24 +

rather than

3 4 = 1 2 + 0 22 + 1 23 + 1 24 + .

With the convention about terminating expansion, the graphs of 𝜖1(t), 𝜖2(t), 𝜖3(t), …are as in Figure 2.

Bit functions for binary expansions

Figure 2: The bit functions for binary expansions in [0, 1).

It is more convenient to use the Rademacher functions defined by rk(t) = 1 2𝜖k(t), with graphs as in Figure 2. Note the similarity to the random variables Xk = 0, 1 and Y k = 1 2Xk = ±1.

Rademacher functions

Figure 3: The Rademacher functions.

In terms of the Rademacher functions, the binary expansion (1) becomes

1 2t = k=1r(t) 2k .

Remark. Note that an alternative description of the Rademacher functions is

rk(t) = sgn(sin(2π2k1t)).

In this sense, the Rademacher functions are a discretized version of the sine functions. The Rademacher function expansion follows directly from binary expansion and not from an orthogonal basis.

Theorem 3.

01 k=1 exp ixrk(t) 2k dt = k=101 exp ixrk(t) 2k dt

Remark. This is remarkable and unusual, since it says an integral of a product is a product of integrals. The proof combines the expansion of 1 2t in Rademacher functions, elementary integration, the expression of sin x and cos x in terms of complex exponentials, and Vieta’s formula.

Remark. On the other hand, from an advanced probability point of view, it may not be so remarkable after all. Consider rk(t) as a random variable over the probability space [0, 1). Although the left side is a product of exponentials, it could be written as the exponential of a sum k=1r k(t) of random variables. The left side is then the characteristic function (or re-scaled Fourier transform) of the sum of random variables. By a well-known correspondence, the characteristic function of a sum is equal to the product of individual characteristic functions. Characteristic functions transform questions about sums of random variables and convergence of random variables into analytic questions of products and pointwise convergence of functions. In this way, this theorem points the way to the analytic model of coin-flipping below.


  1. Let u = 1 2t, so du = 2dt and 01eix(12t)dx =11eixudu 2 = 1 2 eix eix ix = sin x x .
  2. On the other side 01 exp ixrk(t) 2k dt = 2k1 eix2k 2k + 2k1 eix2k 2k = cos x 2k .
  3. Then Vieta’s formula
    sin x x = k=1 cos x 2k .


    01eix(12t)dt = k=101 exp ixrk(t) 2k dt

  4. Then substituting
    1 2t = k=1r(t) 2k .

    and expressing the sum in the exponent as a product of exponentials

    01 k=1 exp ixrk(t) 2k dt = k=101 exp ixrk(t) 2k dt.

Another proof of Vieta’s formula

Consider the function k=1nc krk(t). It is a step function that is constant on the intervals j 2n, j+1 2n for j = 0, 1, 2,, 2n 1. The values of the function are ±c1 ± c2 ± c3 ± ± cn. Every sequence of length n of +1s and 1s corresponds to exactly one interval j 2n, j+1 2n . There are 2n such intervals and the sequence of ±1 corresponds to the sequence of subintervals of length 2j with j n that j 2n, j+1 2n is in. Then

01 exp i k=1nc krk(t) dt = 1 2n 2n exp i 1n ± c k , (2)

and the lower limit on the summation indicates that the summation over each of the 2n possible subsequences of ±1. By turning a sum of powers into a product

1 2n exp i k=1n ± c k = k=1n eick + eick 2 = k=1n cos(c k). (3)

Putting together (2) and (3)

01 exp i k=1nc krk(t) dt = k=1n cos(c k) = k=1n01 exp ic krk(t) dt. (4)

where the proof of the last equality is the same as in step 2 of the proof of Theorem 3.

Now set ck = x2k to get

01 exp k=1nixrk(t) 2k dt = k=0n cos x 2k .


lim n k=1rk(t) 2k

converges uniformly in (0, 1) to 1 2t by the Weierstrass M-criterion. Therefore we can interchange limit and integral to see that

sin(x) x =01eix(12t)dt = lim n01 exp ix k=1nrk(t) 2k dt = lim n k=1n cos x 2k = k=1 cos x 2k

This is a different proof of Vieta’a formula connecting the formula to binary representations of real numbers in [0, 1).

Measure from Rademacher Functions

Let μ[E] be the length, or more formally, the measure of the subset E of [0, 1). If δ1,δ2,,δn is a sequence of +1’s and 1’s then

μ[r1 = δ1,r2 = δ2,,rn = δn] = μ[r1 = δ1] μ[r2 = δ2],μ[rn = δn].

Example. Let δ1 = +1, δ2 = 1, δ3 = 1. Then t : δ1 = +1 = [0, 12), t : δ2 = 1 = [14, 12) [34, 1), t : δ3 = 1 = [18, 28) [38, 48) [58, 68) [78, 88). Consider the set E = t : δ1 = +1,δ2 = 1,δ3 = 1 = [38, 48). Then

1 8 = μ[r1 = δ1,r2 = δ2,r3 = δ3] = μ[r1 = δ1]μ[r2 = δ2]μ[r3 = δ3] = 1 2 1 2 1 2.

This gives another way to rewrite the proof of (4) that is basically the same as before:

01 exp i k=1nc krk(t) dt = δ1,,δn exp i k=1nc kδk μ[r1 = δ1] μ[r2 = δ2],μ[rn = δn] = δ1,,δn k=1neickδk k=1nμ[r k = δk] = δ1,,δn k=1neickδk μ[rk = δk] = k=1n δkeickδk μ[rk = δk] = k=101 exp ic krk(t) dt.

Analytic Model of Coin Tossing

The following correspondence gives an analytic model of successive Bernoulli trials. That is, consider the following probability scenario. A fair coin is successively tossed n times, with independence of tosses, coming up Heads or Tails on each toss. Use Table 2 to interpret this physical probability experiment analytically

Coin tossing Analytic Model

Symbol H + 1
Symbol T 1
kth toss rk(t)
Event set of t’s in (0, 1)
Probability of an event Measure of the corresponding set of t’s

Table 2: Correspondence of terms in coin tossing and the analytic model.

Example. Consider an analytic model for the simplest probability problem for Bernoulli trials: Find the probability that in n independent tosses of a fair coin, exactly l will be heads. Using the table to translate the problem to analytic terms, the problem becomes: Find the measure of the set of t’s such that exactly l of the n numbers r1(t), r2(t), …, rn(t) are equal to +1.

To solve this problem start by noticing that having exactly l of n of the rk(t) being +1 means that n l are 1, so that

r1(t) + r2(t) + + rn(t) = l (n 1) = 2l n. (5)

Second, notice that

1 2π02πeimxdx = 1 2π eim2π im e0 im = 0

for m0 and so

1 2π02πeimxdx = 1m = 0 0 m0


ϕ(t) = 1 2π02πeix(r1(t)+r2(t)++rn(t)(2ln))dx

will equal 1 on the set of t’s satisfying the condition (5), and is equal to 0 otherwise. (Pay careful attention to the variable of integration.) Therefore

μ[r1(t) + r2(t) + + rn(t) = l (n l) = 2l n] =01ϕ(t)dt =01 1 2π02πeix(r1(t)+r2(t)++rn(t)(2ln))dxdt = 1 2π02πei(2ln)x 01eix(r1(t)+r2(t)++rn(t))dtdx.

The interchange of order of integration is usually justified by Fubini’s Theorem, but that is not actually necessary here, since r1(t) + r2(t) + + rn(t) is a step function. Now use (3) on the inner integral with c1 = c2 = = cn = x to obtain

μ[r1(t) + r2(t) + + rn(t) = 2l n] = 1 2π02πei(2ln)x cos n(x)dx.

Evaluating this integral is in the problems, the result of the evaluation gives

μ[r1(t) + r2(t) + + rn(t) = 2l n] = 1 2n n l .

Discussion of Independence of Events

The natural mathematical modeling assumption for probability is that events that seem unrelated are probabilistically independent. That is, the joint probability of unrelated events is the product of the individual probabilities. The product rule is not a mathematical necessity. Rather it is a modeling rule based on experiment and practical experience, justifying the multiplication of probabilities. Thus, probabilistic independence is an intuitive notion with the sense that the multiplication rule is applicable and useful.

In a landmark 1909 paper, “Sur les probabilités dénombrables et luer applications arithmétiques” E. Borel showed that binary digits, or equivalently the Rademacher functions, are independent in the sense that

01 k=1 exp ixrk(t) 2k dt = k=101 exp ixrk(t) 2k dt.

This observation gives well-defined mathematical objects to which the postulates of probability theory apply directly.

Weak and Strong Laws

The application of the postulates of probability to Rademacher functions eliminates casting probability in terms of coins and tosses. All of the previous proofs of theorems for coin-tossing have equivalents using Rademacher functions, or equivalently binary digits, with Table 2. The Weak Law of Large Numbers, the direct Borel-Cantelli Lemma and the Strong Law of Large Numbers serve as examples.

Lemma 4 (Orthogonality of Rademacher Functions). If k1 < k2 < < kn,

01r k1(t)rk2rkn(t)dt = 0.

Remark. Consider the example in Figure 4.

Product of Rademacher functions

Figure 4: Graph of the product of Rademacher functions.

Each of the integrals over subintervals [0, 14], [14, 12], [12, 34] and [34, 1] are like a rescaled version of ±01r 1(t)dt = 0, so the entire integral is 0. The proof expands that idea using induction on the number of factors in the product.


  1. The proof is by induction on the number of factors n in the product.
  2. The base case for n = 1 is obvious by the symmetry and regularity of rk(t):
    01r k1(t)dt.

  3. Suppose that the conclusion is true for n 1 factors,
    01r k2(t)rk3(t)rkn(t)dt = 0

    and consider

    01r k1(t)rk2(t)rkn(t)dt.

  4. Break the integral into 2k1 integrals over the subintervals [2k1, ( + 1)2k1], = 0,, 2k 1.
    01r k1(t)rk2(t)rkn(t)dt = =02k11(1)2k1(+1)2k1 rk2(t)rk3(t)rkn(t)dt.

  5. For each integral, change variables t = 2k1 + u 2k1 for u [0, 1]. Then the summation of integrals becomes
    =02k11(1) 2k1 01r k2 2k1 + u 2k1 rk3 2k1 + u 2k1 rkn 2k1 + u 2k1 du.

  6. Using the recursive relation that
    rki 2k1 + u 2k1 = rkik1(u)

    the summation becomes

    =02k11(1) 2k1 01r k2 u rk3 u rkn u du.

  7. Each summand is 0 by the induction hypothesis, so the sum is 0 and the induction step is complete.

Theorem 5 (Weak Law of Large Numbers). For every 𝜖 > 0,

lim nμ[|r1(t) + + rn(t)| > 𝜖n] = 0

Remark. The proof is essentially the standard one using Chebyshev’s inequality expressed directly in terms of the measure and integral of Rademacher functions.


  1. On the one hand, 01(r 1(t) + + rn(t))2dt |r1(t)++rn(t)|>𝜖n(r1(t) + + rn(t))2dt > 𝜖2n2μ[|r 1(t) + + rn(t)| > 𝜖n].

  2. On the other hand, using the Lemma on Orthogonality of Rademacher Functions,
    01(r 1(t) + + rn(t))2dt = n.

  3. Combining the two previous points,
    μ[|r1(t) + + rn(t)| > 𝜖n] < 1 𝜖2n


    lim nμ[|r1(t) + + rn(t)| > 𝜖n] = 0.

Lemma 6. If fn(t) is a sequence of non-negative Lebesgue integrable functions, then convergence of n=1 01f n(t)dt implies

lim nfn(t) = 0

almost everywhere.

Remark (Borel-Cantelli Direct Half). The proof is a direct translation of the probabilistic proof of the direct Borel-Cantelli Lemma, using an analytic version of Markov’s inequality.


  1. Let an arbitrary a > 0 be given.
  2. Since fn(t) 0 01f n(t)dt =t:fn(t)<afn(t)dt +t:fn(t)afn(t)dt aμ[fn(t) a].
  3. Therefore n=1 01f n(t)dt < implies n=1aμ[f n(t) a] < .
  4. In turn, that implies μ[fn(t) a] 0 as n .
  5. Since this is true for any a > 0,
    lim nfn(t) = 0

    almost everywhere.

Theorem 7 (Strong Law of Large Numbers). For almost every t (that is, for all t except for a set of measure 0)

lim nr1(t) + + rn(t) n = 0.

Remark. This proof resembles the third proof of the Strong Law in Strong Law of Large Numbers.. The proof substitutes the analytic form of the direct half of the Borel-Cantelli Lemma to show the convergence almost everywhere.


  1. Set
    fn(t) = r1(t) + + rn(t) n 4

    and consider

    01f n(t)dt =01 r1(t) + + rn(t) n 4dt.

  2. Using the Orthogonality of Rademacher Functions
    01 r1(t) + + rn(t) n 4dt = n + 4 2 n 2 n4 C n2

    since the only terms that contribute positive values are the fourth powers and the pairs of squares.

  3. Therefore
    n=101f n(t)dt < .

  4. The proof now uses the preceding lemma on the analytic form of the Borel-Cantelli Lemma. With this lemma
    lim nr1(t) + + rn(t) n 4 = 0

    almost everywhere, and so

    lim nr1(t) + + rn(t) n = 0.

    almost everywhere.

Remark. Recall that rk(t) = 1 2𝜖k(t) and that 𝜖k(t) is the kth bit in the binary expansion of t. Then the Strong Law of Large Numbers implies that for almost every t

lim n𝜖1(t) + + 𝜖n(t) n = 1 2.

In other words, almost every number t has asymptotically the same number of 0’s and 1’s. That is, we say that the binary expansion is normal.


This section is adapted from: Statistical Independence in Probability, Analysis, and Number Theory by Mark Kac, 1959, Mathematical Association of America, pages 1–35. Problems 2 and 3 are from the same source. The Cauchy criterion for uniform convergence is Theorem 7.8, page 134 in Principles of Mathematical Analysis second edition by W. Rudin, McGraw-Hill, 1964. The remarks about the sinc function are from Mathworld: Sinc function.


Algorithms, Scripts, Simulations

Algorithms, Scripts, Simulations




Problems to Work

Problems to Work for Understanding

  1. Show by direct integration that
    1 2π02πei(0)x cos 0(x)dx = 1


    1 2π02πei(201)x cos(x)dx = 12.


    1 2π02πei(211)x cos(x)dx = 12.

  2. For n 2 and l n, show that
    1 2π02πei(2ln)x cos n(x)dx = 1 2n n l .

  3. Every t with 0 t < 1 has a unique ternary expansion,
    t = η1(t) 2 + η2(t) 22 + η3(t) 23 + (6)

    where ηk(t) = 0, 1 or 2. Prove that ηi(t) is independent of ηj(t).

  4. Prove that
    sin(x) x = n=11 + 2 cos (2x)3k 3

  5. For 0 < p < 1 let
    Tp(t) = tp, 0 t p (t p)(1 p)p < t t,

    and let

    𝜖p(t) = 10 t p, 0,p < t 1.

    Plot the functions 𝜖1(p)(t) = 𝜖 p(t), 𝜖2(p)(t) = 𝜖 p(Tp(t)), 𝜖3(p)(t) = 𝜖 p(Tp(Tp(t))) and so on, and show that they are independent. This is the basis for an analytic model of the unfair coin. See the next problem.

  6. Prove that the measure of the set on which
    𝜖1(p) + 𝜖 2(p) + + 𝜖 n(p) = l

    where 0 l n is

    n l pl(1 p)nl.



Reading Suggestion:


[1]   Mark Kac. Statistical Independence in Probability, Analysis and Number Theory, volume 12 of The Carus Mathematical Monographs. Mathematical Association of America, 1959.

[2]   Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, 1964.



Outside Readings and Links:


I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page,

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LATEX source on June 18, 2018