Steven R. Dunbar
Department of Mathematics
203 Avery Hall
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Distinguishing A Biased Coin From a Fair Coin

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

### Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

### Section Starter Question

You are presented with two coins: one is fair and the other has a 60% chance of coming up heads. Unfortunately, you don’t know which is which. How many flips would you need to perform in parallel on the two coins to give yourself a 95% chance of correctly identifying the biased coin?

_______________________________________________________________________________________________

### Key Concepts

1. How to numerically compute the probability of a majority of heads in a sequence of paired coin ﬂips of a biased coin and a fair coin.
2. How to numerically compute the probability of statistical evidence of a biased coin in a sequence of paired coin ﬂips of a biased coin and a fair coin.

__________________________________________________________________________

### Vocabulary

1. A sequence of independent Bernoulli trials with probability $1∕2$ of success on each trial is metaphorically called a fair coin. One for which the probability is not $1∕2$ is called a biased or unfair coin.

__________________________________________________________________________

### Mathematical Ideas

#### Introduction

The following problem appeared in the FiveThirtyEight.com weekly Riddler puzzle column on September 29, 2017:

On the table in front of you are two coins. They look and feel identical, but you know one of them has been doctored. The fair coin comes up heads half the time while the doctored coin comes up heads 60 percent of the time. How many flips (you must flip both coins at once, one with each hand) would you need to give yourself a 95 percent chance of correctly identifying the doctored coin?

Extra credit: What if, instead of 60 percent, the doctored coin came up heads some percent $p$ of the time? How does that affect the speed with which you can correctly detect it?

This problem appeared in a paper “What’s Past is Not Prologue” by James White, Jeﬀ Rosenbluth, and Victor Haghani. They in turn were inspired by the problem posed in a paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba.

Solving this problem requires some interpretation and computation.

#### Probability of a Majority of Heads

In any fixed number of simultaneous flips, there is always a chance that the fair coin will have more heads than the biased coin. But the Weak Law of Large Numbers says that the probability that the biased coin will have a majority of heads increases to $1$ as the number of flips increases. One way to determine which is the biased coin is to choose the coin which has a majority of heads. The authors White, Rosenbluth, and Haghani want to calculate how many flips of the $2$ coins, one biased and one fair, we must observe in order to be 95% confident that the coin with more heads is the biased coin.

Since each coin has a binomial distribution and the coins are assumed independent, the joint probability mass distribution after $n$ ﬂips is the product of the individual binomial probability mass distributions. Denote by $Q\left(n;j,k\right)$ the probability of $j$ heads for the fair coin and $k$ heads for the biased in $n$ ﬂips. Use $p$ for the probability of heads of the biased coin and $\frac{1}{2}$ for the probability of heads for the fair coin. Then

$Q\left(n;j,k\right)=\left(\genfrac{}{}{0.0pt}{}{n}{j}\right){\left(\frac{1}{2}\right)}^{j}{\left(\frac{1}{2}\right)}^{n-j}\cdot \left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p}^{k}{\left(1-p\right)}^{n-k}.$

Then summing over values where $j<k$ gives the probability that the biased coin has more heads than the fair coin in $n$ flips

$ℙ\left[j<k\right]=\sum _{k=1}^{n}\sum _{j=0}^{k-1}Q\left(n;j,k\right).$

There is no closed formula to evaluate this sum so calculation is necessary. First create two vectors of length $n+1$ with the binomial distribution on $0$ to $n$ with probability $\frac{1}{2}$ for the fair coin and with probability $p$ for the biased coin. Then using the outer product of these two vectors create the bivariate binomial distribution for the two coins. This will be an $\left(n+1\right)×\left(n+1\right)$ matrix. The element in row $i$ and column $j$ is the product of the binomial probability of $i-1$ heads for the fair coin and the binomial probability of $j-1$ heads for the biased coin. The sum is over column indices strictly greater than the row indices. To create the sum, set the lower triangular part of the matrix and the diagonal of the matrix to $0$ and use the sum command to sum all entries of the matrix. This seems to be eﬃcient, even for values of $n$ up to $200$. To ﬁnd the minimal value of $n$ for which this probability is greater than $0.95$ use binary search over a reasonable interval.
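The steps just described condense into a short R sketch; the function name `prob` is illustrative, mirroring the full script in the Scripts section.

```r
# Probability that the biased coin shows strictly more heads than the
# fair coin in n paired flips; p is the probability of heads of the biased coin.
prob <- function(n, p) {
  fair   <- dbinom(0:n, n, 0.5)       # fair-coin binomial mass on 0..n
  biased <- dbinom(0:n, n, p)         # biased-coin binomial mass on 0..n
  Q <- fair %o% biased                # (n+1) x (n+1) joint mass matrix
  Q[lower.tri(Q, diag = TRUE)] <- 0   # keep only entries with j < k
  sum(Q)                              # P[fair heads < biased heads]
}
prob(143, 0.6)                        # exceeds 0.95, per the result below
```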

It might seem possible to use a bivariate normal approximation of the bivariate binomial distribution to calculate the probability. While there is such an approximation, the double integration of the bivariate normal with a non-zero covariance would be over a region of the form $y>x-𝜖$. There is no direct way with R to calculate the integral over a region of this form, so it is actually easier to calculate with the bivariate binomial distribution.

The result is that it takes $143$ ﬂips of the two coins for the probability to be greater than $0.95$ for the $0.6$ biased coin to have more heads than the fair coin.

The same analysis for various probabilities of heads $p$ for the biased coin and for values $0.95$, $0.9$ and $0.75$ of certainty is in Figure 1. The number of ﬂips required decreases as the bias $p$ increases, as expected. The number of ﬂips required also decreases as the certainty decreases.

#### Determining which coin is biased

Assume that the biased coin is in the left hand. Let ${L}_{j}$ be the result of left-hand coin ﬂip $j$, knowing it is biased so ${L}_{j}=\text{Head}=1$ with probability $0.6$, ${L}_{j}=\text{Tail}=0$ with probability $0.4$. Let ${R}_{j}$ be the result of right-hand coin ﬂip $j$, knowing it is fair so ${R}_{j}=\text{Head}=1$ with probability $0.5$, ${R}_{j}=\text{Tail}=0$ with probability $0.5$. Let ${X}_{j}={L}_{j}-{R}_{j}$. This is a trinomial random variable with ${X}_{j}=-1$ with probability $1∕5=0.2$, ${X}_{j}=0$ with probability $1∕2=0.5$, ${X}_{j}=1$ with probability $3∕10=0.3$.

Consider the statistics of ${X}_{j}$,

$\begin{array}{ll}\hfill 𝔼\left[{X}_{j}\right]&=\left(-1\right)\cdot \left(0.2\right)+0\cdot \left(0.5\right)+\left(+1\right)\cdot \left(0.3\right)=0.1,\\ \hfill Var\left[{X}_{j}\right]&={\left(-1\right)}^{2}\cdot \left(0.2\right)+{0}^{2}\cdot \left(0.5\right)+{\left(+1\right)}^{2}\cdot \left(0.3\right)-{\left(0.1\right)}^{2}=0.49,\\ \hfill \sigma \left[{X}_{j}\right]&=0.7.\end{array}$
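These moments are easy to confirm numerically in R:

```r
# Moments of the three-point distribution of X_j = L_j - R_j for p = 0.6
x  <- c(-1, 0, 1)
px <- c(0.2, 0.5, 0.3)      # P[X_j = -1], P[X_j = 0], P[X_j = 1]
m  <- sum(x * px)           # mean: 0.1
v  <- sum(x^2 * px) - m^2   # variance: 0.49
s  <- sqrt(v)               # standard deviation: 0.7
```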

Let ${S}_{n}=\frac{1}{n}\sum _{j=1}^{n}{X}_{j}$ be the sample mean. Then ${S}_{n}$ is distributed on $-1$ to $1$ by increments of $1∕n$ for a total of $2n+1$ points with

$𝔼\left[{S}_{n}\right]=0.1,\qquad Var\left[{S}_{n}\right]=\frac{0.49}{n},\qquad \sigma \left[{S}_{n}\right]=\frac{0.7}{\sqrt{n}}.$

Thus, the distribution of ${S}_{n}$ clusters around $1∕10$ with standard deviation $0.7∕\sqrt{n}$. The goal is to find a number of flips such that ${S}_{n}$ lies within $0.2$ of its mean $0.1$ with probability $0.95$. This would be enough to clearly distinguish the mean from the possible alternative mean $-0.1$, coming from the biased coin being in the right hand.

By the Chebyshev inequality

$ℙ\left[|{S}_{n}-\mu |\ge k\right]\le {\sigma }^{2}∕{k}^{2}.$

Here the values for the Chebyshev inequality are

$ℙ\left[|{S}_{n}-1∕10|\ge 0.2\right]\le 0.05.$

Taking $k=2∕10=0.2$, if $n$ is large enough then ${\sigma }^{2}∕{k}^{2}=\left(0.49∕n\right)∕{\left(2∕10\right)}^{2}\le 0.05=1∕20$. Solving for $n$ gives $n\ge 245$. However, Chebyshev is notoriously weak, so this is more than actually necessary.
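A quick numeric check of the Chebyshev bound at the claimed number of flips:

```r
# Chebyshev bound sigma^2 / k^2 for the sample mean, with Var[X_j] = 0.49
k <- 0.2                    # required distance from the mean 0.1
n <- 245                    # candidate number of flips
bound <- (0.49 / n) / k^2   # equals 0.05 exactly at n = 245
```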

Expressing the precise distribution of ${S}_{n}$ analytically is difficult. So instead, use a numerical calculation of the distribution. To numerically calculate the distribution of the sample mean ${S}_{n}$, use the R package distr and specifically the function convpow, which takes the $n$-fold convolution power of a distribution to create the distribution of the $n$-fold sum of a random variable. Note that mathematically the support of the distribution of, for instance, ${S}_{100}$ would be from $-1$ to $1$, with $201$ points. However, the actual calculated support of ${S}_{100}$ is $111$ points from $-46$ to $64$. The reason is that points with probability less than $1{0}^{-16}$ are ignored, so they are not included in this domain. So use match(-10, support(D100)) to find the index of $-0.1$ (namely $-0.1\cdot 100=-10$). This turns out to be index $37$. Summing the distribution over indices from $37+1$ to $111$ gives the probability that the random variable ${S}_{n}$ is bigger than $-0.1$. Recall that $-0.1$ is the mean value if the biased coin is in the right hand. Searching for a value of $n$ large enough that the probability exceeds $0.95$ gives the required number of flips.
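If the distr package is unavailable, the same convolution-power idea can be sketched in base R with repeated calls to convolve; the values and indices below are for $p=0.6$ and $n=100$.

```r
# n-fold convolution of the three-point distribution of X_j, base R only
px <- c(0.2, 0.5, 0.3)     # P[X_j = -1], P[X_j = 0], P[X_j = 1] for p = 0.6
pn <- 1                    # distribution of the empty sum: point mass at 0
for (i in 1:100) pn <- convolve(pn, rev(px), type = "open")
# pn is now the mass on the 201 support points -100..100 of the 100-fold sum;
# the value v sits at index v + 101, so P[sum > -10] = P[S_100 > -0.1] is
sum(pn[92:201])
```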

Using this algorithm, the necessary number of flips to distinguish a biased coin with a probability of heads $0.6$ from a fair coin with a certainty level of $0.95$ is $31$. The same analysis for various probabilities of heads $p$ for the biased coin and for values $0.95$, $0.9$ and $0.75$ of certainty is in Figure 2. The number of flips required decreases as the bias increases, as expected. The number of flips required also decreases as the certainty decreases, as expected.

There is an interesting observation for values of $p\ge 0.75$ and all values of certainty. According to Figure 2, it only takes $2$ flips of the coins to identify the biased coin for these parameters. That is because the probability distribution of ${S}_{2}=\frac{1}{2}\left[\left({L}_{1}-{R}_{1}\right)+\left({L}_{2}-{R}_{2}\right)\right]$ is in Table 1. The probability that ${S}_{2}$ is greater than $-0.1$ is $0.343750+0.375000+0.140625=0.859375$.

Table 1: Probability distribution of ${S}_{2}$ for $p=0.75$.

| ${S}_{2}$ | $-1$ | $-\frac{1}{2}$ | $0$ | $\frac{1}{2}$ | $1$ |
|---|---|---|---|---|---|
| probability | $0.015625$ | $0.125000$ | $0.343750$ | $0.375000$ | $0.140625$ |
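The entries of Table 1 follow from a two-fold convolution of the three-point distribution of ${L}_{j}-{R}_{j}$ for $p=0.75$:

```r
# Distribution of X_1 + X_2 on -2..2, hence of S_2 on -1..1, for p = 0.75
px <- c(0.125, 0.5, 0.375)                 # (1-p)/2, 1/2, p/2
s2 <- convolve(px, rev(px), type = "open") # two-fold convolution
round(s2, 6)  # 0.015625 0.125000 0.343750 0.375000 0.140625
```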

#### Diﬀerences between the Two Calculations

The two values of $n$ required are strikingly diﬀerent. Why is that so?

The calculation of the probabilities uses fundamentally the same information, as the diagram in Figure 3 for the case $n=10$ illustrates. The $11×11$ array of dots represents the support of the bivariate binomial distribution as a matrix, with rows from $0$ to $10$ for the number of heads from the fair coin and columns from $0$ to $10$ for the number of heads from the biased coin.

The probability that the majority of ﬂips is from the biased coin is the sum of the probabilities in the strict upper triangle of the support of the bivariate distribution. The shaded portion of Figure 3 shows this domain.

The multinomial probability distribution on $-10$ to $10$ of the $10$-fold sum $\sum _{j=1}^{10}\left({L}_{j}-{R}_{j}\right)$ is the sum of the probabilities along the diagonals of the bivariate binomial distribution, as indicated by the red arrows. In the case where the biased coin has $p=0.6$, so that the mean of ${L}_{j}-{R}_{j}$ is $0.1$, the probability of the event

$\begin{array}{llll}\hfill \left[{S}_{n}>\frac{-1}{10}\right]& =\left[\frac{1}{n}\sum _{j=1}^{n}\left({L}_{j}-{R}_{j}\right)>\frac{-1}{10}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\left[\sum _{j=1}^{n}\left({L}_{j}-{R}_{j}\right)>\frac{-n}{10}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

is the sum of the probabilities in the strict upper triangle and additionally the main diagonal and the subdiagonals strictly above $\frac{-n}{10}$, that is, from $⌊\frac{-n}{10}⌋+1$ to $0$. Thus the total probability is larger for the statistical calculation, and for a given level of certainty the required number of flips is less.
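A small R sketch for $n=10$ and $p=0.6$ makes the relation between the two criteria concrete; the index arithmetic below is illustrative.

```r
n <- 10; p <- 0.6
Q <- dbinom(0:n, n, 0.5) %o% dbinom(0:n, n, p)  # rows j (fair), columns k (biased)
# Summing the diagonals k - j = d of Q gives the distribution of the n-fold sum
diagSums <- sapply(-n:n, function(d) sum(Q[col(Q) - row(Q) == d]))
# Majority criterion: strict upper triangle, i.e. d = 1..n
majority    <- sum(diagSums[(n + 2):(2 * n + 1)])
# Statistical criterion: all diagonals with d > -n/10
statistical <- sum(diagSums[(floor(-n/10) + 1 + n + 1):(2 * n + 1)])
# statistical >= majority, so fewer flips reach a given certainty
```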

The choice of decision criterion is motivated by the application. The original problem posed in FiveThirtyEight.com was drawn from a white paper by James White, Jeff Rosenbluth, and Victor Haghani from Elm Partners Investing. In turn their example is motivated by a paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba. The question is a simplified and idealized version of an investment question about how much evidence is required to select a higher performing investment, modeled by a biased coin, over a lower performing investment with just an even chance at making a profit, modeled by a fair coin. In that case, the 95% certainty of a majority of gains is a reasonable choice of criterion. However, if the question is merely to identify the biased coin, then it makes sense to use the Central Limit Theorem to distinguish the mean given that the biased coin is in the left hand from the mean given that the biased coin is in the right hand. That is, the probability distribution of the sample mean “condenses” around the expected value of the difference with smaller variance as the number of flips increases. Some of the probability may even come from cases where there are more heads produced by the fair coin but the preponderance of evidence indicates the biased coin. Thus it takes a smaller number of flips to confidently identify the biased coin.

#### Sources

The definition of biased coin is from Wikipedia. The problem appeared in the FiveThirtyEight.com weekly Riddler puzzle column on September 29, 2017. The first subsection with the majority solution is adapted from a white paper “What’s Past is Not Prologue” by James White, Jeff Rosenbluth, and Victor Haghani from Elm Partners Investing. In turn, their example is motivated by a paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba. The second subsection with the statistical solution is original.

_______________________________________________________________________________________________

### Algorithms, Scripts, Simulations

#### Scripts

R script for the majority-of-heads calculation:

```r
prob <- function(n, p) {
    fair <- dbinom(0:n, n, 0.5)         # fair binomial distribution
    biased <- dbinom(0:n, n, p)         # biased binomial distribution
    bivarBinom <- fair %o% biased       # outer product, (n+1) x (n+1) matrix
    bivarBinom[ lower.tri(bivarBinom, diag=TRUE) ] <- 0
    # retain only the upper triangular values, without the diagonal
    sum( bivarBinom )                   # add them all up
}

reqN <- c()                             # initialize vector to hold results
for (certainty in c(0.95, 0.9, 0.75)) { # three representative values
    for (p in seq(0.55, 0.99, length=45)) { # array of bias values
        low <- 2                            # minimum to start search
        high <- 1000                        # arbitrary guess for max for search

        while (high - low > 1) {
            try <- floor( (low + high)/2 )  # binary search
            if ( prob(try, p) <= certainty ) low <- try + 1 else high <- try
            ## uses the monotone increasing nature of the probability
        }

        reqN <- c(reqN, low)            # add search result to vector of results
    }
}

N <- matrix(reqN, 45, 3)                # three columns, one for each certainty

ps <- seq(0.55, 0.99, length=45)        # bias values for the horizontal axis
plot( ps, N[,1], pch=20, xlab="p", ylab="N") # required N for certainty 0.95
points( ps, N[,2], pch=20, col="red")        # required N for certainty 0.90
points( ps, N[,3], pch=20, col="blue")       # required N for certainty 0.75
```
R script for the statistical calculation:

```r
library(distr)

prob <- function(nn, p) {
    D1 <- DiscreteDistribution( -1:1, c((1-p)/2, 1/2, p/2) )
    ## trinomial distribution of X_j = L_j - R_j
    Dnn <- convpow(D1, nn)
    ## distribution of sum_{j=1}^{nn} X_j
    end0 <- NROW( support(Dnn) )
    ## index of the last non-zero support point
    start0 <- match( floor(-(p-1/2)*nn)+1, support(Dnn) )
    ## index of the first support point strictly above the negative mean -(p-1/2)*nn
    distDnn <- d(Dnn)(support(Dnn))
    ## vector of distribution values on the support
    sum( distDnn[start0:end0] )         # add up the probability
}

reqN <- c()                             # initialize vector to hold results

for (certainty in c(0.95, 0.9, 0.75)) { # three representative values
    try <- 150                          # place to start search
    for (p in seq(0.55, 0.99, length=45)) { # array of bias values
        low <- 1                            # minimum to start search
        high <- try                         # a maximum to start with
        ## For greater efficiency, note that the required number of flips
        ## for the next p will be less than the number for the previous p,
        ## so restrict the search to the interval up to the previous result
        while (high - low > 1) {
            try <- round( (low + high)/2 )  # binary search
            if ( prob(try, p) <= certainty ) low <- try else high <- try
        }

        reqN <- c(reqN, high)           # add search result to vector of results
    }
}

N <- matrix(reqN, 45, 3)                # three columns, one for each certainty

ps <- seq(0.55, 0.99, length=45)        # bias values for the horizontal axis
plot( ps, N[,1], pch=20, xlab="p", ylab="N") # required N for certainty 0.95
points( ps, N[,2], pch=20, col="red")        # required N for certainty 0.90
points( ps, N[,3], pch=20, col="blue")       # required N for certainty 0.75
```

__________________________________________________________________________

### Problems to Work for Understanding

__________________________________________________________________________

### References

[1]   Leonard C. MacLean, Edward O. Thorp, and William T. Ziemba. Long-term capital growth: the good and bad properties of the Kelly and fractional Kelly capital growth criteria. Quantitative Finance, 10(7):681–687, 2010.

[2]   James White, Jeﬀ Rosenbluth, and Victor Haghani. What’s past is Not prologue. https://ssrn.com/abstract=3034686, April 2017.

__________________________________________________________________________
