Steven R. Dunbar
Department of Mathematics
203 Avery Hall
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in
Probability Theory and Stochastic Processes
Steven R. Dunbar

__________________________________________________________________________

Distinguishing A Biased Coin From a Fair Coin

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

### Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

### Section Starter Question

You have two coins: one is fair and the other has a biased chance $p>\frac{1}{2}$ of coming up heads. Unfortunately, you don’t know which is which. How can you determine which is the biased coin? What do you need to know?

_______________________________________________________________________________________________

### Key Concepts

1. How to numerically compute the probability of a majority of heads in a sequence of paired coin flips of a biased coin and a fair coin.
2. How to numerically compute the probability of statistical evidence of a biased coin in a sequence of paired coin flips of a biased coin and a fair coin.
3. Bayesian analysis using likelihood ratios as statistical evidence of a biased coin in a sequence of paired coin flips of a biased coin and a fair coin.

__________________________________________________________________________

### Vocabulary

1. A sequence of independent Bernoulli trials with probability $1∕2$ of success on each trial is metaphorically called a fair coin. One for which the probability is not $1∕2$ is a biased or unfair coin.
2. Suppose a fair coin $1$ is in the left hand and a biased coin $2$ is in the right hand. The coin in the left hand comes up heads ${k}_{L}$ times and the coin in the right hand comes up heads ${k}_{R}$ times. Let $k=\left({k}_{L},{k}_{R}\right)$ and call this scenario ${𝜃}_{L}$. The likelihood of this scenario occurring is
$Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)=B\left({k}_{L},n,{p}_{1}\right)B\left({k}_{R},n,{p}_{2}\right).$

3. If fair coin $1$ is in the right hand and biased coin $2$ is in the left hand then call this scenario ${𝜃}_{R}$. The likelihood ratio is
$\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)}.$

__________________________________________________________________________

### Mathematical Ideas

#### A Simple Probability Problem

This problem serves as a warmup for the more detailed calculations below.

You have two coins. One is fair with $ℙ\left[H\right]=\frac{1}{2}$. The other coin is biased with $ℙ\left[H\right]=\frac{2}{3}$. First you toss one of the coins once, resulting in heads. Then you toss the other coin three times, resulting in two heads. Which coin is more likely to be the biased coin, the first or the second?

##### Joint Probability Solution

Assuming the biased coin was tossed once and the fair coin was tossed three times, the probability of observing the outcome is

$\frac{2}{3}\cdot \left(\genfrac{}{}{0.0pt}{}{3}{2}\right){\left(\frac{1}{2}\right)}^{2}{\left(\frac{1}{2}\right)}^{1}=\frac{2}{8}.$

On the other hand, assuming the fair coin was tossed once and the biased coin was tossed three times, the probability of observing the given outcome is

$\frac{1}{2}\cdot \left(\genfrac{}{}{0.0pt}{}{3}{2}\right){\left(\frac{2}{3}\right)}^{2}{\left(\frac{1}{3}\right)}^{1}=\frac{2}{9}.$

This means it was more likely that the biased coin was the one tossed once and the fair coin the one tossed three times.
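The two joint probabilities are easy to check numerically. The scripts in this document use R, but the following quick cross-check uses Python with only the standard library; the variable names are chosen for illustration.

```python
from math import comb

# Scenario A: biased coin (P[H] = 2/3) tossed once -> heads,
# fair coin tossed three times -> two heads.
prob_biased_first = (2/3) * comb(3, 2) * (1/2)**2 * (1/2)**1

# Scenario B: fair coin tossed once -> heads,
# biased coin tossed three times -> two heads.
prob_fair_first = (1/2) * comb(3, 2) * (2/3)**2 * (1/3)**1

print(prob_biased_first)  # 2/8 = 0.25
print(prob_fair_first)    # 2/9 ≈ 0.2222
```

Since $2/8 > 2/9$, the first scenario is the more likely one, matching the calculation above.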

##### Using Bayes Theorem

Take a uniform prior, that is, assume that the first coin is equally likely to be either of the two coins. Let $K=\left({k}_{1},{k}_{2}\right)$ denote the observation, and $B$ the event that the first coin is biased. Then

$ℙ\left[B\right]=ℙ\left[{B}^{C}\right]=\frac{1}{2}.$

The conditional probabilities are

$ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}B\right]=\frac{2}{3}\cdot \left(\genfrac{}{}{0.0pt}{}{3}{2}\right){\left(\frac{1}{2}\right)}^{2}{\left(\frac{1}{2}\right)}^{1}=\frac{2}{8}$

and

$ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{B}^{C}\right]=\frac{1}{2}\cdot \left(\genfrac{}{}{0.0pt}{}{3}{2}\right){\left(\frac{2}{3}\right)}^{2}{\left(\frac{1}{3}\right)}^{1}=\frac{2}{9}.$

From Bayes Theorem,

$\begin{array}{llll}\hfill ℙ\left[B\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}K\right]& =\frac{ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}B\right]\cdot ℙ\left[B\right]}{ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}B\right]\cdot ℙ\left[B\right]+ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{B}^{C}\right]\cdot ℙ\left[{B}^{C}\right]}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\frac{2∕8}{2∕8+2∕9}=\frac{9}{17}>\frac{1}{2}.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

##### Using Likelihood Ratios

Take the prior odds to be even, that is, assume that the first coin is equally likely to be either of the two coins. Let $K=\left({k}_{1},{k}_{2}\right)$ be the observed outcomes. The likelihood ratio is

$\begin{array}{llll}\hfill L\left(K\right)& =\frac{ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}B\right]}{ℙ\left[K\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{B}^{C}\right]}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\frac{\frac{2}{3}{\left(\frac{1}{2}\right)}^{3}}{\frac{1}{2}\cdot {\left(\frac{2}{3}\right)}^{2}{\left(\frac{1}{3}\right)}^{1}}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\frac{2∕24}{4∕54}=\frac{9}{8}.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

Then the odds in favor of the first coin being biased and the second coin being fair are $9:8$, so the probability is $\frac{9}{8+9}=\frac{9}{17}$.
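Both the Bayesian posterior and the likelihood-ratio odds can be verified with a few lines of code. This sketch (Python, standard library only, as a cross-check of the arithmetic above) computes the posterior $9/17$ and the odds $9:8$; note that the binomial coefficient cancels in the ratio.

```python
from math import comb

# Likelihoods of the observation K under the two hypotheses
p_K_given_B  = (2/3) * comb(3, 2) * (1/2)**3          # first coin biased: 2/8
p_K_given_Bc = (1/2) * comb(3, 2) * (2/3)**2 * (1/3)  # first coin fair: 2/9

# Uniform prior P[B] = P[B^C] = 1/2, so the priors cancel in Bayes' theorem
posterior = p_K_given_B / (p_K_given_B + p_K_given_Bc)

# Equivalently, odds of 9:8 from the likelihood ratio
likelihood_ratio = p_K_given_B / p_K_given_Bc

print(posterior)         # 9/17 ≈ 0.5294
print(likelihood_ratio)  # 9/8 = 1.125
```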

#### A Larger Problem

The following problem appeared in the FiveThirtyEight.com weekly Riddler puzzle column on September 29, 2017:

On the table in front of you are two coins. They look and feel identical, but you know one of them has been doctored. The fair coin comes up heads half the time while the doctored coin comes up heads $60$ percent of the time. How many flips (you must flip both coins at once, one with each hand) would you need to give yourself a $95$ percent chance of correctly identifying the doctored coin?

Extra credit: What if, instead of $60$ percent, the doctored coin came up heads some percent $p$ of the time? How does that affect the speed with which you can correctly detect it?

This problem appeared in a paper “What’s Past is Not Prologue” by James White, Jeff Rosenbluth, and Victor Haghani. In turn, a problem posed in a paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba inspired the problem by White, Rosenbluth, and Haghani.

Solving this problem requires some interpretation and computation.

#### Probability of a Majority of Heads

In any fixed number of simultaneous flips, there is always a chance that the fair coin will have more heads than the biased coin. But the Weak Law of Large Numbers says that the probability that the biased coin will have a majority of heads increases to $1$ as the number of flips increases. One way to determine which is the biased coin is to choose the coin which has a majority of heads. The authors White, Rosenbluth, and Haghani want to calculate how many paired flips of $2$ coins, one biased and one fair, we must observe in order to be $95%$ confident that the coin with more heads is the biased coin.

Assuming independence, since each coin has a binomial distribution, the joint probability mass distribution after $n$ flips is the product of the individual binomial probability mass distributions. Denote by $Q\left(n;j,k\right)$ the probability of $j$ heads for the fair coin and $k$ heads for the biased coin in $n$ flips. Use $p$ for the probability of heads of the biased coin and $\frac{1}{2}$ for the probability of heads for the fair coin. Then

$Q\left(n;j,k\right)=\left(\genfrac{}{}{0.0pt}{}{n}{j}\right){\left(\frac{1}{2}\right)}^{j}{\left(\frac{1}{2}\right)}^{n-j}\cdot \left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p}^{k}{\left(1-p\right)}^{n-k}.$

Then summing over values where $j<k$ gives the probability that the biased coin has more heads than the fair coin in $n$ flips

$ℙ\left[j<k\right]=\sum _{k=1}^{n}\sum _{j=0}^{k-1}Q\left(n;j,k\right).$

This sum has no closed formula evaluation so calculation is necessary. First create two vectors of length $n+1$ with the binomial distribution on $0$ to $n$ with probability $\frac{1}{2}$ for the fair coin and with probability $p$ for the biased coin. Then using the outer product of these two vectors create the bivariate binomial distribution for the two coins. This will be an $\left(n+1\right)×\left(n+1\right)$ matrix. The element in row $i$ and column $j$ is the product of the binomial probability of $i-1$ heads for the fair coin and the binomial probability of $j-1$ heads for the biased coin. The sum is over column indices strictly greater than the row indices. To create the sum, set the lower triangular part of the matrix and the diagonal of the matrix to $0$ and use the sum command to sum all entries of the matrix. This seems to be efficient, even for values of $n$ up to $200$. To find the minimal value of $n$ for which this probability is greater than $0.95$ use binary search in $n$ over a reasonable interval.
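The outer-product computation just described can also be sketched outside R. The following Python function (standard library only; it mirrors, but is not identical to, the R script in the scripts section) sums the bivariate binomial mass over the pairs with $j<k$.

```python
from math import comb

def majority_prob(n: int, p: float) -> float:
    """Probability that the biased coin (P[H] = p) shows strictly more
    heads than the fair coin in n paired flips."""
    # binomial mass functions on 0..n for the fair and biased coins
    fair = [comb(n, j) * 0.5**n for j in range(n + 1)]
    biased = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    # sum of the strict upper triangle of the outer product fair x biased
    return sum(fair[j] * biased[k]
               for j in range(n + 1) for k in range(j + 1, n + 1))
```

For example, with $n=1$ and $p=0.6$ this gives $0.5\cdot 0.6=0.3$; a binary search over $n$ then locates the least $n$ with `majority_prob(n, 0.6)` greater than $0.95$.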

It might seem possible to use a bivariate normal approximation of the bivariate binomial distribution to calculate the probability. Although approximation of the bivariate binomial distribution with a bivariate normal distribution is possible, the double integration of the bivariate normal with a non-zero covariance would be over a region of the form $y>x-𝜖$. The R language has no direct way to calculate the integral over a region of this form, so it is actually easier to calculate with the bivariate binomial distribution.

The result is that it takes $143$ flips of the two coins for the probability to be greater than $0.95$ that the $0.6$ biased coin has more heads than the fair coin.

The same analysis for various probabilities of heads $p$ for the biased coin and for values $0.95$, $0.9$ and $0.75$ of certainty is in Figure 1. The number of flips required decreases as the bias $p$ increases, as expected. The number of flips required also decreases as the certainty decreases.

Figure 1: The number of flips $N$ required to get a majority of heads with certainty $0.95$ (black), $0.9$ (red) and $0.75$ (blue).

#### Using the Central Limit Theorem

Assume that the biased coin is in the left hand. Let ${L}_{j}$ be the result of left-hand coin flip $j$, knowing it is biased so ${L}_{j}=\text{Head}=1$ with probability $0.6$, ${L}_{j}=\text{Tail}=0$ with probability $0.4$. Let ${R}_{j}$ be the result of right-hand coin flip $j$, knowing it is fair so ${R}_{j}=\text{Head}=1$ with probability $0.5$, ${R}_{j}=\text{Tail}=0$ with probability $0.5$. Let ${X}_{j}={L}_{j}-{R}_{j}$. This is a trinomial random variable with ${X}_{j}=-1$ with probability $1∕5=0.2$, ${X}_{j}=0$ with probability $1∕2=0.5$, ${X}_{j}=1$ with probability $3∕10=0.3$.

Consider the statistics of ${X}_{j}$,

$\begin{array}{llll}\hfill 𝔼\left[{X}_{j}\right]& =\left(-1\right)\cdot \left(0.2\right)+0\cdot \left(0.5\right)+\left(+1\right)\cdot \left(0.3\right)=0.1,\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill Var\left[{X}_{j}\right]& ={\left(-1\right)}^{2}\cdot \left(0.2\right)+{\left(0\right)}^{2}\cdot \left(0.5\right)+{\left(+1\right)}^{2}\cdot \left(0.3\right)-{\left(0.1\right)}^{2}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =0.49,\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \sigma \left[{X}_{j}\right]& =0.7.\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}& \hfill \end{array}$

Let ${S}_{n}=\frac{1}{n}\sum _{j=1}^{n}{X}_{j}$ be the sample mean. The distribution of ${S}_{n}$ is on $-1$ to $1$ in increments of $1∕n$, for a total of $2n+1$ points.

Thus the distribution of ${S}_{n}$ clusters around $1∕10$ with standard deviation $7∕\left(10\sqrt{n}\right)$. If the biased coin is in the right hand, the distribution of ${S}_{n}$ clusters around $-1∕10$ with standard deviation $7∕\left(10\sqrt{n}\right)$. The goal is to find a number of flips such that with high probability the sample mean ${S}_{n}$ is closer to the theoretical mean $0.1$ (when the biased coin is in the left hand) than to the value $-0.1$ (the theoretical mean coming from the situation with the biased coin in the right hand). Take “the sample mean closer to $0.1$ than $-0.1$” to mean that ${S}_{n}$ lies on the $0.1$ side of the midpoint $0$ between the two means. Precisely, the goal is to find $n$ so that $ℙ\left[{S}_{n}-0.1>-0.1\right]\ge 0.95$. This is the same event as $ℙ\left[{S}_{n}>0\right]=ℙ\left[{L}_{n}-{R}_{n}>0\right]=ℙ\left[{L}_{n}>{R}_{n}\right]$ which is the same criterion as having the majority of heads above.

Expressing the precise distribution of ${S}_{n}$ analytically is difficult. So instead, use a numerical calculation of the distribution. To numerically calculate the distribution of the sample mean ${S}_{n}$, use the R package distr and specifically the function convpow that takes the $n$-fold convolution power of a distribution to create the distribution of the $n$-fold sum of a random variable. Note that mathematically the support of the distribution of the sum for, for instance, $n=100$ would be the integers from $-100$ to $100$, with $201$ points. However, the actual calculated support is $111$ points from $-46$ to $64$. The reason is that convpow ignores points with probability less than $1{0}^{-16}$ and so these points are not included in the domain. So use match(0, (support(D100))) to find the index of $0$. This turns out to be index $47$. So summing the distribution over indices from $47+1$ to $111$ gives the probability that the sum is positive, that is, that ${S}_{n}>0$. Searching for a value of $n$ large enough that the probability exceeds $0.95$ gives the required number of flips. From some experimentation, the numerically computed support of the distribution is entirely positive for $n\ge 150$, that is, the computed $ℙ\left[{S}_{n}>0\right]=1$ for $n\ge 150$. That means that the numerical search can start from a high of $150$.
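The convpow idea can be imitated directly: convolve the trinomial mass function of ${X}_{j}={L}_{j}-{R}_{j}$ with itself $n$ times and sum the mass on the strictly positive part of the support. A minimal standard-library Python sketch (the document's actual computation uses the R package distr; this version keeps all points, with no $1{0}^{-16}$ truncation):

```python
def sum_distribution(n: int, p: float = 0.6) -> dict:
    """Distribution of sum_{j=1}^n X_j, where X_j = L_j - R_j takes
    values -1, 0, 1 with probabilities (1-p)/2, 1/2, p/2."""
    base = {-1: (1 - p) / 2, 0: 0.5, 1: p / 2}
    dist = {0: 1.0}                       # distribution of the empty sum
    for _ in range(n):                    # n-fold convolution power
        new = {}
        for s, ps in dist.items():
            for x, px in base.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist = new
    return dist

def prob_positive(n: int, p: float = 0.6) -> float:
    """P[S_n > 0], i.e. the biased coin leads after n paired flips."""
    return sum(q for s, q in sum_distribution(n, p).items() if s > 0)
```

For $n=2$ and $p=0.6$ this gives $ℙ\left[{S}_{2}>0\right]=0.3^2+2\cdot 0.5\cdot 0.3=0.39$, agreeing with the bivariate binomial calculation for the same event.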

Using this algorithm, the necessary number of flips to distinguish a biased coin with a probability of heads $0.6$ from a fair coin with a certainty level of $0.95$ is $143$. The same analysis for various probabilities of heads $p$ for the biased coin and for values $0.95$, $0.9$ and $0.75$ of certainty is in Figure 2. The number of flips required decreases as the bias increases, as expected. As the bias approaches $0.99$, the number of flips required decreases to $5$ for certainty $0.95$, $4$ for certainty $0.9$ and $3$ for certainty $0.75$.

Figure 2: The number of flips $N$ required to statistically distinguish a coin with probability $p$ from a fair coin with certainty $0.95$ (black), $0.9$ (red) and $0.75$ (blue).

#### Connections between the Two Calculations

Figure 3: The support of the bivariate binomial and the calculation of the multinomial distribution for $n=10$.

The calculation of the probabilities uses fundamentally the same information, as the diagram in Figure 3 for the case $n=10$ illustrates. The $11×11$ array of dots represents the support of the bivariate binomial distribution as a matrix, with rows from $0$ to $10$ for the number of heads from the fair coin and columns $0$ to $10$ for the number of heads for the biased coin.

The probability that the majority of flips is from the biased coin is the sum of the probabilities in the strict upper triangle of the support of the bivariate distribution. The shaded portion of Figure 3 shows this domain.

The multinomial probability distribution on $-10$ to $10$ of the sum $\sum _{j=1}^{10}\left({L}_{j}-{R}_{j}\right)$ is the sum of the probabilities along diagonals of the bivariate binomial distribution as indicated by the red arrows. In the case the biased coin has $p=0.6$, so that the mean of ${L}_{j}-{R}_{j}$ is $0.1$, the probability of the event

$\begin{array}{llll}\hfill \left[{S}_{n}>0\right]& =\left[\frac{1}{n}\sum _{j=1}^{n}\left({L}_{j}-{R}_{j}\right)>0\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\left[\sum _{j=1}^{n}\left({L}_{j}-{R}_{j}\right)>0\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =\left[\sum _{j=1}^{n}{L}_{j}>\sum _{j=1}^{n}{R}_{j}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

is the sum of the probabilities in the strict upper triangle. This is the same criterion as for the majority of heads above. For $p=0.6$ the search using the sample mean again requires $143$ pair flips to distinguish the coins with certainty $0.95$.

The framing of the application motivates the choice of criterion. The original problem posed in FiveThirtyEight.com comes from a white paper by James White, Jeff Rosenbluth, and Victor Haghani from Elm Partners Investing. The paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba motivated their example. The question is a simplified and idealized version of an investment question about how to identify a higher performing investment, modeled by a biased coin, from a lower performing investment with just an even chance at making a profit, modeled by a fair coin. In that case, the $95%$ certainty of a majority of gains is a reasonable choice of criterion. If the question is merely to identify the biased coin, then it makes sense to use the Central Limit Theorem to distinguish the mean given that the biased coin is in the left hand from the mean given that the biased coin is in the right hand.

#### Bayesian Solution

The coin flips are independent. So if a coin comes up heads with probability $p$ and we flip it $n$ times, the binomial distribution $B\left(k,n,p\right)=\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p}^{k}{\left(1-p\right)}^{n-k}$ gives the probability that it comes up heads exactly $k$ times. Number the coins $1$ and $2$ with probabilities of coming up heads ${p}_{1}$ and ${p}_{2}$ respectively. Flip each coin $n$ times and record how many times each coin comes up heads, without knowing which coin is which. Say the coins come up heads ${k}_{L}$ and ${k}_{R}$ times for the left and right coin respectively. Call the vector of observed data $k=\left({k}_{L},{k}_{R}\right)$.

Exactly one of two possible scenarios occurs:

Fair on Left
Fair coin $1$ was in the left hand and biased coin $2$ was in the right hand and the observed outcome is $k=\left({k}_{L},{k}_{R}\right)$. Call this scenario ${𝜃}_{L}$. The likelihood of this scenario occurring is
$Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)=B\left({k}_{L},n,{p}_{1}\right)B\left({k}_{R},n,{p}_{2}\right).$

Fair on Right
Fair coin $1$ was in the right hand and biased coin $2$ was in the left hand and the observed outcome is $k=\left({k}_{L},{k}_{R}\right)$. Call this scenario ${𝜃}_{R}$. The likelihood of this scenario occurring is
$Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)=B\left({k}_{L},n,{p}_{2}\right)B\left({k}_{R},n,{p}_{1}\right).$

Compute the posterior probability by using Bayes rule:

$Q\left({𝜃}_{L}|k\right)=\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)\cdot ℙ\left[{𝜃}_{L}\right]}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)\cdot ℙ\left[{𝜃}_{L}\right]+Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)\cdot ℙ\left[{𝜃}_{R}\right]}$

and similarly for $Q\left({𝜃}_{R}|k\right)$. Assume that scenarios ${𝜃}_{L}$ and ${𝜃}_{R}$ have the same prior probability so $ℙ\left[{𝜃}_{L}\right]=ℙ\left[{𝜃}_{R}\right]=1∕2$. Therefore, the posterior probability simplifies to

$Q\left({𝜃}_{L}|k\right)=\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)+Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)}.$

To be confident with level $\alpha$ about one of the scenarios, the posterior probability must be greater than $1-\alpha$.

Fair on Left
If $Q\left({𝜃}_{L}\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}k\right)>1-\alpha$, then we are confident in Scenario Fair on Left.
Fair on Right
If $Q\left({𝜃}_{R}\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}k\right)>1-\alpha$, then we are confident in Scenario Fair on Right.

By definition, $Q\left({𝜃}_{L}\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}k\right)+Q\left({𝜃}_{R}\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}k\right)=1$ for all $k$. So the two cases above can’t simultaneously be true when $\alpha <1∕2$. Rearranging these inequalities, we have

$\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)}>\frac{1}{\alpha }-1\phantom{\rule{1em}{0ex}}\text{or}\phantom{\rule{1em}{0ex}}\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)}>\frac{1}{\alpha }-1.$

Substituting the definition of the likelihoods $Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)$ and $Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)$ in terms of the binomial probabilities, the likelihood ratio simplifies to

$\frac{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)}{Q\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)}={\left(\frac{{p}_{1}}{{p}_{2}}\right)}^{{k}_{L}-{k}_{R}}\cdot {\left(\frac{1-{p}_{2}}{1-{p}_{1}}\right)}^{{k}_{L}-{k}_{R}}.$

Take logarithms of both sides and combine the expressions into a single inequality involving the absolute difference of log-likelihoods

$\left|logQ\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{L}\right)-logQ\left(k\phantom{\rule{0.3em}{0ex}}|\phantom{\rule{0.3em}{0ex}}{𝜃}_{R}\right)\right|>log\left(\frac{1}{\alpha }-1\right).$

The implication is that we can stop flipping coins once the difference between log-likelihoods grows sufficiently large. The smaller $\alpha$ is, the larger the difference in log-likelihoods must be before we can declare that we are confident. After simplification, this becomes

$\left|{k}_{L}-{k}_{R}\right|\cdot \left|log\left(\frac{1}{{p}_{1}}-1\right)-log\left(\frac{1}{{p}_{2}}-1\right)\right|>log\left(\frac{1}{\alpha }-1\right).$

The numbers of heads ${k}_{L}$ and ${k}_{R}$ in the left and right hands are random variables. In one hand the number of heads from the coin with ${p}_{1}$ will have the binomial probability distribution with mean $n{p}_{1}$ and standard deviation $\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}$. In the other hand the number of heads from the coin with ${p}_{2}$ will have the binomial probability distribution with mean $n{p}_{2}$ and standard deviation $\sqrt{n{p}_{2}\left(1-{p}_{2}\right)}$. Normal distributions with corresponding means and standard deviations can approximate each binomial distribution.

As a first approximation, substitute the empirical probabilities ${\stackrel{̂}{p}}_{1}={k}_{L}∕n$ and ${\stackrel{̂}{p}}_{2}={k}_{R}∕n$, so that $\left|{k}_{L}-{k}_{R}\right|=n\left|{\stackrel{̂}{p}}_{1}-{\stackrel{̂}{p}}_{2}\right|$. Rearrange to isolate $n$ to obtain

$n>\frac{1}{\left|{\stackrel{̂}{p}}_{1}-{\stackrel{̂}{p}}_{2}\right|}\cdot \frac{log\left(\frac{1}{\alpha }-1\right)}{\left|log\left(\frac{1}{{p}_{1}}-1\right)-log\left(\frac{1}{{p}_{2}}-1\right)\right|}.$

This expression is a lower bound on the number of samples required, in terms of the confidence $\alpha$, the known probabilities ${p}_{1}$ and ${p}_{2}$ and the empirical probabilities ${\stackrel{̂}{p}}_{1}$ and ${\stackrel{̂}{p}}_{2}$. Note that if ${p}_{1}-{p}_{2}\to 0$, then $n\to \infty$, as expected. Taking ${\stackrel{̂}{p}}_{1}={p}_{1}=0.5$, ${\stackrel{̂}{p}}_{2}={p}_{2}=0.6$ and $\alpha =0.05$, then $n>72.619$. More generally, Figure 4 shows a plot of the number of flips required as a function of ${p}_{2}$ for fixed ${p}_{1}=0.5$ and fixed $\alpha =0.05$, so in percentages $95%$ confidence.

Figure 4: Number of flips required as a function of ${p}_{2}$ for fixed ${p}_{1}=0.5$ and $\alpha =0.05$.
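Plugging in the numbers confirms the lower bound quoted above. A quick check in Python (standard library only), taking the empirical frequencies at their means ${\stackrel{̂}{p}}_{1}={p}_{1}$ and ${\stackrel{̂}{p}}_{2}={p}_{2}$ as in the text:

```python
from math import log

p1, p2, alpha = 0.5, 0.6, 0.05
p1_hat, p2_hat = p1, p2   # empirical frequencies taken at their means

# lower bound on n from the log-likelihood inequality
n_bound = (1 / abs(p1_hat - p2_hat)) \
    * log(1 / alpha - 1) / abs(log(1 / p1 - 1) - log(1 / p2 - 1))

print(n_bound)  # ≈ 72.619
```

Here $\log\left(1∕{p}_{1}-1\right)=\log 1=0$ and $\log\left(1∕{p}_{2}-1\right)=\log\left(2∕3\right)$, so the bound is $10\cdot\log 19∕\log\left(3∕2\right)\approx 72.619$.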

However, this is only a lower bound on the real number of flips required to find the biased coin since ${k}_{L}$ and ${k}_{R}$ can be closer than what was used above. In fact, the scenario Fair on Left can have ${k}_{L}$ in the range $\left[0,n{p}_{1}+{z}_{\beta }\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}\right]$ and ${k}_{R}$ in the range $\left[n{p}_{2}-{z}_{\beta }\sqrt{n{p}_{2}\left(1-{p}_{2}\right)},n\right]$. The probability of each of these events is determined by quantiles ${z}_{\beta }$ of the binomial distribution, or approximately by the normal distribution. Since $\sqrt{np\left(1-p\right)}\le \sqrt{n\cdot \frac{1}{2}\cdot \frac{1}{2}}=\frac{1}{2}\sqrt{n}$ for any $p$,

$\begin{array}{llll}\hfill \left|{k}_{L}-{k}_{R}\right|& >\left(n{p}_{2}-{z}_{\beta }\sqrt{n{p}_{2}\left(1-{p}_{2}\right)}\right)-\left(n{p}_{1}+{z}_{\beta }\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =n\left({p}_{2}-{p}_{1}\right)-{z}_{\beta }\left(\sqrt{n{p}_{2}\left(1-{p}_{2}\right)}+\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}\right)\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & >n\left({p}_{2}-{p}_{1}\right)-{z}_{\beta }\sqrt{n}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

with probability determined by the quantile ${z}_{\beta }$. (Note that since ${p}_{2}-{p}_{1}>0$ this is positive for sufficiently large $n$.) The number of flips required to determine which is the biased coin is then determined by

$n\left({p}_{2}-{p}_{1}\right)-{z}_{\beta }\sqrt{n}>\frac{log\left(\frac{1}{\alpha }-1\right)}{\left|log\left(\frac{1}{{p}_{1}}-1\right)-log\left(\frac{1}{{p}_{2}}-1\right)\right|}.$

Choosing a quantile ${z}_{\beta }$, substituting for ${p}_{1}$, ${p}_{2}$ and $\alpha$, and then solving for $n$ gives a lower bound for the number of flips required for distinguishing the fair coin from the biased coin.

For example, choosing ${z}_{\beta }=1.96$ gives the approximate probability $\beta =0.975$ that ${k}_{L}$ is in the range $\left[0,n{p}_{1}+{z}_{\beta }\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}\right]$ and independently that ${k}_{R}$ is in the range $\left[n{p}_{2}-{z}_{\beta }\sqrt{n{p}_{2}\left(1-{p}_{2}\right)},n\right]$. Then the probability of both events is approximately ${\left(0.975\right)}^{2}=0.95$. Using ${p}_{1}=0.5$, ${p}_{2}=0.6$ and $\alpha =0.05$, the right side is $log\left(19\right)∕log\left(3∕2\right)\approx 7.262$ and the inequality is

$0.1n-1.96\sqrt{n}>7.262.$

Solving for $n$ gives $n\approx 519$. As another example, set ${z}_{\beta }=0.559$ with a probability of $0.712$ that ${k}_{L}$ is in the range $\left[0,n{p}_{1}+{z}_{\beta }\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}\right]$ and independently that ${k}_{R}$ is in the range $\left[n{p}_{2}-{z}_{\beta }\sqrt{n{p}_{2}\left(1-{p}_{2}\right)},n\right]$, with joint probability $0.507$. Then solving $0.1n-0.559\sqrt{n}>7.262$ gives $n\approx 138$. As a final example, choosing ${z}_{\beta }=0$ gives a probability of $0.5$ of ${k}_{L}$ in the range $\left[0,n{p}_{1}\right]$ and independently a probability of approximately $0.57$ that ${k}_{R}$ is in the range $\left[n{p}_{2},n\right]$, with joint probability approximately $0.285$. This is equivalent to the original calculation giving the lower bound of $n=72.6$.

The extended Bayesian analysis introduces another parameter ${z}_{\beta }$. Using two separate parameters ${z}_{{\beta }_{1}}$ and ${z}_{{\beta }_{2}}$ might be better, as long as $n{p}_{1}+{z}_{{\beta }_{1}}\sqrt{n{p}_{1}\left(1-{p}_{1}\right)}<n{p}_{2}-{z}_{{\beta }_{2}}\sqrt{n{p}_{2}\left(1-{p}_{2}\right)}$ for reasonable values of $n$. Now some experimentation, including binary search, is necessary to choose a value of $n\ge 73$ required to determine the biased coin with certainty $0.95$. The scripts below show the results of that experimentation and compare it to the other determination methods based on the majority and the Central Limit Theorem. For example, with ${p}_{1}=0.5$ in the left hand and ${p}_{2}=0.6$ in the right hand it takes $189$ flips to identify the fair coin in the left hand with $95%$ certainty. A close look at the results shows why the log-likelihood Bayesian criterion requires more flips for the same certainty. The short answer is that the log-likelihood is a stricter requirement than the majority requirement in the other two approaches. In one trial with $200$ flips, the fair coin in the left hand came up with $108$ heads while the biased coin in the right hand came up with $110$ heads. The majority of heads was in the right hand, but the absolute difference of the log-likelihoods is $0.8109302$, not greater than $log\left(1∕\alpha -1\right)=2.944439$. Both heads totals are well within the two-standard-deviation range around the respective expected values of $100$ heads and $120$ heads.
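The trial just described is easy to check numerically. With ${k}_{L}=108$ and ${k}_{R}=110$ the absolute difference of log-likelihoods reduces to $\left|{k}_{L}-{k}_{R}\right|\cdot \left|log\left(1∕{p}_{1}-1\right)-log\left(1∕{p}_{2}-1\right)\right|$, which falls well short of the threshold $log\left(1∕\alpha -1\right)$; a Python sketch of the check:

```python
from math import log

p1, p2, alpha = 0.5, 0.6, 0.05
kL, kR = 108, 110          # observed heads in the 200-flip trial

# absolute difference of log-likelihoods, in the simplified form
diff = abs(kL - kR) * abs(log(1 / p1 - 1) - log(1 / p2 - 1))
threshold = log(1 / alpha - 1)

print(diff)       # ≈ 0.8109
print(threshold)  # ≈ 2.9444
print(diff > threshold)  # False: not yet confident at level 0.05
```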

An R script testing all three methods against placing the fair and biased coins in the left and right hands randomly is in the scripts section. In summary, even with more flips, the likelihood method is slightly less effective than the two majority methods. It is also a more complicated procedure.

#### Sources

The definition of biased coin is from Wikipedia.

The simple probability problem is adapted from Bogomolny, Cut-the-Knot, Two Coins, One Fair, One Biased. The same problem appears in Flying Colours Maths, Two Coins, One Fair, One Biased.

The problem of how many tosses to find the biased coin appeared in the FiveThirtyEight.com weekly Riddler puzzle column on September 29, 2017. The first subsection with the majority solution is adapted from a white paper “What’s Past is Not Prologue” by James White, Jeff Rosenbluth, and Victor Haghani from Elm Partners Investing. In turn, their example is motivated by a paper “Good and bad properties of the Kelly criterion” by MacLean, Thorp and Ziemba. The second subsection with the statistical solution is original.

The Bayesian analysis is adapted and extended from Lessard, Finding the biased coin.

_______________________________________________________________________________________________

### Algorithms, Scripts, Simulations

#### Scripts

R
prob <- function(n, p) {
    fair <- dbinom(0:n, n, 0.5)         #fair binomial distribution
    biased <- dbinom(0:n, n, p)         #biased binomial distribution
    bivarBinom <- fair %o% biased       #outer product, (n+1) x (n+1) matrix
    bivarBinom[ lower.tri(bivarBinom, diag=TRUE) ] <- 0
    # retain only the upper triangular values, without the diagonal
    sum( bivarBinom )                   #add them all up
}

reqN <- c()                             #initialize vector to hold results

for (certainty in c(0.95, 0.9, 0.75)) { #three representative values
    high <- 1000                        #arbitrary guess for maximum of search
    for (p in seq(0.55, 0.99, length=45)) { #array of bias values
        low <- 1                            #minimum to start search

        while (high - low > 1) {
            try <- floor( (low + high)/2 ) #binary search
            if ( prob(try, p) <= certainty ) low <- try else high <- try
            ## uses monotone increasing nature of the probability
        }

        reqN <- c(reqN, low)            #add search result to vector of results
    }
}

N <- matrix(reqN, 45, 3)                #reformat to three columns, one for each certainty

ps <- seq(0.55, 0.99, length=45)        #array of bias values to plot on horizontal axis
plot( ps, N[,1], pch=20, xlab="p", ylab="N") #plot required N for certainty 0.95
points( ps, N[,2], pch=20, col="red")        #add plot of required N for 0.90
points( ps, N[,3], pch=20, col="blue")       #add plot of required N for 0.75
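The outer-product computation in the script above can be cross-checked independently. The following Python sketch (an illustrative re-implementation using only the standard library, not part of the original scripts) computes the same probability that the biased coin shows strictly more heads than the fair coin in $n$ paired flips.

```python
from math import comb

def prob_biased_majority(n, p):
    """P(biased coin shows strictly more heads than the fair coin in n flips)."""
    fair = [comb(n, i) * 0.5**n for i in range(n + 1)]
    biased = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
    # sum over the strict upper triangle: fair head count i < biased head count j
    return sum(fair[i] * biased[j]
               for i in range(n + 1) for j in range(i + 1, n + 1))

# For n = 1 the only winning outcome is fair tails, biased heads:
# prob_biased_majority(1, 0.6) = 0.5 * 0.6 = 0.3
```

As in the R script, the probability increases with $n$ for fixed $p > 1/2$, which is what justifies the binary search for the required number of flips.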
library(distr)

prob <- function(nn, p) {
    D1 <- DiscreteDistribution( -1:1, c((1-p)/2, 1/2, p/2) )
    ## trinomial distribution of X_n = L_n - R_n
    Dnn <- convpow(D1, nn)
    ## distribution of sum_{j=1}^{nn} X_j
    end0 <- NROW( support(Dnn) )
    ## index of the last support point
    start0 <- match( 0, support(Dnn) ) + 1
    ## index of the support point 0, then add 1
    distDnn <- d(Dnn)(support(Dnn))
    ## extract vector of distribution values on [start0, end0]
    sum( distDnn[start0:end0] )         #add up the probability
}

reqN <- c()                             #initialize vector to hold results

for (certainty in c(0.95, 0.9, 0.75)) { #three representative values
    high <- 150                         #place to start search
    for (p in seq(0.55, 0.99, length=45)) { #array of bias values
        low <- 1                            #minimum to start search
        # high <- high from previous        #a maximum to start with
        ## For greater efficiency, note that the required number of flips
        ## for the next p will be no more than the number of flips for the previous
        ## p, so restrict the search to the interval from n=1 to n=reqN from the previous search
        while (high - low > 1) {
            try <- round( (low + high)/2 ) #binary search
            if ( prob(try, p) <= certainty ) low <- try else high <- try
        }

        reqN <- c(reqN, high)           #add search result to vector of results
    }
}

N <- matrix(reqN, 45, 3)                #reformat to three columns, one for each certainty

ps <- seq(0.55, 0.99, length=45)        #array of bias values to plot on horizontal axis
plot( ps, N[,1], pch=20, xlab="p", ylab="N") #plot required N for certainty 0.95
points( ps, N[,2], pch=20, col="red")        #add plot of required N for 0.90
points( ps, N[,3], pch=20, col="blue")       #add plot of required N for 0.75
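The convolution performed by the distr package can also be sketched directly. The Python fragment below (an independent illustration, not the author’s code) convolves the trinomial step distribution of $X_j = L_j - R_j$ by brute force and sums the probability of a strictly positive final difference; it should agree with the outer-product computation of the first script, since $L_n - R_n > 0$ exactly when the biased coin shows more heads.

```python
def prob_positive_difference(n, p):
    """P(L_n - R_n > 0), L ~ Binomial(n, p) biased, R ~ Binomial(n, 1/2) fair."""
    # one paired flip changes L - R by -1, 0, +1 with these probabilities
    step = {-1: (1 - p) / 2, 0: 0.5, 1: p / 2}
    dist = {0: 1.0}
    for _ in range(n):                  # n-fold convolution of the step distribution
        new = {}
        for x, px in dist.items():
            for dx, pdx in step.items():
                new[x + dx] = new.get(x + dx, 0.0) + px * pdx
        dist = new
    return sum(px for x, px in dist.items() if x > 0)

# prob_positive_difference(1, 0.6) = 0.6/2 = 0.3, matching the single-flip case
```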
alpha <- 0.05
alphaRatio <- (1-alpha)/alpha
logAlphaRatio <- log((1-alpha)/alpha)

p1 <- 0.5
p2 <- 0.6

n <- 200   # number of flips
k <- 100   # number of experiments

# uniform fair coin in left hand, biased coin in right hand
coinFlipsLeft <- array( (runif(n*k) <= p1), dim=c(k,n))
coinFlipsRight <- array( (runif(n*k) <= p2), dim=c(k,n))

headsCumLeft <- apply(coinFlipsLeft, 1, cumsum)   # n x k running head counts
headsCumRight <- apply(coinFlipsRight, 1, cumsum)
# see FEATURES for why cumsum, not just column sum
headsTotalLeft <- headsCumLeft[n, ]    # total heads after n flips, one per experiment
headsTotalRight <- headsCumRight[n, ]

logQkleft <-  dbinom(headsTotalLeft, n, p1, log=TRUE) +
    dbinom(headsTotalRight, n, p2, log=TRUE)
logQkright <-  dbinom(headsTotalRight, n, p1, log=TRUE) +
    dbinom(headsTotalLeft, n, p2, log=TRUE)

logLikeRatio <- abs(logQkleft - logQkright)
logGuessBiased <- logLikeRatio > logAlphaRatio

logcorrect <- sum(logGuessBiased == (p1 == 0.5))
logperCentCorrect <- 100 * logcorrect/k
logperCentCorrect
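A quick numerical check of the decision rule used above: the difference of the two binomial log-likelihoods depends on the data only through $k_L - k_R$, because the binomial coefficients cancel. The Python sketch below (an independent check, not the author’s code) compares the direct log-likelihood computation with the simplified form $\left|k_L - k_R\right| \cdot \left|\log(1/p_1 - 1) - \log(1/p_2 - 1)\right|$.

```python
from math import comb, log

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

def log_likelihood_ratio(kL, kR, n, p1, p2):
    """log Q_left - log Q_right for the two possible coin arrangements."""
    logQ_left = log_binom_pmf(kL, n, p1) + log_binom_pmf(kR, n, p2)
    logQ_right = log_binom_pmf(kR, n, p1) + log_binom_pmf(kL, n, p2)
    return logQ_left - logQ_right

def simplified(kL, kR, p1, p2):
    """The same quantity in absolute value, via the cancellation identity."""
    return abs(kL - kR) * abs(log(1 / p1 - 1) - log(1 / p2 - 1))
```

With, say, $k_L = 110$, $k_R = 95$, $n = 200$, the two computations agree to floating-point precision, confirming the simplification used in the problems below.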
n <- 200   # number of flips
k <- 500   # number of experiments
alpha <- 0.05

NMaj <- 143  # number of flips required for Majority
NCLT <- 143  # number of flips required for CLT
NLL <- 189   # number of flips required for logLikelihood

# choices <- rep(FALSE, k)
choices <- (runif(k) <= 0.5)   # randomly put the biased coin in the left hand
pLeft <- ifelse(choices, 0.6, 0.5)
pRight <- ifelse(!choices, 0.6, 0.5)

ULeft <- array( runif(k*n), dim=c(k,n) ) # flips along rows
coinFlipsLeft <- ULeft <= pLeft  # take advantage of recycling
URight <- array( runif(k*n), dim=c(k,n) ) # flips along rows
coinFlipsRight <- URight <= pRight  # take advantage of recycling

headsTotalLeft <- apply(coinFlipsLeft, 1, cumsum)   # n x k running head counts
headsTotalRight <- apply(coinFlipsRight, 1, cumsum)

LmR <- headsTotalLeft - headsTotalRight  # running difference, Left minus Right

CLT <- LmR/(1:n)

guessBiasedMajority <- LmR[NMaj, ] > 0
guessBiasedCLT <- CLT[NCLT, ] - 0.1 > -0.1

logQL <- dbinom(headsTotalLeft[NLL, ], NLL, 0.6, log = TRUE) +
         dbinom(headsTotalRight[NLL, ], NLL, 0.5, log = TRUE)
logQR <- dbinom(headsTotalRight[NLL, ], NLL, 0.6, log = TRUE) +
         dbinom(headsTotalLeft[NLL, ], NLL, 0.5, log = TRUE)
rhs <- log(1/alpha - 1)
guessBiasedLL <- ( (logQL - logQR) > rhs )

correctMajority <- sum(guessBiasedMajority == (pLeft == 0.6))
perCentCorrectMajority <- 100 * correctMajority/k
perCentCorrectMajority

correctCLT <- sum(guessBiasedCLT == (pLeft == 0.6))
perCentCorrectCLT <- 100 * correctCLT/k
perCentCorrectCLT

correctLogLike <- sum(guessBiasedLL == (pLeft == 0.6))
perCentCorrectLL <- 100 * correctLogLike/k
perCentCorrectLL

__________________________________________________________________________ ### Problems to Work for Understanding

1. You have two coins. One is fair with $ℙ\left[H\right]=\frac{1}{2}$. The other coin is biased with $ℙ\left[H\right]=p>\frac{1}{2}$. First you toss one of the coins once, resulting in heads. Then you toss the other coin three times, resulting in two heads. What is the least value of $p$ that will allow you to decide which is the biased coin based on this information?
2. Show that the log-likelihood expression with the binomial probabilities simpliﬁes to
$\left|{k}_{L}-{k}_{R}\right|\cdot \left|log\left(\frac{1}{{p}_{1}}-1\right)-log\left(\frac{1}{{p}_{2}}-1\right)\right|>log\left(\frac{1}{\alpha }-1\right).$

__________________________________________________________________________ ### References

   Leonard C. MacLean, Edward O. Thorp, and William T. Ziemba. Long-term capital growth: the good and bad properties of the Kelly and fractional Kelly capital growth criteria. Quantitative Finance, 10(7):681–687, 2010.

   James White, Jeff Rosenbluth, and Victor Haghani. What’s Past is Not Prologue. https://ssrn.com/abstract=3034686, April 2017.

__________________________________________________________________________ __________________________________________________________________________

I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable eﬀort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reﬂects the thoughts, interests and opinions of its author. They do not explicitly represent oﬃcial positions or policies of my employer.

Information on this website is subject to change without notice.