M203J Worksheet
[1] The teenage pregnancy rate in 2006 was 72 teenage girls in every 1000 (http://www.guttmacher.org/pubs/USTPtrends.pdf). Home pregnancy kits (for women who collect and test their own samples) were found to have an overall sensitivity of 75% and a specificity of around 65% (http://www.medicine.ox.ac.uk/bandolier/band64/b64-7.html).
(a) If a pregnant teen uses one of these home pregnancy kits, what's the chance that it will correctly say she is pregnant?
This is the sensitivity; i.e., 75%.
(b) If a pregnant teen uses one of these home pregnancy kits, what's the chance of a false negative (i.e., that it will incorrectly say she is not pregnant)?
This is 100% - 75% = 25%.
(c) If a teen who is not pregnant uses one of these home pregnancy kits, what's the chance that it will correctly say she is not pregnant?
This is the specificity; i.e., 65%.
(d) If a teen who is not pregnant uses one of these home pregnancy kits, what's the chance of a false positive (i.e., that it will incorrectly say she is pregnant)?
This is 100% - 65% = 35%.
(e) If a randomly chosen teen uses one of these home pregnancy kits and gets a negative result, what's the chance it is a false negative (i.e., what's the chance she is pregnant in spite of the test saying she's not)?
Consider a random sample of 10,000 teen girls:
               Negative Test          Positive Test          Total
Not pregnant   6032 = 65% of 9280     3248 = 9280 - 6032     9280 = 10000 - 720
Pregnant       180 = 720 - 540        540 = 75% of 720       720 = (72/1000)(10000)
Total                                                        10000
There are 180 false negatives out of 6032 + 180 = 6212 negatives altogether, so the chance is 180/6212 = 2.9%. I.e., when the test says "not pregnant," there's only about a 3% chance that it is wrong.
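The table arithmetic in part (e) can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and the 10,000-girl sample is the same hypothetical sample used above.

```python
# Rebuild the 2x2 table for a hypothetical sample of 10,000 teen girls.
sample_size = 10_000
prevalence = 72 / 1000    # pregnancy rate quoted in the problem
sensitivity = 0.75        # chance a pregnant teen tests positive
specificity = 0.65        # chance a non-pregnant teen tests negative

pregnant = round(sample_size * prevalence)           # 720
not_pregnant = sample_size - pregnant                # 9280

true_positives = round(sensitivity * pregnant)       # 540
false_negatives = pregnant - true_positives          # 180
true_negatives = round(specificity * not_pregnant)   # 6032
false_positives = not_pregnant - true_negatives      # 3248

# Among all negative results, what fraction are false negatives?
all_negatives = true_negatives + false_negatives     # 6212
chance_false_negative = false_negatives / all_negatives

print(f"{chance_false_negative:.1%}")  # prints 2.9%
```

Changing prevalence, sensitivity, or specificity at the top shows how strongly the answer depends on how rare pregnancy is in the group being tested.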
[2] If you flip a fair coin 100 times and find the fraction p^ of heads, what's the chance that p^ ≥ 0.6 (i.e., what is the chance that you get at least 60 heads)? Use what we've learned about normal distributions, by assuming that p^ is normally distributed (in this case p = 0.5 since we're assuming the coin is fair). [This is the same problem as choosing a sample of size n = 100 from a population with p = 0.50, and asking what percentage of samples have p^ ≥ 60%.]
The formula says σ = √(p(1-p)/n) = √(0.5(1-0.5)/100) = 5%, and p^ = 60% gives z = (60% - 50%)/5% = 2 (i.e., 60% is two standard deviations above the mean p = 50%). We know about 95% of samples have p^ within 2σ of p = 50%, so 100% - 95% = 5% will have p^ outside this range, with half of them at or above 2σ; i.e., there's about a 2.5% chance that p^ will be 60% or more.
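The same calculation can be sketched in code. The helper name normal_upper_tail is made up for illustration; it uses the standard library's math.erfc to get the normal tail area instead of the 95% rule of thumb, so it gives about 2.3% rather than the rounded 2.5%.

```python
import math

n = 100    # number of coin flips
p = 0.5    # fair coin

# Standard deviation of the sample fraction p^.
sigma = math.sqrt(p * (1 - p) / n)   # 0.05, i.e. 5%

# How many standard deviations is 60% above the mean of 50%?
z = (0.60 - p) / sigma               # about 2

def normal_upper_tail(z_score):
    """Area under the standard normal curve at or above z_score."""
    return 0.5 * math.erfc(z_score / math.sqrt(2))

tail = normal_upper_tail(z)
print(f"{tail:.1%}")  # prints 2.3%
```

The small gap between 2.3% and 2.5% comes from the rule of thumb itself: 95% of a normal distribution actually lies within about 1.96σ of the mean, not exactly 2σ.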
[3] Monday's Omaha World-Herald, October 25, 2010, published a poll based on sampling 607 registered voters in Omaha (the polling was done October 17-21). It found that 44% planned to vote for Lee Terry for Congress, 39% planned to vote for Terry's opponent, Tom White, and 17% were undecided or planned not to vote. The claimed margin of error was ± 4%.
(a) What margin of error do you find from the formula assuming p = 0.44? Answer: ± 4%
(b) What margin of error do you find from the formula assuming p = 0.39? Answer: ± 3.96%
(c) What e would you need in order to have a 99.7% chance that p is in the range p^ ± e%? Answer: e = 3s = 6%, where s = 2% is the standard deviation of p^ (half of the ± 4% margin of error).
(d) Note that with a margin of error of 4%, it's possible that White is ahead of Terry. If we want to try to tell who's really ahead we might want a smaller margin of error. What sample size would you need to have a margin of error of ±2% for Terry's p^ = 44% ?
s is half of the margin of error, so .01 = s = √(p^(1-p^)/n). Solving for n gives n = p^(1-p^)/s^2 = 0.44*0.56/0.01^2 = 2464. Note, however, that it really doesn't help much to reduce the margin of error here; there are so many undecided voters that, even if you knew how everyone else stood, you still wouldn't be very sure of the outcome of the election.
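The answers to parts (a) through (d) can be reproduced with a few lines of code. This is a sketch under the worksheet's convention that the margin of error is two standard deviations of p^; the function names are made up for illustration.

```python
import math

def standard_error(p_hat, n):
    """Standard deviation s of the sample fraction p^."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def margin_of_error(p_hat, n, num_sd=2):
    """Margin of error as num_sd standard deviations (2 for about 95%)."""
    return num_sd * standard_error(p_hat, n)

def sample_size_for(p_hat, margin, num_sd=2):
    """Sample size n that gives the requested margin of error."""
    s = margin / num_sd
    return p_hat * (1 - p_hat) / s**2

n = 607  # registered voters polled

print(f"{margin_of_error(0.44, n):.2%}")            # (a) prints 4.03%
print(f"{margin_of_error(0.39, n):.2%}")            # (b) prints 3.96%
print(f"{margin_of_error(0.44, n, num_sd=3):.1%}")  # (c) prints 6.0%
print(round(sample_size_for(0.44, 0.02)))           # (d) prints 2464
```

Halving the margin of error from 4% to 2% roughly quadruples the required sample (607 to 2464), because n grows like 1/s^2.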