AP Statistics

   Normal Approximation

Normal Approximations for Binomial Distributions

For large values of n, the distribution of the count X is approximately normal. The mean and variance of this approximately normal distribution are np and np(1-p), identical to the mean and variance of the binomial(n,p) distribution.

Note: Because the normal approximation is not accurate for small values of n, a good rule of thumb is to use the normal approximation only if np > 10 and n(1-p) > 10.
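As a quick numerical check of the claim that a binomial(n, p) count has mean np and variance np(1-p), the Python sketch below computes both moments directly from the binomial probabilities for n = 200 and p = 0.4 (the values used in the voter example that follows). It is an illustration only, assuming Python 3.8+ for `math.comb`; the helper names `binomial_pmf` and `binomial_mean_var` are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_var(n: int, p: float):
    """Mean and variance computed directly from the pmf by summing over k = 0..n."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    second_moment = sum(k * k * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, second_moment - mean**2


mean, var = binomial_mean_var(200, 0.4)
# Up to floating-point rounding, these should equal np = 80 and np(1 - p) = 48.
print(round(mean, 6), round(var, 6))
```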

For example, consider a population of voters in a given state. The true proportion of voters who favor candidate A is equal to 0.40. Given a sample of 200 voters, what is the probability that more than half of the voters support candidate A?

The count X of voters in the sample of 200 who support candidate A is distributed B(200, 0.4). The mean of the distribution is 200*0.4 = 80, and the variance is 200*0.4*0.6 = 48. The standard deviation is the square root of the variance, √48 ≈ 6.93. The probability that more than half of the voters in the sample support candidate A is the probability that X is greater than 100, which is equal to 1 - P(X ≤ 100).

To use the normal approximation to calculate this probability, we should first acknowledge that the normal distribution is continuous and apply the continuity correction. This means that the probability for a single discrete value, such as 100, is extended to the probability of the interval (99.5,100.5). Because we are interested in the probability that X is less than or equal to 100, the normal approximation applies to the upper limit of the interval, 100.5. If we were interested in the probability that X is strictly less than 100, then we would apply the normal approximation to the lower end of the interval, 99.5.
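The continuity-correction rule can be written as a small mapping from the inequality to the cutoff used on the normal curve. This is a minimal Python sketch, not a standard library routine: the function name `corrected_probability` and its string operator argument are made up for illustration, and it assumes Python 3.8+ for the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def corrected_probability(op: str, k: int, n: int, p: float) -> float:
    """Hypothetical helper: normal approximation to P(X op k) for X ~ binomial(n, p),
    with the continuity correction. op is one of "<=", "<", ">=", ">".
    """
    normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
    if op == "<=":  # k included: go up to the top of k's interval, k + 0.5
        return normal.cdf(k + 0.5)
    if op == "<":   # k excluded: stop at the bottom of k's interval, k - 0.5
        return normal.cdf(k - 0.5)
    if op == ">=":  # k included: start at the bottom of k's interval, k - 0.5
        return 1 - normal.cdf(k - 0.5)
    if op == ">":   # k excluded: start at the top of k's interval, k + 0.5
        return 1 - normal.cdf(k + 0.5)
    raise ValueError(f"unsupported operator: {op!r}")


print(corrected_probability(">", 100, 200, 0.4))  # voter example: P(X > 100)
print(corrected_probability(">=", 8, 20, 0.25))   # n = 20, p = .25: P(X >= 8)
```

The two calls should land close to the 0.0015 and .0985 worked out by hand in this lesson; small differences in the fourth decimal place come from the code not rounding z to two decimals before the lookup.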

So, applying the continuity correction and standardizing the variable X gives the following:
1 - P(X ≤ 100)
= 1 - P(X < 100.5)
= 1 - P(Z < (100.5 - 80)/6.93)
= 1 - P(Z < 20.5/6.93)
= 1 - P(Z < 2.96) = 1 - 0.9985 = 0.0015.
Since the value 100 is nearly three standard deviations above the mean of 80, the probability of observing a count this high is extremely small.
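The same calculation can be reproduced in Python as a sketch (assuming Python 3.8+ for `statistics.NormalDist` and `math.comb`). For comparison it also sums the exact binomial probabilities for X = 101 through 200; that exact tail is computed by the script, not quoted from the lesson.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.4
mu = n * p                     # 80
sigma = sqrt(n * p * (1 - p))  # sqrt(48), about 6.93

# Continuity correction: P(X > 100) = 1 - P(X <= 100), approximated by 1 - P(Y < 100.5)
z = (100.5 - mu) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact binomial tail: P(X = 101) + ... + P(X = 200)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(101, n + 1))

print(f"z = {z:.2f}, normal approx = {approx:.4f}, exact binomial = {exact:.4f}")
```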

Continuity correction

Because the normal distribution can take all real numbers (is continuous) but the binomial distribution can only take integer values (is discrete), a normal approximation to the binomial should identify the binomial event "8" with the normal interval "(7.5, 8.5)" (and similarly for other integer values). The figure below shows that for P(X > 7) we want the magenta region which starts at 7.5.

Example:

 If n=20 and p=.25, what is the probability that X is greater than or equal to 8?

  • The normal approximation without the continuity correction factor yields
    z = (8 - 20 × .25)/(20 × .25 × .75)^.5 = 1.55, hence P(X ≥ 8) is approximately .0606 (from the table).
  • The continuity correction factor requires us to use 7.5 in order to include 8 since the inequality is weak and we want the region to the right. z = (7.5 - 5)/(20 × .25 × .75)^.5 = 1.29, hence the area under the normal curve (magenta in the figure above) is .0985.
  • The exact binomial solution is .1018.

Hence, for small n, the continuity correction gives a much better answer (.0985 instead of .0606, far closer to the exact binomial value).
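A short Python sketch (assuming Python 3.8+) that recomputes the three answers for n = 20 and p = .25: the plain normal approximation with cutoff 8, the continuity-corrected approximation with cutoff 7.5, and the exact binomial sum P(X = 8) + ... + P(X = 20).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 20, 0.25, 8
std_normal = NormalDist()
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 5 and about 1.94

no_correction = 1 - std_normal.cdf((k - mu) / sigma)          # cutoff at 8
with_correction = 1 - std_normal.cdf((k - 0.5 - mu) / sigma)  # cutoff at 7.5 so that 8 is included
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

for label, value in [("no correction", no_correction),
                     ("continuity correction", with_correction),
                     ("exact binomial", exact)]:
    print(f"{label:>22}: {value:.4f}")
```

Because the script uses unrounded z values instead of a printed table, the two approximations may differ from .0606 and .0985 in the fourth decimal place.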


Binomial vs. Normal Approximation

This table summarizes each method and type of distribution:

When to use
  • Binomial: When n is small, usually less than 25.
  • Normal approximation: When n is large, and when doing inference.

Important criteria
  • Binomial: In sampling situations, your population must be at least 10 times larger than your sample (note: some textbooks say 20), that is, N ≥ 10n. This sampling situation is referred to as "almost binomial": the trials aren’t exactly independent, but p changes so little from one trial to the next that the binomial model is still a close approximation.
  • Normal approximation: The expected numbers of successes and failures must each be at least 5 (note: some textbooks say 10), that is, np ≥ 5 and n(1 - p) ≥ 5. Many textbooks use the letter q in place of (1 - p), so you’ll also see the second condition written as nq ≥ 5.

How to find probabilities
  • Binomial: Calculator or binomial formula.
  • Normal approximation: Calculator or normal curve table.

Discrete or continuous distribution?
  • Binomial: Discrete.
  • Normal approximation: Continuous.

Special considerations
  • Binomial: Because it’s a discrete distribution, you can directly find probabilities for exact numbers of outcomes (such as P(X = 2)). When finding ranges of outcomes, it makes a difference whether the endpoints are included; that is, P(X ≤ 3) is not the same as P(X < 3).
  • Normal approximation: Because it’s a continuous distribution, if you want to find a probability for an exact number (such as P(X = 2)), you need to use the continuity correction and find the area from .5 below the number to .5 above the number (here, from 1.5 to 2.5). For the normal curve itself, it doesn’t make a difference whether the endpoints are included. However, because you’re modeling a discrete distribution with a continuous one, first decide whether the binomial endpoint is included and then apply the continuity correction: P(X ≤ 3) becomes the area below 3.5, while P(X < 3) = P(X ≤ 2) becomes the area below 2.5. The continuity correction makes your calculations more precise.

Expected value for number of successes
  • Binomial: np
  • Normal approximation: np

Expected value for number of failures
  • Binomial: n(1 - p), or nq
  • Normal approximation: n(1 - p), or nq

Mean
  • Binomial: np
  • Normal approximation: np

Standard deviation
  • Binomial: You usually don’t need to know it in this context, but it’s √(np(1 - p)), or √(npq).
  • Normal approximation: √(np(1 - p)), or √(npq)
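The criteria in the table can be bundled into a single check. This is a hypothetical Python helper (`check_table_criteria` is not from the lesson), written as a sketch: `threshold` defaults to the table’s 5 but can be set to 10 for textbooks that use the stricter rule, and the N ≥ 10n population condition is only checked when a population size is supplied.

```python
def check_table_criteria(n, p, threshold=5, population=None):
    """Hypothetical helper: check the table's criteria for using the normal approximation.

    n, p       -- number of trials and probability of success
    threshold  -- minimum expected successes and failures (5 in the table; some textbooks use 10)
    population -- population size N when sampling; None skips the N >= 10n check
    Returns a dict mapping each condition to True or False.
    """
    checks = {
        "np >= threshold": n * p >= threshold,              # expected successes
        "n(1 - p) >= threshold": n * (1 - p) >= threshold,  # expected failures (nq)
    }
    if population is not None:
        # "almost binomial" sampling: population at least 10 times the sample size
        checks["N >= 10n"] = population >= 10 * n
    return checks


print(check_table_criteria(300, 0.06))               # newspaper example: n = 300, p = 0.06
print(check_table_criteria(20, 0.25, threshold=10))  # small-n example: np = 5 fails the stricter rule
```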

The following example uses the table above:

A magazine reported that 6% of American drivers read the newspaper while driving. If 300 drivers are selected at random, find the probability that exactly 25 say they read the newspaper while driving.

Solution

Given: p = 0.06 and n = 300, so q = 1 - p = 0.94.

Step 1: Check to see if the normal approximation to the binomial can be used.

np = (300)(0.06) = 18 and nq = (300)(0.94) = 282

Since np ≥ 5 and nq ≥ 5, the normal approximation can be used (both values also exceed 10, so the stricter rule is met as well).

Step 2: Find the mean and standard deviation.

μ = np = (300)(0.06) = 18

σ = √(npq) = √((300)(0.06)(0.94)) = √16.92 = 4.11

Step 3: Write the problem in probability notation: P(X = 25)

Step 4: Rewrite the problem by using the continuity correction factor:

P(24.5 < X < 25.5). Graph this area under the normal curve N(18, 4.11).

Step 5: Find the corresponding z values. Since 25 represents any value between 24.5 and 25.5, find both z values.

z1 = (24.5 - 18)/4.11 = 1.58

z2 = (25.5 - 18)/4.11 = 1.82
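The arithmetic for this example can be sketched in Python (an illustration assuming Python 3.8+; the variable names are hypothetical). It finds both z values for the interval from 24.5 to 25.5 and the normal-curve area between them, and for comparison computes the exact binomial probability P(X = 25).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 300, 0.06, 25
mu = n * p                     # 18
sigma = sqrt(n * p * (1 - p))  # sqrt(16.92), about 4.11

# Continuity correction: P(X = 25) becomes P(24.5 < Y < 25.5) for the normal variable Y
z1 = (k - 0.5 - mu) / sigma
z2 = (k + 0.5 - mu) / sigma
std_normal = NormalDist()
approx = std_normal.cdf(z2) - std_normal.cdf(z1)

exact = comb(n, k) * p**k * (1 - p) ** (n - k)  # exact binomial P(X = 25)

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"normal approx = {approx:.4f}, exact binomial = {exact:.4f}")
```

With z rounded to 1.58 and 1.82 and a standard normal table, the area is 0.9656 - 0.9429 = 0.0227; the script uses unrounded z values, so its result should come out slightly larger, at about 0.023.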


© 2004 Aventa Learning. All rights reserved.