AP Statistics

  Assessing Normality

In the latter part of this course we will want to invoke various tests of significance to try to answer questions that are important to us. These tests involve sampling people or objects and inspecting them carefully to gain insights into the populations from which they come. Many of these procedures are based on the assumption that the host population is approximately normally distributed. Consequently, we need to develop methods for assessing normality.

Method 1 Construct a frequency histogram or a stemplot. See if the graph is approximately bell-shaped and symmetric about the mean. A histogram or stemplot can reveal distinctly nonnormal features of a distribution, such as outliers, pronounced skewness, or gaps and clusters. You can improve the effectiveness of these plots for assessing whether a distribution is normal by marking the points μ, μ ± 1σ, and μ ± 2σ on the x-axis. This gives the scale natural to normal distributions. Then compare the count of observations in each interval with the 68–95–99.7 rule.

Example:

To estimate the amount of lumber in a tract of timber, an owner decided to count the number of trees with diameters exceeding 12 inches in randomly selected 50-by-50-foot squares. Seventy 50-by-50-foot squares were chosen via a simple random sample of all squares in the tract, and the qualifying trees were counted in each square. The data are listed here:

7 8 7 10 4 8 6 8 9 10
9 6 4 9 10 9 8 8 7 9
3 9 5 9 9 8 7 5 8 8
10 2 7 4 8 5 10 7 7 7
9 6 8 8 8 7 8 9 6 8
6 11 9 11 7 7 11 7 9 13
10 8 8 5 9 9 8 5 9 8
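
Method 1 starts from a picture of the data. As a minimal sketch, assuming Python is available (the lesson itself does not prescribe a tool), the listing above can be tallied into a text histogram. The variable names are illustrative only.

```python
from collections import Counter

# Tree counts for the 70 sampled squares, transcribed from the listing above.
tree_counts = [
    7, 8, 7, 10, 4, 8, 6, 8, 9, 10,
    9, 6, 4, 9, 10, 9, 8, 8, 7, 9,
    3, 9, 5, 9, 9, 8, 7, 5, 8, 8,
    10, 2, 7, 4, 8, 5, 10, 7, 7, 7,
    9, 6, 8, 8, 8, 7, 8, 9, 6, 8,
    6, 11, 9, 11, 7, 7, 11, 7, 9, 13,
    10, 8, 8, 5, 9, 9, 8, 5, 9, 8,
]

frequency = Counter(tree_counts)

# One row per possible value, including values that never occur,
# so that any gaps in the distribution stay visible.
for value in range(min(tree_counts), max(tree_counts) + 1):
    print(f"{value:>2} | {'*' * frequency[value]} ({frequency[value]})")
```

The rows should rise to a single peak near 8 and fall away on both sides, which is the roughly bell-shaped pattern Method 1 asks you to look for.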

Calculate the sample mean, which estimates the mean number of timber trees for all 50-by-50-foot squares in the tract.

Mean = 7.73

Calculate the sample standard deviation for the data. Construct the intervals that extend 1, 2, and 3 standard deviations on either side of the mean, calculate the percentage of squares falling into each of the three intervals, and compare with the corresponding percentages given by the empirical rule.

Standard deviation = 1.99

7.73 ± 1(1.99) = (5.74, 9.72)

7.73 ± 2(1.99) = (3.75, 11.71)

7.73 ± 3(1.99) = (1.76, 13.70)

# of observations             Observed in data set    Predicted by empirical rule
Within 1 SD of the mean       50 (71.4%)              47.6 (68%)
Within 2 SD of the mean       67 (95.7%)              66.5 (95%)
Within 3 SD of the mean       70 (100%)               69.8 (99.7%)

The observed and predicted percentages are very similar, so the data are consistent with a normal distribution. Check the histogram below. A normal curve would fit this data set well.

Smaller data sets rarely fit the 68–95–99.7 rule closely. This is true even of observations taken from a larger population that really has a normal distribution, because there is more chance variation in small data sets.
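
The comparison with the 68–95–99.7 rule in the tree-count example can be reproduced with a short script. This is a sketch under the assumption that Python's standard `statistics` module is an acceptable stand-in for a calculator. Values computed from the transcribed data may differ slightly in the last decimal place from the rounded figures quoted above.

```python
from statistics import mean, stdev

# Tree counts for the 70 sampled squares, transcribed from the example.
tree_counts = [
    7, 8, 7, 10, 4, 8, 6, 8, 9, 10,
    9, 6, 4, 9, 10, 9, 8, 8, 7, 9,
    3, 9, 5, 9, 9, 8, 7, 5, 8, 8,
    10, 2, 7, 4, 8, 5, 10, 7, 7, 7,
    9, 6, 8, 8, 8, 7, 8, 9, 6, 8,
    6, 11, 9, 11, 7, 7, 11, 7, 9, 13,
    10, 8, 8, 5, 9, 9, 8, 5, 9, 8,
]

n = len(tree_counts)
x_bar = mean(tree_counts)
s = stdev(tree_counts)  # sample standard deviation (divides by n - 1)
print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}")

# Share of a normal distribution within k standard deviations of the mean.
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

observed = {}
for k, expected_share in empirical_rule.items():
    low, high = x_bar - k * s, x_bar + k * s
    observed[k] = sum(low < x < high for x in tree_counts)
    print(
        f"within {k} SD ({low:.2f}, {high:.2f}): "
        f"observed {observed[k]} ({observed[k] / n:.1%}), "
        f"predicted {expected_share * n:.1f} ({expected_share:.1%})"
    )
```

Because the data are whole numbers, small changes in the interval endpoints do not change which observations fall inside each interval.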

Method 2 Construct a normal probability plot. A normal probability plot provides a good assessment of the adequacy of the normal model for a set of data. Most statistics utilities, including Minitab and Data Desk, can construct normal probability plots from entered data. The TI-83/89 can also produce normal probability plots. You will need to be able to produce a normal probability plot (either with a calculator or with computer software) and interpret it. Data from any normal distribution produce a roughly straight line on the plot, because standardizing is a linear transformation: it can change the slope and intercept of the line in our plot but cannot change a line into a curved pattern.
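
The claim about slope and intercept can be checked numerically. The sketch below is illustrative only: it assumes the plotting positions (i - 0.5)/n (statistics packages use slightly different conventions), and the mean of 100 and standard deviation of 15 are hypothetical values chosen for the demonstration.

```python
from statistics import NormalDist


def normal_scores(n):
    """z-scores at the assumed plotting positions (i - 0.5) / n, i = 1..n."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line of ys on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


z = normal_scores(25)

# Idealized "data" from a normal distribution with hypothetical mu and sigma.
mu, sigma = 100.0, 15.0
ideal_data = [mu + sigma * score for score in z]

# Plot coordinates would be (z[i], ideal_data[i]); fit a line through them.
slope, intercept = least_squares_line(z, ideal_data)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The fitted slope and intercept recover the assumed σ and μ, so changing either one moves or tilts the line but never bends it.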

Example:

The following 50 numbers were randomly generated and have a mean of 75.7 and standard deviation of 5.068. Use a  normal probability plot to determine if this data could have come from a normal distribution. Explain clearly how the normal probability plot provides evidence for your conclusion.

66 66 67 68 68 68 70 70 70 71
71 71 72 73 73 73 73 73 74 75
75 75 76 76 76 77 77 77 77 77
77 77 78 78 78 78 78 79 79 79
80 81 81 82 82 84 84 84 85 86

Use a computer program to graph the normal probability plot. Below is the output from Minitab. The basic premise is that the plot compares two quantities: the observed data and the values that would be expected if the data were perfectly normally distributed. If the two generally agree, the data behaves as expected for a normal distribution, and the normal probability plot is roughly linear. Otherwise, the plot will not be linear. Of course, no plot will be exactly linear, because data is subject to randomness in its collection. We look for a general pattern of linearity. This plot shows strong linearity.
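
For readers without Minitab, the comparison between the data and idealized normal data can be sketched in Python. This is not Minitab's procedure: the plotting positions (i - 0.5)/n are an assumption, and the correlation coefficient is used here only as a rough numeric summary of how straight the plot is.

```python
from math import sqrt
from statistics import NormalDist

# The 50 randomly generated numbers from the example.
sample = [
    66, 66, 67, 68, 68, 68, 70, 70, 70, 71,
    71, 71, 72, 73, 73, 73, 73, 73, 74, 75,
    75, 75, 76, 76, 76, 77, 77, 77, 77, 77,
    77, 77, 78, 78, 78, 78, 78, 79, 79, 79,
    80, 81, 81, 82, 82, 84, 84, 84, 85, 86,
]


def normal_scores(n):
    """z-scores at the assumed plotting positions (i - 0.5) / n, i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ordered = sorted(sample)
scores = normal_scores(len(ordered))

# Each (score, value) pair is one point on the normal probability plot.
for z, x in zip(scores, ordered):
    print(f"{z:6.2f}  {x}")

r = correlation(scores, ordered)
print(f"correlation between data and normal scores: r = {r:.3f}")
```

A correlation close to 1 agrees with the visual impression of strong linearity. It is a supplement to looking at the plot, not a replacement for it.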


Below are 3 data sets and corresponding normal probability plots. 

Data Sampled From a Normal Distribution

Here's a histogram of 100 observations that were randomly sampled from a normal distribution. Next to the histogram you see the normal probability plot of the data (generated using the Normality Test in the Stat menu in Minitab for Windows).

 

Notice that the normal probability plot (NPP) is basically straight. That's the idea: Normal data = straight NPP. So, when the NPP is straight you have evidence that the data is sampled from a normal distribution.

Data Sampled From a Right Skewed Distribution

For right-skewed data, the normal probability plot is generally not straight. In general, this sort of curvature in the NPP indicates right skew.

Data Sampled From a Left Skewed Distribution

For left-skewed data, the normal probability plot is generally not straight. In general, this sort of curvature in the NPP indicates left skew.
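
The direction of the bend can be illustrated numerically. The sketch below makes several assumptions that are not part of the lesson: normal scores go on the horizontal axis and data on the vertical axis (software that swaps the axes mirrors the bend), the plotting positions are (i - 0.5)/n, and the "samples" are idealized exponential quantiles rather than real data.

```python
from math import log
from statistics import NormalDist, mean


def plotting_positions(n):
    """Assumed plotting positions (i - 0.5) / n for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def npp_residual_summary(data):
    """Mean residual from the least-squares line at the low end, the middle,
    and the high end of the plot (normal scores as x, sorted data as y)."""
    ordered = sorted(data)
    n = len(ordered)
    scores = [NormalDist().inv_cdf(p) for p in plotting_positions(n)]

    # Least-squares line of the sorted data on the normal scores.
    z_mean, x_mean = mean(scores), mean(ordered)
    slope = (sum((z - z_mean) * (x - x_mean) for z, x in zip(scores, ordered))
             / sum((z - z_mean) ** 2 for z in scores))
    intercept = x_mean - slope * z_mean

    residuals = [x - (intercept + slope * z) for z, x in zip(scores, ordered)]
    tenth = max(1, n // 10)
    mid_start = (n - tenth) // 2
    return (mean(residuals[:tenth]),
            mean(residuals[mid_start:mid_start + tenth]),
            mean(residuals[-tenth:]))


# Idealized right-skewed values: exponential quantiles at the plotting positions.
right_skewed = [-log(1 - p) for p in plotting_positions(100)]
# Mirror image of the same values gives an idealized left-skewed set.
left_skewed = [-x for x in right_skewed]

right_summary = npp_residual_summary(right_skewed)
left_summary = npp_residual_summary(left_skewed)
print("right skew (low, middle, high):", [round(v, 2) for v in right_summary])
print("left skew  (low, middle, high):", [round(v, 2) for v in left_summary])
```

Under these assumptions, right-skewed values sit above the line at both ends and below it in the middle, while left-skewed values show the opposite pattern.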

As you progress in the course, normal probability plots will become clearer. At this point you should be able to recognize, based on the normal probability plot, whether the sample data appear to come from a normal population.

Try Self Check 13

Review the content in Normal Distribution Calculations and Assessing Normality and proceed to Multiple Choice 5. 

Proceed to the Unit 2 Exam Multiple Choice and Free Response

© 2004 Aventa Learning. All rights reserved.