Unit 1 AP Statistics


Sections: 1. Density Curves | 2. Normal Distributions | 3. Normal Distribution Calculations | 4. Assessing Normality

Normal Distributions

The normal curves form a family of distributions. Each member of the family is determined by setting the parameters (μ and σ) of the model to particular values (numbers). Because the μ parameter can take on any value, positive or negative, and the σ parameter can take on any positive value, the family of normal curves is quite large, consisting of an infinite number of members. This makes the normal curve a general-purpose model, able to describe a large number of naturally occurring phenomena, from test scores to the size of the stars.

Similarity of Members of the Family of Normal Curves

All the members of the family of normal curves, although different, have a number of properties in common. These properties include: shape, symmetry, tails approaching but never touching the x-axis, and area of 1 under the curve.

All members of the family of normal curves share the same bell shape, given the x-axis is scaled properly. Most of the area under the curve falls in the middle. The tails of the distribution (ends) approach the x-axis but never touch, with very little of the area under them.

All members of the family of normal curves have bilateral symmetry. That is, if any normal curve were drawn on a two-dimensional surface (a piece of paper), cut out, and folded along its center line, the two sides would match exactly. Human beings are approximately bilaterally symmetrical, with a right and left side.

All members of the family of normal curves have tails that approach, but never touch, the x-axis. The implication of this property is that no matter how far one travels along the number line, in either the positive or negative direction, there will still be some area under any normal curve. Thus, in order to draw the entire normal curve one must have an infinitely long line. Because most of the area under any normal curve falls within a limited range of the number line, only that part of the line segment is drawn for a particular normal curve.

All members of the family of normal curves have a total area of one (1.00) under the curve, as do all probability models or models of frequency distributions. This property, in addition to the property of symmetry, implies that the area in each half of the distribution is .50 or one half.

All normal distributions have the same overall shape. The exact density curve for a particular normal distribution is described by giving its mean μ and its standard deviation σ. The mean is located at the center of the symmetric curve, and is the same as the median. Changing μ without changing σ moves the normal curve along the horizontal axis without changing its spread. The standard deviation controls the spread of a normal curve.
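As a small illustration of the last few sentences, the sketch below uses NormalDist from Python's standard library. The two curves, N(0, 1) and N(5, 1), are chosen only for illustration. If changing μ really just slides the curve along the axis, the height of the shifted curve at x + 5 should match the height of the original at x.

```python
from statistics import NormalDist

# Two normal curves with the same spread but different centers
# (values chosen only for illustration).
standard = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=5, sigma=1)

# Height of the original curve at x versus the shifted curve at x + 5.
for x in (-2.0, 0.0, 1.5):
    print(x, standard.pdf(x), shifted.pdf(x + 5))

# For a normal curve the mean and the median are the same point.
print(shifted.mean, shifted.median)
```

Each printed pair of heights should agree, and the mean and median of the shifted curve should both be 5.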

Area under a curve

Because area under a curve may seem like a strange concept to many introductory statistics students, a short intermission is proposed at this point to introduce the concept.

Area is a familiar concept. For example, the area of a square is s², or side squared; the area of a rectangle is length times height; the area of a right triangle is one-half base times height; and the area of a circle is π * r². It is valuable to know these formulas if one is purchasing such things as carpeting, shingles, etc.

Areas may be added or subtracted from one another to find some resultant area. For example, suppose one had an L-shaped room and wished to purchase new carpet. One could find the area by taking the total area of the larger rectangle and subtracting the area of the rectangle that was not needed, or one could divide the area into two rectangles, find the area of each, and add the areas together. Both procedures are illustrated below:
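The two carpet calculations can be sketched in a few lines of Python. The room dimensions here are made up for illustration: a 12 ft by 10 ft rectangle with a 4 ft by 3 ft corner missing.

```python
# Hypothetical L-shaped room: a 12 ft by 10 ft rectangle
# with a 4 ft by 3 ft corner cut out (dimensions are assumptions).
outer_length, outer_width = 12, 10
notch_length, notch_width = 4, 3

# Method 1: area of the large rectangle minus the missing corner.
area_by_subtraction = outer_length * outer_width - notch_length * notch_width

# Method 2: split the L into two rectangles and add their areas.
# One piece is 8 ft by 10 ft; the other is 4 ft by 7 ft.
area_by_addition = ((outer_length - notch_length) * outer_width
                    + notch_length * (outer_width - notch_width))

print(area_by_subtraction, area_by_addition)  # 108 108
```

Both methods give 108 square feet, as they must.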

Finding the area under a curve poses a slightly different problem. In some cases there are formulas which directly give the area between any two points; finding these formulas is what integral calculus is all about. In other cases the areas must be approximated.

Suppose a curve were divided into equally spaced intervals on the x-axis and a rectangle drawn in each interval with its height set by the height of the curve. The rectangles may be drawn either smaller than the curve, or larger, as in the two illustrations below:

In either case, if the areas of all the rectangles were added together, the sum would be an approximation of the total area under the curve. With the smaller rectangles, the sum would be too small; with the larger rectangles, it would be too big. Taking the average of the two would give a better approximation, but mathematical methods provide a better way.

A better approximation may be achieved by making the intervals on the x-axis smaller. Such an approximation is illustrated below, more closely matching the actual area under the curve.

The actual area of the curve may be calculated by making the intervals infinitely small (no distance between the intervals) and then computing the area. If this last statement seems a bit bewildering, you share the bewilderment with millions of introductory calculus students. At this point the introductory statistics student must say "I believe" and trust the mathematician or enroll in an introductory calculus course.
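For readers who would rather see the rectangles at work than simply believe, here is a minimal sketch. It assumes the standard normal curve (mean 0, standard deviation 1) and the stretch of the x-axis from 0 to 1. The function names are made up for this example. The curve falls steadily over that stretch, so rectangles built on the right edge of each interval sit under the curve and rectangles built on the left edge poke above it.

```python
import math

def normal_density(x):
    """Height of the standard normal curve (mean 0, standard deviation 1) at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def rectangle_sums(a, b, n):
    """Return (lower, upper) rectangle sums on [a, b] using n equal intervals.

    Assumes the curve is decreasing on [a, b], so the right endpoint gives
    the shorter rectangle and the left endpoint gives the taller one.
    """
    width = (b - a) / n
    lower = sum(normal_density(a + (i + 1) * width) for i in range(n)) * width
    upper = sum(normal_density(a + i * width) for i in range(n)) * width
    return lower, upper

# More intervals squeeze the two sums toward the same value.
for n in (4, 16, 64, 256):
    lower, upper = rectangle_sums(0.0, 1.0, n)
    print(n, round(lower, 4), round(upper, 4), round((lower + upper) / 2, 4))
```

As n grows, the too-small sum and the too-big sum close in on each other, and the value they trap is the area under the curve between 0 and 1 (about 0.34).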

When dealing with normal curves there is no area assigned to a single point. A single point would have a width of zero, hence any area associated with a single point would be 0.

Looking at the normal curve

The normal curve above shows the relative positions of the standard deviations. Notice how the curve "turns" at the points -1 and 1, that is, one standard deviation below and one standard deviation above the mean. The points at which this change of curvature takes place are called inflection points, and they are located at a distance σ on either side of the mean μ.
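The location of the inflection points can be checked numerically. The sketch below estimates the curvature (the second derivative) of a normal curve just inside and just outside one standard deviation from the mean. The helper names and the second curve, N(10, 3), are assumptions made for this example. A negative value means the curve bends downward like a hill; a positive value means it bends upward like a bowl.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the N(mu, sigma) curve at x."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def curvature(x, mu=0.0, sigma=1.0, h=1e-3):
    """Central-difference estimate of the second derivative of the curve at x."""
    return (normal_density(x + h, mu, sigma)
            - 2 * normal_density(x, mu, sigma)
            + normal_density(x - h, mu, sigma)) / h ** 2

# Check the standard curve and one hypothetical curve, N(10, 3).
for mu, sigma in ((0.0, 1.0), (10.0, 3.0)):
    for offset in (0.5, 0.9, 1.1, 2.0):
        x = mu + offset * sigma
        print(mu, sigma, offset, curvature(x, mu, sigma))
```

For both curves the sign changes from negative to positive between 0.9σ and 1.1σ from the mean, which is consistent with an inflection point at a distance σ.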

Remember that μ and σ alone do not specify the shape of most distributions, and that the shape of density curves in general does not reveal σ. These are special properties of normal distributions. For a normal curve, σ controls the spread. If σ is large, the curve extends far along the x-axis and the graph is low and "flat". If σ is small, the curve is narrow and tall, because the total area under it must remain 1.
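The trade-off between spread and height can be seen by computing the height of the curve at its center for several values of σ. This sketch again leans on NormalDist from Python's standard library; the σ values are chosen only for illustration. The peak height of a normal curve is 1/(σ√(2π)), so doubling σ should cut the peak in half.

```python
import math
from statistics import NormalDist

# Peak height (the height at the mean) for several spreads.
# The sigma values are illustrative, not taken from the lesson.
for sigma in (0.5, 1.0, 2.0, 4.0):
    curve = NormalDist(mu=0, sigma=sigma)
    peak = curve.pdf(0)
    formula = 1 / (sigma * math.sqrt(2 * math.pi))
    print(sigma, round(peak, 4), round(formula, 4))
```

The two columns of heights should match, and each doubling of σ should halve the peak.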

Why are the normal distributions important in statistics? Here are three reasons. First, normal distributions are good descriptions for some distributions of real data. Distributions that are often close to normal include scores on tests taken by many people (such as SAT exams and many psychological tests), repeated careful measurements of the same quantity, and characteristics of biological populations (such as lengths of cockroaches and yields of corn). Second, normal distributions are good approximations to the results of many kinds of chance outcomes, such as tossing a coin many times. Third, and most important, we will see that many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions. However, even though many sets of data follow a normal distribution, many do not. Most income distributions, for example, are skewed to the right and so are not normal. Nonnormal data, like nonnormal people, not only are common but are sometimes more interesting than their normal counterparts.

The 68–95–99.7 rule

Although there are many normal curves, they all have common properties. In particular, all normal distributions obey the following rules.

Approximately 68% of the observations fall within 1 standard deviation of the mean

Note that the range "within one standard deviation of the mean" is highlighted in green. The area under the curve over this range is the relative frequency of observations in the range. That is, 0.68 = 68% of the observations fall within one standard deviation of the mean, or, 68% of the observations are between (μ - σ) and (μ + σ).

Below the axis, in red, is another set of numbers. These numbers are simply measures of standard deviations from the mean.

Approximately 95% of the observations fall within 2 standard deviations of the mean

Approximately 99.7% of the observations fall within 3 standard deviations of the mean

Another way of looking at it!

Some authors refer to this as the “empirical rule.” By remembering these three numbers, you can think about normal distributions without constantly making detailed calculations whenever a rough approximation will suffice.
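The three percentages are rounded values. One way to see where they come from is to compute the area within k standard deviations of the mean directly. The sketch below uses the error function from Python's math module; the function name area_within is made up for this example. The area within k standard deviations of the mean of any normal curve equals erf(k/√2).

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

The rule rounds these to 68, 95, and 99.7, which is plenty for rough work.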

Example

The distribution of heights of American women aged 18 to 24 is approximately normally distributed with mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that

68% of these American women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches,

95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches,

99.7% of these American women have heights between 65.5 - 3(2.5) and 65.5 + 3(2.5) inches, or between 58 and 73 inches.

Therefore, the tallest 2.5% of these women are taller than 70.5 inches. (The extreme 5% fall more than two standard deviations, or 5 inches, from the mean. Since all normal distributions are symmetric about their mean, half of these women are on the tall side.) Almost all young American women are between 58 and 73 inches in height, as shown by the 99.7% calculation.
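The arithmetic in this example can be reproduced with a short sketch. The mean of 65.5 inches and standard deviation of 2.5 inches come from the example above; the function name empirical_intervals is made up for illustration.

```python
def empirical_intervals(mean, sd):
    """Return {percent: (low, high)} for 1, 2, and 3 standard deviations."""
    return {
        68: (mean - sd, mean + sd),
        95: (mean - 2 * sd, mean + 2 * sd),
        99.7: (mean - 3 * sd, mean + 3 * sd),
    }

intervals = empirical_intervals(65.5, 2.5)
for percent, (low, high) in intervals.items():
    print(f"{percent}% of heights fall between {low} and {high} inches")

# The 5% outside the 95% interval splits evenly between the two tails.
tallest_share = (100 - 95) / 2
print(f"About {tallest_share}% are taller than {intervals[95][1]} inches")
```

The printed intervals are 63.0 to 68.0, 60.5 to 70.5, and 58.0 to 73.0 inches, with about 2.5% of the women taller than 70.5 inches.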

Because we will mention normal distributions often, a short notation is helpful. We abbreviate the normal distribution with mean μ and standard deviation σ as N(μ,σ). For example, the distribution of young women’s heights is N(65.5, 2.5).

Connection: If 99.7% of the data are contained within 3 standard deviations of the mean on either side, then nearly all of the data span about 6 standard deviations, so the standard deviation can be estimated from the range. If we know the range of a distribution that is approximately normal, a reasonable rough estimate of the standard deviation is range/6. Conversely, the standard deviation can be used to estimate the range: 6σ is a good rough estimate.
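Using the heights example, where almost all of the women fall between 58 and 73 inches, the two estimates can be sketched as follows. The function names are made up for illustration, and the results are rough guides only: the actual range of a data set depends on how many observations it contains.

```python
def sd_from_range(low, high):
    """Rough standard deviation estimate for roughly normal data: range / 6."""
    return (high - low) / 6

def range_from_sd(sd):
    """Rough estimate of the range for roughly normal data: 6 standard deviations."""
    return 6 * sd

print(sd_from_range(58, 73))  # 2.5
print(range_from_sd(2.5))     # 15.0
```

The estimate of 2.5 inches agrees with the standard deviation given for N(65.5, 2.5).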

Try Self-Check 9

National test scores are frequently reported in terms of percentiles, rather than raw scores. If your score on the math portion of such a test was reported as the 90th percentile, then 90% of the students who took the math test scored lower than or equal to your score. Percentiles are used when we are most interested in seeing where an individual observation stands relative to the other individuals in the distribution. Typically, in practice, the number of observations is quite large so that it makes sense to talk about the distribution as a density curve. The median score would be the 50th percentile because half the scores are to the left of (i.e., lower than) the median. The first quartile is the 25th percentile and the third quartile is the 75th percentile.
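When a distribution is treated as a normal density curve, percentiles are simply areas to the left of a score. The sketch below assumes a hypothetical test whose scores follow N(500, 100); these numbers are not from the lesson. It uses the cdf and inv_cdf methods of NormalDist from Python's standard library to go from a score to a percentile and back.

```python
from statistics import NormalDist

# Hypothetical test: scores assumed to follow N(500, 100).
# These parameters are made up for illustration.
scores = NormalDist(mu=500, sigma=100)

def percentile_of(score):
    """Percent of the distribution at or below the given score."""
    return 100 * scores.cdf(score)

# A score of 628 lands very close to the 90th percentile.
print(round(percentile_of(628), 1))

# Going the other way: the score that sits at the 90th percentile.
print(round(scores.inv_cdf(0.90), 1))

# First quartile, median, and third quartile (25th, 50th, 75th percentiles).
q1, median, q3 = (scores.inv_cdf(p) for p in (0.25, 0.50, 0.75))
print(round(q1, 1), round(median, 1), round(q3, 1))
```

Because the curve is symmetric, the median equals the mean of 500, and the two quartiles sit the same distance below and above it.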

Review the content and take Multiple Choice 3

© 2004 Aventa Learning. All rights reserved.