 Comparing Distributions

So far in this unit we have looked at graphs of a single variable and, with the exception of the touchdown passes thrown by NFL teams, we have focused on one graph at a time. Graphs are often used to compare data sets. As the NFL touchdown example showed, visually comparing two or more graphs lets us draw some simple conclusions. Now that you can compute basic summary statistics and measure center and spread, you should be able to interpret comparison graphs. Remember that a graph does not tell the whole story, just as summary statistics alone lack a visual representation.

Some common graphs used for comparison include side-by-side boxplots, back-to-back stem and leaf graphs, and side-by-side bar graphs. This list is not exhaustive but it does encompass the most common comparison graphs.

Here is the comparison of the NFL TD passes data listed in the Displaying Distribution lesson.

Side-by-side boxplots

This graph was generated on Minitab, a powerful statistics graphing and analysis program. The modified boxplots were drawn vertically. Earlier boxplots were horizontal in orientation. Does it matter which direction? Of course not! You can draw the same conclusion regardless of the orientation of the boxplots.

The graphs show that, in general, there were more TD passes per team in 1998 than in 2000. The IQR was smaller in 2000 than in 1998, which is why 37 is an outlier in 2000. In 1998, 37 would not have been an outlier. The fact that one data set has higher values than the other tells you nothing about outliers. Remember to graph the data and then do the calculations to verify outliers.
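The outlier claim above can be checked with a short calculation. The following Python sketch applies the 1.5 × IQR rule to the TD pass counts as read off the back-to-back stem and leaf plot later in this lesson. The function name `upper_fence` and the choice of quartiles as medians of the lower and upper halves are assumptions made for illustration; Minitab and other packages may compute quartiles slightly differently.

```python
from statistics import median

# TD passes per team, read off the back-to-back stem and leaf plot in this lesson.
td_1998 = [7, 11, 12, 13, 15, 16, 16, 16, 17, 17, 17, 18, 19,
           20, 21, 21, 21, 23, 23, 24, 24, 25, 26, 28, 28,
           32, 33, 33, 41, 41]
td_2000 = [6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19,
           20, 20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29,
           32, 33, 33, 37]


def upper_fence(data):
    """Return (Q1, Q3, IQR, upper fence) with quartiles as medians of the halves.

    This quartile convention is an assumption; software may use another one.
    """
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])    # lower half (overall median excluded when n is odd)
    q3 = median(values[-half:])   # upper half
    iqr = q3 - q1
    return q1, q3, iqr, q3 + 1.5 * iqr


for year, data in (("1998", td_1998), ("2000", td_2000)):
    q1, q3, iqr, fence = upper_fence(data)
    print(year, "Q1 =", q1, "Q3 =", q3, "IQR =", iqr, "upper fence =", fence)
    print("  37 would be an outlier:", 37 > fence)
```

Under these assumptions the upper fence is 35 for 2000 and 41 for 1998, so 37 is flagged as an outlier only in 2000.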

Here is a self-check question comparing male and female weights.


The weights of the male and female students in a class are summarized in the following boxplots:
Which of the following is NOT correct?

  1. About 50% of the male students have weights between 150 and 185 lbs.
  2. About 25% of female students have weights more than 130 lbs.
  3. The median weight of male students is about 162 lbs.
  4. The mean weight of female students is about 120 because of symmetry.
  5. The male students have less variability than the female students.

Which answer did you select? The correct answer is choice 5. The first four choices are all correct statements.

Back-to-back stem and leaf plots

Below is a back-to-back stem and leaf plot of the NFL TD passes data. With the data presented in this manner, it is easy to make comparisons. Is this back-to-back stem and leaf plot better than the modified boxplots above? You can make the same comparisons with either graph. The boxplots let you "see" the five-number summary, while the stem and leaf plot lets you view all of the data.

1998 TD passes     2000 TD passes

            11 |4|
               |3| 7
           332 |3| 233
          8865 |2| 889
      44331110 |2| 001112223
     987776665 |1| 56888899
           321 |1| 22444
             7 |0| 69
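Because the stem and leaf plot keeps every data value, the five-number summary that a boxplot displays can be recovered from it. The sketch below, in Python, decodes each row of the plot above back into values and then summarizes each year. The names `rows`, `decode`, and `five_number_summary` are hypothetical, and the quartiles are taken as medians of the lower and upper halves, which is one common convention among several.

```python
from statistics import median

# Rows of the plot as (1998 leaves, stem, 2000 leaves), top to bottom.
# Each leaf is the ones digit and the stem is the tens digit.
# Stems 1, 2, and 3 are split: one row holds leaves 5-9, the other leaves 0-4.
rows = [
    ("11", 4, ""),
    ("", 3, "7"),
    ("332", 3, "233"),
    ("8865", 2, "889"),
    ("44331110", 2, "001112223"),
    ("987776665", 1, "56888899"),
    ("321", 1, "22444"),
    ("7", 0, "69"),
]


def decode(rows, side):
    """Turn one side of the plot (0 = left, 2 = right) into a sorted list of values."""
    return sorted(10 * row[1] + int(leaf) for row in rows for leaf in row[side])


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for an already sorted list."""
    half = len(values) // 2
    return (values[0], median(values[:half]), median(values),
            median(values[-half:]), values[-1])


td_1998 = decode(rows, 0)
td_2000 = decode(rows, 2)
print("1998:", len(td_1998), "teams,", five_number_summary(td_1998))
print("2000:", len(td_2000), "teams,", five_number_summary(td_2000))
```

Decoded this way, the plot holds 30 values for 1998 and 31 values for 2000.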

Double bar graphs, also called side-by-side bar graphs, appear from time to time. They are used when each data group contains two different sets of frequency data, and they let us compare both between and within data groups. (Note: There are cases with more than two bars per data group. We will limit our discussion here to double bar graphs, but the same ideas apply to bar graphs with multiple bars per data group.)

Below is an example of a table that has two frequency data values per data group.

Quarterly Reports of Income and Expenses for
Dry Cleaning Associates and Co.

Quarter           Income (thousands of dollars)    Expenses (thousands of dollars)
First Quarter
Second Quarter
Third Quarter
Fourth Quarter
Double bar graphs have many of the same attributes and advantages as regular bar graphs; however, there are some major differences.

Similarities between double bar graphs and single bar graphs:

  • They are relatively easy to construct, and they display facts about countable data.
  • They show comparisons between different data groups.

The major differences between double bar graphs and single bar graphs:

  • Double bar graphs allow us to make quick generalizations about differences within data groups as well as between data groups.
  • Each data group is represented by two bars.

Now let's look at the bar graph that illustrates the data presented in the table above. Note that there are two bars for each data group, and they are placed next to each other on the graph.


Quarterly Reports of Income and Expenses for
Dry Cleaning Associates and Co.

When reading double bar graphs, we often want to look at the difference in heights between the two data bars within one group. For example, the difference between the income and expenses bars for each individual quarter tells you the profit for that quarter (profit = income - expenses). By paying attention not only to individual bar heights but also to the differences between bar heights, you can make more generalizations from double bar graphs. Let's use the bar graph above to answer some questions.

  1. Is it true that if the Dry Cleaning Associates' income is over 60 thousand dollars, then their expenses are also over 60 thousand dollars? Explain your answer.
    Yes, this is true. The income is over 60 thousand dollars in the 2nd, 3rd, and 4th quarters. In each of those quarters the expenses are also over 60 thousand dollars, at roughly 75 to 80 thousand dollars per quarter.
  2. In what quarters did the company experience its biggest profit and the biggest loss? Approximate both values.
    To answer this question we must compare income to expenses for each pair of bars, quarter by quarter. The larger the amount by which income exceeds expenses, the larger the profit, so we look for the quarter in which this difference is greatest. Visually, we can determine that the greatest profit occurs in the 4th quarter. Now we must approximate this profit. Since the income is about 115 thousand and the expenses are about 80 thousand, a good estimate of the profit is 115 - 80 = 35 thousand dollars.
    The only quarter in which the expenses were bigger than the income is the third quarter. Profit = income - expenses, and income is about 70 thousand while expenses are about 75 thousand. Therefore, 70 - 75 = -5. The negative profit represents a loss of about 5 thousand dollars in the third quarter.
  3. Overall, did the Dry Cleaning Associates experience a profit or a loss for the year?
    We can see that in three of the four quarters, income was greater than expenses. Overall, there was a profit, because the amount lost in the third quarter is not nearly as much as the amount gained in the other three quarters. The graph does not tell us precisely how much is gained or lost, but it does give us a good estimate, as well as a strong sense of any trends that have occurred.
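The profit arithmetic in the answers above can be written out as a small Python sketch. It uses only the two quarters for which the text gives bar-height estimates; the other quarters are left out because no estimates for them are stated. The dictionary `estimates` and the function `profit` are hypothetical names chosen for illustration.

```python
# Estimates read off the bar graph in the answers above, in thousands of dollars.
# Only the third and fourth quarters are estimated in the text, so the
# remaining quarters are omitted here.
estimates = {
    "Third Quarter": {"income": 70, "expenses": 75},
    "Fourth Quarter": {"income": 115, "expenses": 80},
}


def profit(income, expenses):
    """Profit is income minus expenses; a negative result is a loss."""
    return income - expenses


for quarter, amounts in estimates.items():
    result = profit(amounts["income"], amounts["expenses"])
    label = "profit" if result >= 0 else "loss"
    print(f"{quarter}: {label} of about {abs(result)} thousand dollars")
```

The sign of the result carries the interpretation: a positive difference is a profit and a negative difference is a loss of the same size.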
