DATA
Statistics is the field of study concerned with the art and science of
data. It involves planning experiments and collecting, organizing, and
analyzing data, as well as interpreting, summarizing, and presenting the
results.
There are two broad areas of statistics: Parametric and
Non-Parametric.
- Parametric Statistics deals with the analysis of population
parameters given specific assumptions made about the value of the parameter
and the nature of the population distribution from which the sample was drawn.
This course will deal with this broad area of statistics.
- Non-Parametric Statistics deals with the analysis of population
parameters but requires no assumptions concerning the population distribution
or any specific values of any parameters of that distribution. This course
will NOT get involved with non-parametric statistics.
This course is divided into two distinct types of parametric statistics: Descriptive
and Inferential.
- Descriptive Statistics is the organization of raw data (numbers) into
tables and graphs. The data is also analyzed to find measures of central
tendency (averages) and measures of dispersion (such as the standard
deviation), and to identify extreme values (outliers).
- Inferential Statistics uses the
analysis of a sample (part of a population) to make inferences about the mean,
median, proportion and standard deviation of a population. The monthly
unemployment figure is an example of inferential statistics: based on a
sample (part of the population), an inference is made about the proportion
of people in the labor force who are unemployed.
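As a minimal sketch of the unemployment example, the Python snippet below estimates a population proportion from a sample. The 50 survey responses are invented for illustration only; they are not real labor-force data.

```python
# Hypothetical sample of 50 people in the labor force.
# True = unemployed, False = employed. Invented data, for illustration only.
sample = [True] * 3 + [False] * 47

# The sample proportion is the statistic computed from the sample.
# sum() counts the True values because True is treated as 1.
sample_proportion = sum(sample) / len(sample)

# Inference: the sample proportion serves as an estimate of the
# (unknown) proportion unemployed in the whole population.
print(f"Estimated proportion unemployed: {sample_proportion:.2f}")
```

The estimate describes the sample exactly, but it is only an inference about the population; a different sample would usually give a slightly different figure.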
The first part of the course will be spent examining data (Descriptive Statistics). A data
set contains information about some group of individuals, and that
information is organized in variables.
Individuals are the objects described by a set of data.
Individuals may be people, but they may also be animals or things.
People are called subjects. Animals and things are generally called
units.
A variable is any characteristic of an individual. A variable
can take different values for different individuals.
Example: The records your school collects when students enroll form a
data set. The students are the individuals, and the variables may include
name, age, birth date, gender, GPA, and intended college major.
The above variables are not all the same type. Some are categorical
and others are quantitative. Gender and intended college major simply
place the individuals into categories. The variables like age and GPA have
numeric values for which we can do arithmetic. It makes sense to give an average
age or an average GPA whereas we can't give an average gender. We can do counts
on categorical variables and then do arithmetic with the counts.
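The difference can be sketched in Python. The student records below are made up for illustration (the names and values are assumptions, not data from any school): averaging works for the quantitative variable age, while the categorical variable major is summarized by counts.

```python
from collections import Counter
from statistics import mean

# Hypothetical enrollment records, invented for illustration only.
students = [
    {"name": "Ana", "age": 19, "gpa": 3.4, "major": "Biology"},
    {"name": "Ben", "age": 22, "gpa": 2.9, "major": "History"},
    {"name": "Cal", "age": 20, "gpa": 3.7, "major": "Biology"},
    {"name": "Dee", "age": 21, "gpa": 3.2, "major": "Nursing"},
]

# Quantitative variable: an average makes sense.
average_age = mean(s["age"] for s in students)

# Categorical variable: count the individuals in each category.
major_counts = Counter(s["major"] for s in students)

# Arithmetic on the counts, for example the proportion in each category.
major_proportions = {m: c / len(students) for m, c in major_counts.items()}

print(average_age)
print(major_counts)
print(major_proportions)
```

There is no meaningful "average major" here, only the counts and the proportions derived from them.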
Categorical variable or Qualitative variable:
Qualitative data is discrete. It counts how many times an attribute
occurs, and the attribute is described in words. There are usually gaps
between values, and fractions are usually not part of qualitative data.
Consider a statistics class of 18 males and 14 females. Gender is a
qualitative variable: the numbers of males and females were counted, and
it is impossible for there to be 17.2 males and 14.7 females. The graph
for qualitative data will have words on one axis and numbers on the other
axis.
Quantitative Variable:
Quantitative data can be either discrete or continuous. When it is
discrete, it counts how often something occurs. When it is continuous, it
measures how much of something exists. The number of cars on a highway
between 7:00 AM and 8:00 AM is an example of a discrete quantitative
variable, and the number of ounces in a cereal box is an example of a
continuous quantitative variable. The quantity of cereal in a 20-ounce box
falls within a range around 20 ounces, such as 19.8 to 20.2 ounces.
Fractions are part of continuous data. The graph for quantitative data
will have numbers on both axes.
A quantitative variable takes numerical
values for which arithmetic operations such as adding and averaging
make sense. Quantitative variables can be subdivided into
continuous data and discrete data.
- Continuous data are data that can take on any value within a range.
Since age, weight, length, and volume can take on any value, they are
considered to be continuous data.
- Discrete data are data that can take on only certain separate values.
The score of a basketball game might be 93 to 88 but it cannot be 93.24
to 88.01. The latter values are not allowed for basketball scores. Counts
are also discrete data.
Examples of quantitative variables include
height, weight, length, volume, and number of M&Ms in a bag.
PRACTICE PROBLEM
FUEL-EFFICIENT CARS:
Here is a small part of a
data set that describes the fuel economy (in miles per gallon) of 1998 model
motor vehicles:
Make and Model | Vehicle Type | Transmission Type | Number of Cylinders | City MPG | Highway MPG
BMW 381I | Subcompact | Automatic | 4 | 22 | 31
BMW 381I | Subcompact | Manual | 4 | 23 | 32
Buick Century | Midsize | Automatic | 6 | 20 | 32
Chevrolet Blazer | 4-Wheel drive | Automatic | 6 | 16 | 20
(a) What are the individuals in this data set?
(b) For each individual, what variables are given? Which of these
variables are categorical and which are quantitative?
Answers:
(a) Cars.
(b) Vehicle type and transmission type are categorical; number of
cylinders, city MPG, and highway MPG are quantitative.
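One way to check the answer to (b) is to try averaging each variable. The sketch below stores the four cars from the practice problem as Python dictionaries; the key names (such as `city_mpg`) are labels chosen here for illustration. Averages are computed only for the variables classified as quantitative.

```python
from statistics import mean

# The four rows of the practice-problem table, one dictionary per car.
cars = [
    {"make_model": "BMW 381I", "vehicle_type": "Subcompact",
     "transmission": "Automatic", "cylinders": 4, "city_mpg": 22, "highway_mpg": 31},
    {"make_model": "BMW 381I", "vehicle_type": "Subcompact",
     "transmission": "Manual", "cylinders": 4, "city_mpg": 23, "highway_mpg": 32},
    {"make_model": "Buick Century", "vehicle_type": "Midsize",
     "transmission": "Automatic", "cylinders": 6, "city_mpg": 20, "highway_mpg": 32},
    {"make_model": "Chevrolet Blazer", "vehicle_type": "4-Wheel drive",
     "transmission": "Automatic", "cylinders": 6, "city_mpg": 16, "highway_mpg": 20},
]

# Classification from the answer to part (b).
categorical = ["vehicle_type", "transmission"]
quantitative = ["cylinders", "city_mpg", "highway_mpg"]

# Averaging is meaningful only for the quantitative variables.
averages = {var: mean(car[var] for car in cars) for var in quantitative}

for var, avg in averages.items():
    print(f"Average {var}: {avg}")
```

Asking for the mean of `vehicle_type` or `transmission` would fail, because those values are words rather than numbers.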
Try Self-Check 1
A variable
generally takes values that vary. One variable may take values that are very
close together while another variable takes values that are quite spread
out. We say that the pattern of variation of a variable is its distribution.
The distribution of
a variable tells us what values the variable takes and how often
it takes these values.
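As a small illustration, the distribution of a variable can be tabulated by counting how often each value occurs. The sketch below uses the transmission types of the four cars in the practice problem; `Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Transmission type for each of the four cars in the practice problem.
transmissions = ["Automatic", "Manual", "Automatic", "Automatic"]

# The distribution: which values occur, and how often each occurs.
distribution = Counter(transmissions)

for value in sorted(distribution):
    print(f"{value}: {distribution[value]} of {len(transmissions)} cars")
```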
Statistical
tools and ideas can help you examine data in order to describe their main
features. This examination is called exploratory data analysis. Like
an explorer crossing unknown lands, we first simply describe what we see.
Each example we meet will have some background information to help us, but
our emphasis is on examining the data. Here are two basic strategies that
help us organize our exploration of a set of data:
• Begin by examining each variable by itself. Then move on to study
relationships among the variables.
• Begin with a graph or graphs. Then add numerical summaries of specific
aspects of the data.
You are now ready for Statistics Assignment 1: Types of
Data and Statistics.