|
Sections: 1.| Introduction 2.| Designing Samples 3.| Designing Experiments 4| Simulating Experiments |
Introduction Exploratory data analysis seeks to discover and describe what data say by using graphs and numerical summaries. The conclusions we draw from data analysis apply to the specific data that we examine. Often, however, we want to answer questions about some large group of individuals. To get sound answers, we must produce data in a way that is designed to answer our questions. Suppose our question is “What percent of American adults agree that the United Nations should continue to have its headquarters in the United States?” To answer the question, we interview American adults. We can’t afford to ask all adults, so we put the question to a sample chosen to represent the entire adult population. How shall we choose a sample that truly represents the opinions of the entire population? Statistical designs for choosing samples are the topic of the second lesson in this unit.Our goal in choosing a sample is a picture of the population, disturbed as little as possible by the act of gathering information. Sample surveys are one kind of observational study. In other settings, we gather data from an experiment. In doing an experiment, we don’t just observe individuals or ask them questions. We actively impose some treatment in order to observe the response. Experiments can answer questions such as “Does aspirin reduce the chance of a heart attack?” and “Does a majority of college students prefer Pepsi to Coke when they taste both without knowing which they are drinking?” Experiments, like samples, provide useful data only when properly designed. We will discuss statistical design of experiments in the third lesson. The distinction between experiments and observational studies is one of the most important ideas in statistics.OBSERVATION VERSUS EXPERIMENT observational study observes individuals and measures variables of interest but does not attempt to influence the responses. An experiment, on the other hand, deliberately imposes some treatment on individuals in order to observe their responses. Observational studies are essential sources of data about topics from the opinions of voters to the behavior of animals in the wild. But an observational study, even one based on a statistical sample, is a poor way to gauge the effect of an intervention. To see the response to a change, we must actually impose the change. When our goal is to understand cause and effect, experiments are the only source of fully convincing data. Example: HELPING WELFARE MOTHERS FIND JOBS
When we simply observe welfare mothers, the effect of job-training programs on success in finding work is confounded with (mixed up with) the characteristics of mothers who seek out training on their own. Recall that two variables (explanatory variables or lurking variables) are said to be confounded when their effects on a response variable cannot be distinguished from each other.Example: Teenage driving safety
Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables. We will see that well designed experiments take steps to defeat confounding. Because experiments allow us to pin down the effects of specific variables of interest to us, they are the preferred method of gaining knowledge in science, medicine, and industry. In some situations, it may not be possible to observe individuals directly or to perform an experiment. In other cases, it may be logistically difficult or simply inconvenient to obtain a sample or to impose a treatment. Simulations provide an alternative method for producing data in such circumstances. The fourth lesson in this unit introduces techniques for simulating experiments.Statistical techniques for producing data open the door to formal statistical inference, which answers specific questions with a known degree of confidence. The later units of this course book are devoted to inference. We will see that careful design of data production is the most important prerequisite for trustworthy inference. |