How to Be A Wise Consumer of Psychological Research
Show Me the Data! Looking at Evidence
Says Who? Random Sampling
When behavioral scientists want to assess the attitudes or preferences of very large groups of people (e.g., American voters, Asian-American college students, human beings), they face a seemingly insurmountable problem. It is usually impossible to ask every member of a very large group what he or she thinks, feels, or does. However, behavioral scientists have solved this tricky problem by developing a technique called random sampling. When survey researchers use random sampling, they select a very small proportion of the people from within a very large population (e.g., 1,000 out of 50 million registered voters). They then estimate what the entire population is like on the basis of the responses of those sampled. The key to getting an accurate estimate is the use of random sampling. Random sampling refers to selecting people from a population so that everyone in the entire population (e.g., all registered voters in the U.S.) has an equal chance of being selected. This turns out to be an incredibly powerful technique. If every person in a group of 50 million voters really does have an equal chance of being selected into a national survey, then the results of a survey based on 1,000 people will almost always closely resemble the results for the total population.
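To see why such a small sample can work, consider a minimal simulation (all numbers here are invented for illustration). Each randomly sampled voter is treated as an independent draw from a population in which 52% favor a hypothetical candidate, and the estimate from just 1,000 draws lands close to the true figure.

```python
import random

random.seed(42)

TRUE_SUPPORT = 0.52   # invented: 52% of the population favors candidate A
SAMPLE_SIZE = 1_000

# Each randomly sampled voter favors candidate A with the true
# population probability -- this is what random sampling buys us.
sample = [random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE)]
estimate = sum(sample) / SAMPLE_SIZE

print(f"true support: {TRUE_SUPPORT:.2%}, sample estimate: {estimate:.2%}")
```

Re-running with different seeds moves the estimate around, but with 1,000 respondents it rarely strays more than a few percentage points from the true value.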
An excellent example of the importance of random sampling can be found in the 1936 U.S. Presidential election. Prior to that election, the Literary Digest sent postcards to more than 10 million Americans, asking them to report who they planned to vote for in the upcoming election. Among the 2 million Americans who returned the postcards, Alf Landon was the overwhelming favorite. In contrast, a much smaller survey conducted by the recently-formed Gallup group yielded very different results. Based on the responses of only a few thousand likely voters, the Gallup poll suggested that Franklin D. Roosevelt would be the winner. If you pull a dime out of your pocket and look to see whose face is on it, you'll see that the Gallup pollsters were correct. FDR won in a landslide, and Alf Landon faded into obscurity. How did the Gallup poll, based on far fewer people, outperform the enormous Literary Digest poll? The Gallup pollsters came very close to drawing a true random sample of likely voters. In contrast, the Literary Digest sampled people by taking names from automobile registrations and telephone listings. In 1936, people who owned cars and phones were usually pretty wealthy, and wealthy people overwhelmingly preferred Alf Landon.
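The Literary Digest's mistake can be sketched the same way. In the toy model below (the percentages are invented, not historical), a wealthy minority can be reached by postcard and strongly favors Landon, while the rest of the electorate cannot be reached at all; sampling only from the reachable group badly overstates Landon's support.

```python
import random

random.seed(0)

# Invented illustrative numbers: 25% of voters are "wealthy" (own a car
# or phone). Wealthy voters favor Landon 70% of the time; everyone else
# favors him only 30% of the time.
def voter():
    wealthy = random.random() < 0.25
    favors_landon = random.random() < (0.70 if wealthy else 0.30)
    return wealthy, favors_landon

population = [voter() for _ in range(100_000)]
true_share = sum(fl for _, fl in population) / len(population)

# Literary-Digest-style sampling frame: only wealthy voters are reachable.
reachable = [fl for wealthy, fl in population if wealthy]
biased_sample = random.sample(reachable, 1_000)
biased_share = sum(biased_sample) / len(biased_sample)

print(f"Landon's true support: {true_share:.1%}")
print(f"estimate from wealthy-only sample: {biased_share:.1%}")
```

Note that the biased poll is wrong no matter how large it gets; a bigger sample drawn from the wrong frame just produces a more precise wrong answer.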
The lesson of the Literary Digest error is that whenever you hear the results of any survey, you should ask yourself how the surveyed people were sampled. Were those sampled really like the pool of people (e.g., American voters, African American children) whose attitudes and behavior the researcher would like to describe?
Even when a researcher makes careful use of random sampling, it is useful to pay attention to a different form of sampling bias, known as non-response bias. If only a small percentage of randomly sampled people agree to respond to a survey, it is quite likely that those who did respond will be different from those who refused. Modern pollsters have long mastered the science of random sampling. These days, most of the error in most scientific polls comes from the fact that it can be hard to get very high response rates (or hard to know whom to sample in the first place). For example, if you randomly sampled all those eligible to vote in a state gubernatorial race, and you only got a 30% response rate, you would have to worry about whether those who refused to be surveyed would vote the same way as the eager 30% who agreed. Moreover, even if everyone agreed to be surveyed, you'd have to worry about whether the subsample of all eligible voters who actually showed up at the polls on election day had the same preferences as those who either didn't bother to vote or were unable to do so.
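Non-response bias can be simulated just as easily. In the sketch below (again with invented numbers), supporters of a ballot measure are simply more willing to answer the survey than opponents, so the estimate among responders drifts well away from the truth even though the initial sample was random.

```python
import random

random.seed(1)

# Invented numbers for illustration: 50% of the population supports a
# ballot measure, but supporters answer the survey 45% of the time
# while opponents answer only 20% of the time.
population = [random.random() < 0.50 for _ in range(100_000)]

def response_rate(supports):
    return 0.45 if supports else 0.20

responders = [v for v in population if random.random() < response_rate(v)]

true_support = sum(population) / len(population)
observed = sum(responders) / len(responders)
print(f"true support: {true_support:.1%}, among responders only: {observed:.1%}")
```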
It is also important to note that random sampling helps you describe only the population of people from whom you sampled (and not other populations). For example, if researchers randomly sampled registered voters, but only did so in North Carolina, they might get a great idea of what North Carolinians believe, but it would be very risky to generalize these results to other Americans. This is why people sometimes criticize the results of surveys taken of college students, who differ markedly from older adults. On the other hand, if surveyors wanted to know the opinions of college students, it would make little sense to sample anyone else. The key issue might be exactly which college students. A random sample of 1,000 American college students would tell us much more than a random sample of 1,000 students at Vassar College. Of course, if we cared only about Vassar College students, we would want to sample Vassarians at random. The key issue in sampling is to pay careful attention to who was sampled and to make certain that those sampled are the same kind of people about whom a researcher has made a claim (a claim about what the evidence shows).
How to Ask Why: Experimental Manipulations and Random Assignment
When a researcher moves from descriptive research to experimental research, random sampling is still important, but it begins to take a back seat to a second major technique. This second technique is random assignment, and it is the cornerstone of the experimental method. Unlike random sampling, which is a technique for deciding whom to study, random assignment can take place only after people have already been selected into a study. Random assignment is a technique for assigning people to the different conditions in an experiment, and it occurs only when everyone in the study has an equal chance of serving in any specific condition. In the same way that random sampling guarantees that the people sampled in a study will be as similar as possible to those who were not sampled, random assignment guarantees that those assigned to one experimental condition will be as similar as possible to those assigned to a different condition. This is crucial because the whole idea of an experiment is to identify two identical groups of people and then to manipulate something. One group gets an experimental treatment, and one does not. If the group that gets the treatment (e.g., a drug, exposure to a violent videogame) behaves differently than the control group that did not get the treatment, we can attribute the difference to the treatment - but only if we can rest assured that the two groups were similar prior to the treatment.
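In code, random assignment is nothing more than shuffling the participant list and splitting it. The sketch below (with hypothetical participants and a made-up trait) shows that a pre-existing characteristic like age ends up roughly balanced across the two conditions, before any treatment is applied.

```python
import random

random.seed(7)

# Hypothetical participants with a pre-existing trait (age) that could
# otherwise confound an experiment's results.
participants = [{"id": i, "age": random.randint(18, 80)} for i in range(200)]

# Random assignment: shuffle everyone, then split down the middle, so
# each participant has an equal chance of landing in either condition.
random.shuffle(participants)
treatment, control = participants[:100], participants[100:]

def mean_age(group):
    return sum(p["age"] for p in group) / len(group)

print(f"treatment mean age: {mean_age(treatment):.1f}")
print(f"control mean age:   {mean_age(control):.1f}")
```

The same balance holds, on average, for every trait at once, measured or not; that is what makes random assignment so powerful.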
Another way to put this is that if we wish to identify the causes of human behavior, we must usually perform experiments in which we manipulate one thing, or a few factors, at a time. We can only do this by making use of random assignment. Suppose a researcher at Cornell University developed a new technique for teaching a foreign language. If the researcher could do so, he might persuade all of his colleagues in the Spanish department to start using this new technique. After a year of instruction using the new technique, suppose the professor documented that the average student who completed one year of Spanish at Cornell performed well above the national average on a test of Spanish fluency (relative to students at other universities who had also completed a year of Spanish). Can we attribute this performance advantage to the new instruction technique? No. Given how difficult it is to get admitted to Cornell in the first place, it is likely that students at Cornell would have performed well above the national norm even if they had been taught using a traditional technique. If the researcher really wanted to know whether his teaching technique was superior, he would have needed to randomly assign some Cornell students to receive the new form of instruction while randomly assigning others to receive a traditional form of instruction (this would be hard to do, but that is a detail).
Consider a more important question. Do seatbelts save lives? One way to find out would be to obtain records of thousands of serious automobile accidents. To simplify things, suppose a researcher focused exclusively on drivers (rather than passengers) and found an accurate way to determine whether drivers were wearing their seatbelts at the time of each crash. The researcher then obtained accurate records of whether the driver in each crash survived. Imagine that drivers wearing seatbelts were much more likely to have survived. Can we safely assume that seatbelts are the reason? Not on the basis of this study alone. The problem is that, for ethical reasons, the people in this hypothetical study were not randomly assigned to different seatbelt conditions. As it turns out, those who do and do not routinely wear seatbelts differ in many important ways. Compared with habitual non-users of seatbelts, habitual users are older, more educated, and less likely to speed or drink and drive. These additional factors are also likely to influence survival in a serious accident, and they are all confounded with seatbelt use. On the basis of this study and this study alone, we cannot tell whether it is seatbelts or other safe driving practices that are responsible for the greater survival rates among seatbelt users.
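A small simulation makes the confound concrete. In the invented model below, seatbelts do nothing at all by construction; survival depends only on how cautious a driver is. Because cautious drivers also buckle up more often, belted drivers nonetheless survive at a much higher rate.

```python
import random

random.seed(3)

# Invented illustrative model: cautious drivers both wear seatbelts
# more often AND survive crashes more often, independent of the belt.
def crash_record():
    cautious = random.random() < 0.5
    belted = random.random() < (0.85 if cautious else 0.30)
    # Survival depends only on caution here, NOT on the belt.
    survived = random.random() < (0.90 if cautious else 0.60)
    return belted, survived

records = [crash_record() for _ in range(100_000)]

def survival_rate(records, belt_status):
    group = [s for b, s in records if b == belt_status]
    return sum(group) / len(group)

# Belted drivers survive more often even though the belt does nothing
# in this model -- caution confounds the comparison.
print(f"survival, belted:   {survival_rate(records, True):.1%}")
print(f"survival, unbelted: {survival_rate(records, False):.1%}")
```

To be clear, this toy model deliberately exaggerates the confound to make the point; in reality seatbelts do save lives, as the next paragraph explains.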
If we were to conduct a large-scale experiment on seatbelt use (by determining habitual seatbelt use on the basis of coin flips), we could completely eliminate all of these confounds in one simple step. Random assignment would create two identical groups of people, exactly half of whom were forced to use seatbelts at all times, and exactly half of whom were forbidden from doing so during the experimental period. Of course, this hypothetical experiment would be unethical. Thus, researchers interested in seatbelt use have had to do a lot of other things to document the important role that seatbelts play in saving people's lives (including laboratory crash tests and studies that used sophisticated statistical techniques to separate the effects of seatbelt use from other effects). The point is not that seatbelts don't save lives. They clearly do. The point is that it has taken a lot of time and effort to document this fact because of the impossibility of conducting an experiment on this topic. If you want to conduct a single study to figure out what causes something, you will almost always need to conduct an experiment in which you make use of random assignment. As a consumer of psychological research, you must thus ask yourself whether a research claim was based on the results of a careful experiment, or whether a researcher may have compared two groups of people who differed in more than one way at the beginning of the study.
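The power of random assignment can be shown with an invented model in which seatbelts again do nothing by construction (survival depends only on the driver's caution), but belt use is now assigned by a coin flip rather than chosen by the driver. Because cautious and incautious drivers are split evenly between the two groups, no spurious survival gap appears.

```python
import random

random.seed(9)

# Invented model: survival depends only on caution, never on the belt,
# and belt use is assigned at random rather than chosen by the driver.
def experimental_record():
    cautious = random.random() < 0.5
    belted = random.random() < 0.5          # random assignment: coin flip
    survived = random.random() < (0.90 if cautious else 0.60)
    return belted, survived

records = [experimental_record() for _ in range(100_000)]

def survival_rate(belt_status):
    group = [s for b, s in records if b == belt_status]
    return sum(group) / len(group)

# With random assignment, cautious drivers are split evenly between the
# conditions, so any remaining difference would reflect the belt itself.
print(f"survival, belted:   {survival_rate(True):.1%}")
print(f"survival, unbelted: {survival_rate(False):.1%}")
```

In a real experiment, of course, a surviving difference between the groups would be evidence that the belt itself matters; here the two rates agree because the model gave the belt no effect.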
Some Final Thoughts
There are many other ways in which research can go astray. Did Dr. Snittle word his survey questions fairly? Were participants reporting their attitudes honestly? Did those carrying out the research bias answers by subtly communicating to participants what they hoped to find? Was the size of the sample large enough to draw meaningful comparisons? For example, if you read that 4 out of 5 doctors use Brand X, were only five doctors surveyed? Were those who conducted the research strongly motivated to produce a specific result? For example, if those studying the effects of a drug were paid by a pharmaceutical company to do the research, could this conflict of interest distort the way they collect or interpret their data? The list continues. Specific issues such as these aside, however, the two concerns that should come to mind first when evaluating any research claim have to do with proper sampling and proper experimental control. First, were those studied truly representative of the people about whom we would like to draw conclusions? Second, did the researchers isolate the variables they studied by disentangling them from other confounded variables? It is not always easy to get answers to these questions, but if you get in the habit of asking them you will gradually become a better shopper for psychological truths.