Feature

Ben A. Williams, PhD, came by his distrust of randomized controlled trials (RCTs) the hard way: He developed a kind of brain cancer with no proven treatment.

There had been randomized trials of various approaches, but they were all failures, says Williams, an emeritus psychology professor at the University of California at San Diego. And although several drugs had helped a small percentage of patients in Phase II trials, he says, it can be hard to get hold of therapies not yet vetted by Phase III trials.

“Medicine was basically saying if it isn’t done this way, it doesn’t count,” says Williams, describing the difficulties his physicians had in gaining access to therapies that probably wouldn’t help him, but might. “The problem is the one-size-fits-all mentality.”

Like Williams, many other psychologists — as well as medical researchers — question the assumption by the National Institutes of Health, the Food and Drug Administration and others that RCTs should be the gold standard for clinical research. While the methodology — which involves randomly assigning participants to either a treatment or control group — does have its strengths, they say, it also has serious limitations that are often overlooked or ignored.

Because trial participants typically don’t represent the population as a whole, for example, results from RCTs may not apply more generally. And even if they did, it’s impossible to tell from an RCT which subset of participants actually benefited from the intervention being studied.

These critics don’t want to reject RCTs altogether. Rather, they want to supplement their findings with evidence from other methodologies, such as epidemiological studies, single-case experiments, the use of historical controls or just plain clinical experience.

Strengths and weaknesses

No one denies that RCTs have their strengths.

“Randomized trials do two things that are very rare among other designs,” says William R. Shadish, PhD, a professor of psychological science at the University of California at Merced. “They yield an estimate of the effect that is unbiased and consistent.” Although Shadish is reluctant to describe RCTs as the gold standard because the phrase connotes perfection, he does describe himself as a “huge fan” of the methodology.

“If you can do a randomized trial,” he says, “by all means do it.”

But that’s not always possible. By their very nature, he says, some questions don’t permit random assignment of participants. Doing so might be unethical, for example.

Even when RCTs are feasible, they may not provide the answers researchers are looking for.

“All RCTs do is show that what you’re dealing with is not snake oil,” says Williams. “They don’t tell you the critical information you need, which is which patients are going to benefit from the treatment.”

To account for heterogeneity among participants, he explains, RCTs must be quite large to achieve statistical significance. What researchers end up with, he says, is the “central tendencies” of a very large number of people — a measure that’s “not going to be representative of much of anybody if you look at them as individuals.”
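
To make Williams’s point about scale concrete, here is a minimal sketch in Python of the standard two-arm sample-size calculation, showing how greater between-patient heterogeneity (a larger standard deviation) inflates the number of participants needed to detect the same average effect. All of the numbers are illustrative assumptions, not figures from the article.

```python
from scipy.stats import norm

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per arm of a two-arm RCT
    to detect a given mean difference with a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

# Same average treatment effect, increasingly heterogeneous patients
# (all values are illustrative).
for sd in (5, 10, 20):
    print(f"SD = {sd:>2}: about {n_per_arm(effect=5, sd=sd):.0f} patients per arm")
```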

Move beyond the context of an RCT itself, and the applicability of the results to individual patients becomes even more problematic.

For one thing, participants in RCTs tend to be a “pretty rarefied population” that isn’t representative of the real-world population an intervention would eventually target, says Steven J. Breckler, PhD, executive director of APA’s Science Directorate.

“Think about the people who show up for drug trials — patients who have probably tried everything else and are desperate for some kind of treatment,” he says, adding that they are further winnowed down as researchers eliminate would-be participants with co-morbid conditions and the like. “Are the results of that trial going to generalize to you and me? Or do we come from a population of people who would never have enrolled in a trial to begin with?”

Experiments, says Breckler, typically involve a trade-off between internal validity — the ability to trace causal inferences to the intervention — and external validity — the generalizability of the results.

“What people seem to fail to recognize is that the perfect RCT is designed strictly with internal validity in mind,” he says.

RCTs may be especially ill-suited to psychological interventions versus medical ones, adds Breckler. In contrast to medications that have a straightforward biochemical effect that’s unlikely to vary across individuals, he says, psychological interventions tend to interact with such factors as gender, age and educational level.

Supplementing RCTs

No one suggests that researchers give up RCTs. Instead, they urge supplementing RCTs with other forms of evidence.

“Evidence-based practice should rely on a very broad, diverse base of evidence,” says Breckler. “RCTs would be one source, but there are lots of other sources.” These sources could include Phase II trial data, epidemiological data, qualitative data and reports from the field from clinicians using an intervention, say Breckler and others.

Williams champions the use of historical controls as a supplemental source of information.

In this methodology, researchers examine the results of earlier, nonrandomized trials to establish a crude baseline. They then compare the results of subsequent nonrandomized trials to that benchmark.

The approach works, says Williams, adding that the process allows many interventions to be tested in quick succession. Faced with the failures of RCTs for glioblastoma treatment, for example, researchers turned to the historical record and found that only 15 percent of those with the cancer had no disease progression six months after treatment began.

“They found that if you add this thing to the standard treatment, you can push that number up to 25 percent and add two things and push it up to 35 percent,” he says. “It’s a crude comparison, no doubt, but it turns out to be an effective way of doing the research.”

The FDA agreed, approving a drug for treatment of glioblastoma not on the basis of an RCT but on multiple Phase II trials whose results were better than the historical norm.
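
To see the logic of a historical-control comparison in miniature, the sketch below (in Python) tests whether a new single-arm trial’s six-month progression-free rate beats the historical baseline. Only the 15 percent benchmark comes from the article; the trial size and outcome are hypothetical.

```python
from scipy.stats import binomtest

# Historical control: roughly 15% of patients progression-free at six months
# (figure cited in the article). Trial size and outcome below are hypothetical.
historical_rate = 0.15
patients_in_trial = 80
progression_free = 22  # 27.5% observed in the new single-arm trial

result = binomtest(progression_free, patients_in_trial,
                   historical_rate, alternative="greater")
print(f"Observed rate: {progression_free / patients_in_trial:.1%}")
print(f"One-sided p-value against the 15% benchmark: {result.pvalue:.4f}")
```

The comparison is only as strong as the historical benchmark itself, which is why Williams calls it crude but workable.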

Single-case experiments are another important source of evidence, says Alan E. Kazdin, PhD, a past president of APA and professor of psychology and child psychiatry at Yale. In contrast to RCTs, which involve many subjects and few observations, single-case designs involve many observations but often few subjects. Instead of simply doing a pre- and postassessment, the researcher assesses behavior — of an individual, a classroom, even an entire school — over time.

Say a patient has a tic, says Kazdin. In a single-case design, the researcher would observe the patient and establish the number of tics per hour. The researcher would then conduct an intervention and watch what happens over time.

“If you just do an assessment before some treatment and an assessment after treatment and compare the group that got it to the group that did not, you lose the richness of the change on a day-to-day, week-to-week, month-to-month basis,” says Kazdin, emphasizing that single-case designs are not mere case studies.
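
As a rough illustration of what a single-case (A-B) analysis looks like, the sketch below uses made-up hourly tic counts for one patient, compares the baseline and intervention phases, and computes a simple non-overlap measure sometimes used in single-case research. The data and phase lengths are hypothetical.

```python
# Hypothetical hourly tic counts for one patient in an A-B single-case design.
baseline = [14, 16, 15, 13, 17, 15, 16]    # phase A: before intervention
intervention = [12, 10, 9, 7, 8, 6, 7, 5]  # phase B: after intervention begins

mean_a = sum(baseline) / len(baseline)
mean_b = sum(intervention) / len(intervention)

# Percentage of non-overlapping data: intervention observations that are
# better (here, lower) than the best baseline observation.
pnd = sum(x < min(baseline) for x in intervention) / len(intervention)

print(f"Baseline mean: {mean_a:.1f} tics/hour")
print(f"Intervention mean: {mean_b:.1f} tics/hour")
print(f"Non-overlapping intervention points: {pnd:.0%}")
```

Tracking the behavior observation by observation, rather than with a single before-and-after comparison, is what preserves the day-to-day richness Kazdin describes.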

For Kazdin, overreliance on RCTs means missing out on all sorts of valuable information. Think of the nation’s telescope program, he says. The Hubble telescope looks at visible light. Another telescope looks at X-rays. Another handles gamma rays.

“The method that you use to study something can influence the results you get,” says Kazdin. “Because of that, you always want to use as many different methods as you can.”


Rebecca A. Clay is a writer in Washington, D.C.

Further reading

  • Kazdin, A.E. (2010). Single-Case Research Designs: Methods for Clinical and Applied Settings (2nd ed.). New York: Oxford University Press.

  • Shadish, W.R., Clark, M.H., & Steiner, P.M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. Journal of the American Statistical Association, 103(484), 1334–1356.

  • Shadish, W.R., Cook, T.D., & Campbell, D.T. (2001). Experimental and Quasi-Experimental Designs for Generalized Causal Inference (2nd ed.). Florence, KY: Wadsworth.

  • Williams, B.A. (2010). Perils of evidence-based medicine. Perspectives on Biology and Medicine, 53(1), 106–120.