
Jennifer Degroot Hanawalt, a third-year clinical psychology graduate student at Wayne State University, is investigating how pregnant women's feelings and expectations about parenting correlate with their children's self-regulatory behavior three years later, as measured by the children's ability to resist playing with a tempting but forbidden toy.

Collecting the data for such a long-term study is time-consuming and expensive, and is an especially daunting task for graduate students, who often have limited time and research dollars. Fortunately for Hanawalt, though, she didn't have to do it all herself.

Instead, she's using data collected by the National Institute of Child Health and Human Development's (NICHD) Study of Early Child Care. The study has followed more than 1,000 families from their infants' births in 1991 through 2004, and the database includes years of information on the families' demographics, parents' characteristics, children's experiences in child care and school, and children's cognitive, linguistic, social and emotional development.

"The database offers data from many time points about parenting and children's temperament that we wouldn't have time to collect in just a couple of years in our lab," Hanawalt says.

She is one of many researchers who are beginning to use large-scale databases in their work, reanalyzing data collected for one study to help answer a multitude of other research questions.

Historically, most psychologists have believed that the only data worth analyzing were data they collected on their own, says psychologist David Johnson, PhD, a director of grants and contracts at the National Hispanic University in San Jose, CA, and the former executive director of the Federation for Behavioral, Psychological and Cognitive Sciences. But in the past several years, some psychologists have begun mining previously collected data for new insights, a trend fueled by new repositories for raw data in developmental psychology, brain imaging, aging research and other fields.

For example, NICHD began allowing outside researchers like Hanawalt to use data from the Study of Early Child Care four years ago. Psychologist Sarah Friedman, PhD, the study's scientific coordinator and a primary investigator, says that the scope and depth of the data sets have been a gift to developmental psychology, allowing researchers to access vast amounts of data about children's development.

But researchers who tap large-scale databases still face technical and cultural challenges.

"It's a shift in the culture of psychology," explains Johnson. In 1998, as executive director of the federation, he was involved in an early (and unsuccessful) effort to create a centralized database for all psychological research. Despite that effort's failure, proponents of data sharing are encouraged by the number of independent databases emerging in different areas within psychology. These include the NICHD study, a cache of raw data from more than 70 functional magnetic resonance imaging (fMRI) studies housed at the fMRI Data Center at Dartmouth College, and others.

Psychologists and graduate students can use large-scale databases in several ways. Researchers can examine and reanalyze raw data to confirm and replicate original research. Instructors can use the data in their classrooms to teach students statistical and analytic techniques on realistic data sets. And, most importantly, psychologists can use the data to conduct novel research.

Andrea Mechelli, PhD, of University College London, recently published a study in the Journal of Cognitive Neuroscience (Vol. 15, No. 7) based on data he culled from the fMRI Data Center. The data were originally collected at the National Institutes of Health by Alumit Ishai, PhD, and her colleagues. They had used fMRI to examine how particular areas of the brain, in the occipital and temporal cortex, respond to pictures of different categories of objects, like faces and chairs. They found that the areas responded differently to the various categories.

Mechelli reanalyzed the same data using a new technique called dynamic causal modeling, which examines how different areas of the brain interact and affect each other.

His results suggest that the category effects found by the first researchers are mediated by the early visual cortex, which processes visual differences between the objects, and not by later cognitive processing in the parietal region of the brain. In other words, the category effects are caused by the pictures' physical differences, such as straight lines instead of curvy lines, rather than by any differences in the meaning of the objects pictured.

Mechelli says that collecting the data for his study on his own would have wasted both time and money.

"It's a shame when data are acquired and only used once, when they could have been used many times," he says. "Acquiring data is so expensive."

Despite the recent successes of databases like the NICHD Study of Early Child Care and the fMRI Data Center, psychologists are late to the data-sharing game.

"Sociologists and economists have been making their careers for years analyzing other people's data," explains Jacquelyn James, PhD, a psychologist who is the associate director of the Murray Research Center at the Radcliffe Institute for Advanced Study. The center's archives, founded in 1976, include data from longitudinal studies and surveys, particularly of women, conducted by psychologists, sociologists and other social scientists.

Biologists and other natural scientists have also discovered the uses of large-scale data sharing, she adds, particularly with enormous undertakings like the Human Genome Project.

However, data sharing in psychology presents some intrinsic challenges, particularly in studies that use human subjects. Psychologists need to be able to ensure the privacy of participants who may be identifiable, sometimes through audiotapes or videotapes. Further complicating matters, the informed consent forms that all research participants sign sometimes specify that information collected during the study will not be released to anyone other than the immediate researchers.

Also, psychologists sometimes design and use idiosyncratic measures and constructs in their studies, which makes comparing data across studies more complicated. Some critics have argued that data from different studies cannot be compared, because they were collected at different times and under different circumstances.

But many psychologists say that these issues, while important, are not insurmountable. Jack McArdle, PhD, a professor at the University of Virginia, has been collecting data sets related to intelligence testing for 20 years. He calls research based on the accumulation of multiple sets of secondary raw data "mega-analysis," and has developed new mega-analytic statistical techniques. The most important thing for researchers to remember, he says, is to learn as much as possible about the original study or studies and how they were conducted: the details behind the data. The actual statistical techniques he uses are not particularly complicated, he says: "A graduate student could learn them in one weeklong class."

And he encourages graduate students to do so. "I really believe," he says, "that the aggregation of a lot of studies can tell more about the truth than one single study."

But until recently, psychology's culture has not been as conducive to data sharing as other fields' cultures, according to Johnson. "Part of that culture has been that it's important for people to make a name for themselves, and for that they need to collect their own data," he says. "But I think little by little that's changing."

James, who has worked at the Murray Research Center for 15 years, agrees. "In the beginning," she says, "people thought we were crazy. They all said 'People are going to want to collect their own data.'" Now, she says, about 300 researchers per year come to look at the center's data.

Psychology associations, academic institutions and even the federal government have begun to actively encourage data sharing. APA's Science Directorate sponsors an Advanced Training Institute (the most recent was held in May) to teach researchers how best to use the NICHD Study of Early Child Care data. The Journal of Cognitive Neuroscience now requires all researchers to submit any fMRI data collected for their studies to the fMRI Data Center as a condition of publication. And, as of October 2003, any researcher who applies for NIH funding must include a data-sharing plan in the grant proposal.

Despite the current limitations, proponents of large-scale databases and data sharing say that their time has come.

"I think," says John Van Horn, operations director of the fMRI Data Center, "that [data sharing] is a sign of maturity in the field."