Data are the building blocks of psychological science. We collect, analyze and share data to advance our understanding of human behavior. It is the same in every field of empirical science — the data are fundamental. Many fields of science — including psychology — have become prolific in generating data. Indeed, the Feb. 11 issue of the journal Science featured a special section on dealing with data. It seems that the volume and complexity of data produced by science are creating their own special set of challenges.
The National Science Foundation, the principal funding agency for basic science in the United States, recognizes the problem. Beginning this year, all proposals submitted to NSF require a data management plan. The key element of the plan places a special responsibility on the scientist:
“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data … created or gathered in the course of work under NSF grants.”
In its effort to look ahead, the National Science Board (the NSF’s policy-setting body) has created a task force on data policies. The board recognizes many challenges associated with data sharing and management, including timely access to data, sustainability of data, cost burdens of data management and the growing desire for openness of data generated with taxpayer dollars.
The National Institutes of Health has its own data-sharing policy, which has been in place since 2003. In the view of NIH, “data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health.” All NIH grant proposals that seek $500,000 or more in direct costs require a data-sharing plan.
Many fields of science nurture a culture of data sharing, and owe their success to it. Incredible advances in astronomy and in genomics are the result of data sharing. And many fields within the social and behavioral sciences — most notably economics, political science and sociology — depend on shared data resources.
Although data sharing is easy to endorse in principle, researchers are not always so willing to contribute. Writing in the Aug. 3, 2010, issue of The Chronicle of Higher Education, Felicia LeClere, a research scientist at the Inter-University Consortium for Political and Social Research at the University of Michigan, observed, “We can all agree that it is a good thing to do and intrinsic to good scientific practice. In reality, however, researchers tend to view data sharing with a mix of fear, contempt, and dread.”
LeClere contends that very few of the arguments against data sharing are legitimate. I’ve heard many of the excuses myself — the fears of misuse, the high costs associated with properly documenting and cleaning the data, and the insistence that the data are “owned” by the scientists who generate them.
Yet, one valid argument against data sharing does apply to psychology. Much of the data we generate comes from human participants, whose identities we often promise to protect. Sharing those data can easily compromise participants’ identities. It takes only a few bits of information (e.g., birth date, ZIP code and gender) to identify individuals with tremendous accuracy.
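To make the re-identification risk concrete, here is a minimal sketch of how a researcher might check a data set before sharing it. It counts how many records are unique on the combination of birth date, ZIP code and gender. The records are made up for illustration, and the function name `unique_fraction` is hypothetical, not part of any agency's requirements or any standard tool.

```python
from collections import Counter

# Made-up example records: (birth_date, zip_code, gender).
# These values are illustrative only and describe no real participants.
records = [
    ("1980-03-14", "48104", "F"),
    ("1980-03-14", "48104", "F"),
    ("1975-11-02", "48104", "M"),
    ("1990-07-21", "48109", "F"),
    ("1990-07-21", "48109", "M"),
]


def unique_fraction(rows):
    """Return the share of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


# Share of records that could be singled out by these three attributes alone.
fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on birth date, ZIP code and gender")
```

A record that is unique on these attributes can be matched against any outside list that carries the same three fields. A high fraction is therefore a signal to coarsen or withhold those fields before the data are shared.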
Even when based on such legitimate concerns, our reluctance to share data impedes progress in psychological science. It is this scientific imperative, coupled with mounting pressure from the funding agencies, to which psychology must respond. If we don’t take ownership of the challenge ourselves, then others will do it for us.
We have before us an opportunity to create new infrastructures and to grow a productive culture of data sharing that advances our science without compromising our values.