Findings show high replicability in social-behavioral sciences


With best practices, high replicability is achievable in social-behavioral sciences research, researchers report.

Roughly two decades ago, a community-wide reckoning emerged concerning the credibility of published literature in the social-behavioral sciences, especially psychology. Several large-scale projects attempted to reproduce previously published findings but either failed to do so or obtained much smaller effects, calling into question the credibility of those findings and of future studies in the social-behavioral sciences.

A handful of top experts in the field, however, set out to show that when best practices are employed, high replicability is possible. Over six years, researchers discovered and replicated 16 novel findings using ostensibly gold-standard best practices, including pre-registration, large sample sizes, and replication fidelity. Their findings appear in Nature Human Behaviour.

“It’s an existence proof that we can set out to discover new findings and replicate them at a very high level,” says professor Jonathan Schooler, director of the University of California, Santa Barbara’s META Lab and the Center for Mindfulness and Human Potential, and senior author of the paper. “The major finding is that when you follow current best practices in conducting and replicating online social-behavioral studies, you can accomplish high and generally stable replication rates.”

On average, the effect sizes observed in the study's replications were 97% the size of those in the original studies. By comparison, prior replication projects observed replication effect sizes that were roughly 50% the size of the originals.

“There have been a lot of concerns over the past few years about the replicability of many sciences, but psychology was among the first fields to start systematically investigating the issue,” says lead author John Protzko, a research associate in Schooler’s lab, where he was a postdoctoral scholar during the study. He is now an assistant professor of psychological science at Connecticut State University.

“The question was whether past replication failures and declining effect sizes are inherently built into the assorted scientific domains that have observed them. For example, some have speculated that it is an inherent aspect of the scientific enterprise that newly discovered findings can become less replicable or smaller over time.”

The group decided to perform new studies using emerging best practices in open science and then to replicate them with an innovative design in which the researchers committed to replicating the initial confirmation studies regardless of outcome. Over the course of six years, research teams at each lab developed studies that were then replicated by all of the other labs.

In total, the coalition discovered 16 new phenomena and replicated each of them four times, involving 120,000 participants. “If you use best practices of large samples, pre-registration, open materials in the discovery of new science, and you run replications with as best fidelity to the original process as you can, you end up with a very highly replicable science,” Protzko says of the findings.

One key innovation of the study was that all the participating labs agreed to replicate the initial confirmation studies regardless of their outcome. This removed the scientific community’s customary bias toward publishing and replicating only positive outcomes, which may have contributed to inflated initial estimates of effect sizes in the past. Furthermore, this approach enabled the researchers to observe several cases in which study designs that failed to produce significant findings in the original confirmation study later attained reliable effects when replicated at other labs.

Across the board, the project revealed extremely high replicability rates for its social-behavioral findings and no statistically significant evidence of decline over repeated replications. Given the sample sizes and effect sizes, the observed replicability rate of 86%, based on statistical significance, could not have been any higher, the researchers point out.

To test the novelty of their discoveries, they ran independent tests of people’s predictions regarding the direction of the new findings and their likelihood of replicating. Several follow-up surveys, in which naïve participants evaluated descriptions of both the new studies and those from previous replication projects, found no differences in their respective predictability. Thus, the replication success of these studies was not due to their producing obvious results that would necessarily be expected to replicate. Indeed, many of the newly discovered findings have already been independently published in high-quality journals.

“It would not be particularly interesting to discover that it is easy to replicate completely obvious findings,” Schooler says. “But our studies were comparable in their surprise factor to studies that have been difficult to replicate in the past. Untrained judges who were given summaries of the two conditions in each of our studies and a comparable set of two-condition studies from a prior replication effort found it similarly difficult to predict the direction of our findings relative to the earlier ones.”

Because each research lab developed its own studies, the studies came from a variety of social, behavioral, and psychological fields, such as marketing, political psychology, prejudice, and decision-making. All of them involved human subjects and adhered to certain constraints, such as not using deception. “We really built into the process that the individual labs would act independently,” Protzko says. “They would go about their sort of normal topics they were interested in and how they would run their studies.”

Collectively, their meta-scientific investigation provides evidence that low replicability and declining effects are not inevitable. Rigor-enhancing practices can lead to very high replication rates, but identifying exactly which practices work best will take further study. Because this study used a “kitchen sink” approach, applying multiple rigor-enhancing practices at once, it did not isolate the effect of any individual practice.

Additional investigators on the study are from McGill University; Matt Berent Consulting; the University of Wisconsin-Madison; Stanford University; the University of Virginia; the University of Gothenburg; Georgetown University; Washington University in St. Louis; the University of South Carolina; and Phenoscience Laboratories in Berlin, Germany.

Source: UC Santa Barbara