Nature published an interesting news feature on the replication of psychological studies, or rather, the frequent failure of such attempts. The article describes the pressure on scientists to produce positive results; the difficulty of publishing replications at all, and failed ones in particular; and the ease with which positive results can arise through pure chance, let alone fraud, when sample and effect sizes are small and experimental conditions are notoriously difficult to control.
The article concentrates on psychology, a discipline that has taken a number of blows recently from high-profile fraud investigations such as the case of Diederik Stapel (see also another Nature feature) or the story of Marc Hauser, but the problem that both scientists and journals have strong incentives to produce or select interesting results is probably universal. A recent meta-analysis of survey data on fraud concluded that
A pooled weighted average of 1.97% (N = 7, 95%CI: 0.86–4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once –a serious form of misconduct by any standard– and up to 33.7% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices. Meta-regression showed that self reports surveys, surveys using the words “falsification” or “fabrication”, and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others.
And even where no fraud is involved, selection bias alone may be strong enough to create a substantial number of wrong results, as John Ioannidis argues in his article “Why Most Published Research Findings Are False”:
a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.
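The core of the argument can be made with a few lines of arithmetic: if only a small fraction of tested hypotheses are actually true and studies are underpowered, then most statistically significant results are false positives. A minimal sketch, with purely illustrative numbers (not estimates for any particular field):

```python
# Positive predictive value of a "significant" finding: the probability
# that a result crossing the alpha threshold reflects a true effect.
# Follows the pre-study-odds logic of Ioannidis (2005); the specific
# values of prior, power and alpha below are illustrative assumptions.

def ppv(prior, power, alpha):
    """P(effect is real | test was significant)."""
    true_positives = power * prior            # real effects that were detected
    false_positives = alpha * (1 - prior)     # null effects that slipped through
    return true_positives / (true_positives + false_positives)

# Say 10% of tested hypotheses are true, studies have only 20% power
# (small samples, small effects), and alpha is the usual 0.05:
print(round(ppv(prior=0.10, power=0.20, alpha=0.05), 2))  # -> 0.31
```

Under these assumptions, fewer than a third of the published positive results would be true, matching the quote's claim that a research claim is then more likely false than true.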
In fact, a note from the Bayer research labs confirms what seems to be common knowledge in industrial research departments: a large proportion, maybe more than 50%, of all published scientific studies in medicine are not reproducible. I wonder what the percentage in ecology would be – my feeling is that outright fraud may be less common than in other fields, but ecological experiments certainly tend to suffer from small sample and effect sizes, not to speak of the pervasive practice of multiple testing, which I think is very likely to inflate type I error rates in the field.
The solution seems straightforward: remove all incentives that create publication bias – easier said than done, though, in a system where careers and funding are, now more than ever, based on merit measured by publication impact, for individual researchers as well as institutions and publishers. Maybe we should ask ourselves whether these incentives, while obviously successful in boosting research output over recent years, are suited not only to producing more results, but also the right ones.