A group of academics from Oxford, Stanford, Virginia and Bristol universities have looked at a range of subfields of neuroscience and concluded that most of the results are statistically worthless.
The researchers found that most structural and volumetric MRI studies are very small and have minimal power to detect differences between compared groups (for example, healthy people versus those with mental health disorders). Their paper also stated that a clear excess of "significance bias" (too many results deemed statistically significant) has been demonstrated in studies of brain volume abnormalities, and that similar problems appear to exist in fMRI studies of the blood-oxygen-level-dependent response.
The team, researchers at Stanford Medical School, Virginia, Bristol and the Human Genetics department at Oxford, looked at 246 neuroscience articles published in 2011 and excluded papers where the test data was unavailable. They found that the papers' median statistical power - the probability that a study will identify an effect when there is an effect there to be found - was just 21 per cent. In practice, that means that if you were to run one of the experiments five times, you'd detect the effect only about once on average.
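The five-runs arithmetic can be checked with a short sketch. It assumes each repeat of the experiment is independent and has the same 21 per cent chance of detecting a real effect; the function name and the choice of five repeats are illustrative and do not come from the paper.

```python
from math import comb

def detection_distribution(power: float, runs: int) -> list[float]:
    """Probability of detecting a real effect exactly k times (k = 0..runs)
    across `runs` independent repeats, each succeeding with probability `power`."""
    return [
        comb(runs, k) * power**k * (1 - power) ** (runs - k)
        for k in range(runs + 1)
    ]

MEDIAN_POWER = 0.21  # median power reported in the article
RUNS = 5             # illustrative number of repeats (assumption)

dist = detection_distribution(MEDIAN_POWER, RUNS)
expected_hits = MEDIAN_POWER * RUNS  # mean of a binomial distribution
p_no_hits = dist[0]                  # chance that all five repeats miss

print(f"Expected detections in {RUNS} runs: {expected_hits:.2f}")
print(f"Chance of no detection at all: {p_no_hits:.1%}")
```

Under those assumptions the expected number of detections is 1.05, and there is roughly a 31 per cent chance that all five repeats miss an effect that is really there.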
A further survey of papers based on fMRI brain scans - and studies using such scanners have long filled the popular media with dramatic claims - found that their statistical power was just 8 per cent.
Low statistical power causes three problems, the authors said. Firstly, there is a low probability of finding true effects; secondly, there is a low probability that a statistically significant finding reflects a true effect; and thirdly, the magnitude of an effect is exaggerated when a positive result is found.
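The second problem can be made concrete with the standard formula for positive predictive value (PPV): the share of significant findings that reflect a real effect. The sketch below is a rough illustration, not the authors' own code. The pre-study odds of 0.25 (one real effect for every four null ones among the hypotheses tested) are an assumed figure chosen for the example, as is the 80 per cent power used for comparison.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Share of statistically significant findings that reflect a true effect.

    power      -- probability of detecting an effect that is really there
    alpha      -- false positive rate (the significance threshold)
    prior_odds -- ratio of true effects to null effects among hypotheses tested
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

ALPHA = 0.05       # conventional significance threshold
PRIOR_ODDS = 0.25  # assumed for illustration only

ppv_low = positive_predictive_value(0.21, ALPHA, PRIOR_ODDS)   # median power from the article
ppv_high = positive_predictive_value(0.80, ALPHA, PRIOR_ODDS)  # comparison value (assumption)

print(f"PPV at 21% power: {ppv_low:.0%}")
print(f"PPV at 80% power: {ppv_high:.0%}")
```

With these assumed inputs, about half of the significant findings at 21 per cent power would be real, against four in five at 80 per cent power. The exact figures depend heavily on the assumed odds, which are not known for any real field.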
There were further problems that led them to believe the power is even lower than they suggest. They noted:
[T]he summary effect size estimates that we used to determine the statistical power of individual studies are themselves likely to be inflated owing to bias — our excess of significance test provided clear evidence for this. Therefore, the average statistical power of studies in our analysis may in fact be even lower than the 8–31% range we observed.
Publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly "clean" results is more likely to be published. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect.
The paper is titled Power failure: Why small sample size undermines the reliability of neuroscience and is published in the May 2013 edition of the journal Nature Reviews Neuroscience. The conclusions have wide implications for the field.
Button et al note that advances in computer processing have made crunching large data sets faster and easier, but the statistical rigour hasn't kept pace. They call for research to be fundamentally redesigned to maintain the credibility of neuroscience.
These dramatic advances in the flexibility of research design and analysis have occurred without accompanying changes to other aspects of research design, particularly power. For example, the average sample size has not changed substantially over time despite the fact that neuroscientists are likely to be pursuing smaller effects.
The increase in research flexibility and the complexity of study designs combined with the stability of sample size and search for increasingly subtle effects has a disquieting consequence: a dramatic increase in the likelihood that statistically significant findings are spurious. This may be at the root of the recent replication failures in the preclinical literature and the correspondingly poor translation of these findings into humans.
Kate Button, one of the authors behind the paper, has a nice article at the Guardian explaining the issues.
"The current reliance on small, low-powered studies is wasteful and inefficient, and it undermines the ability of neuroscience to gain genuine insight into brain function and behaviour. It takes longer for studies to converge on the true effect, and litters the research literature with bogus or misleading results," writes Button.
Demand for brain science has increased from policy wonks and other pseuds looking for a "neuroscientific explanation" to settle their turf war; from journalists, eager to fill pages with brightly coloured pictures and grabby headlines; and from academics chasing after publications.
And it isn't just the brain boffins who are cutting corners and making improbable exaggerations. Neuroscience evangelist journalist Jonah Lehrer resigned from The New Yorker magazine and later parted ways with WiReD last year after he admitted making up quotes that he had attributed to Bob Dylan. ®