How small differences in data analysis make huge differences in results

Over the past 20 years or so, there has been growing concern that many results published in scientific journals cannot be reproduced.

Depending on the field of research, studies have found that attempts to redo published studies lead to different results between 23% and 89% of the time.

To understand how different researchers might arrive at different results, we asked hundreds of ecologists and evolutionary biologists to answer two questions by analyzing given sets of data. They arrived at a huge range of answers.

Our study has been accepted by BMC Biology as a stage 1 registered report and is currently available as a preprint ahead of stage 2 peer review.

Why is reproducibility a problem?

The causes of reproducibility problems are common across science. They include an over-reliance on simplistic measures of “statistical significance” rather than nuanced assessments, the fact that journals prefer to publish “exciting” findings, and questionable research practices that make articles more sensational at the expense of transparency and increase the rate of false positives in the literature.

Much of the research on reproducibility and the ways in which it can be improved (such as “open science” initiatives) has been slow to spread between different fields of science.

Interest in these ideas is growing among ecologists, but so far there has been little research assessing reproducibility in ecology. One reason for this is the difficulty of disentangling environmental differences from the influence of researchers' choices.

One way to approach the reproducibility of ecological research, separate from environmental effects, is to focus on what happens after the data are collected.

Birds and siblings, grass and seedlings

We were inspired by work led by Raphael Silberzahn, which asked social scientists to analyze a dataset to determine whether the skin tone of soccer players predicted the number of red cards they received. The study found a wide range of results.

We emulated this approach in ecology and evolutionary biology, using an open call for analysts to help us answer two research questions:

  • “To what extent is the growth of nestling blue tits (Cyanistes caeruleus) influenced by competition with siblings?”

  • “How does grass cover influence Eucalyptus spp. seedling recruitment?” (“Eucalyptus spp. seedling recruitment” means how many seedlings of trees from the genus Eucalyptus there are.)

Researchers disagree on whether grass cover encourages or discourages eucalyptus seedlings.

Two hundred and forty-six ecologists and evolutionary biologists responded to our call. Some worked alone and others in teams, producing 137 written descriptions of their overall answer to the research questions (along with numerical results). These answers varied substantially for both datasets.

For the effect of grass cover on the number of eucalyptus seedlings, we had 63 responses. Eighteen described a negative effect (more grass means fewer seedlings), 31 described no effect, six described a positive effect (more grass means more seedlings), and eight described a mixed effect (some analyses found positive effects and some found negative effects).

For the effect of sibling competition on blue tit growth, we had 74 responses. Sixty-four teams described a negative effect (more competition means slower growth, although only 37 of these teams found this negative effect convincing), five described no effect, and five described a mixed effect.
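As a quick arithmetic check, the counts reported above can be converted into shares of all responses for each dataset. This is a minimal sketch: the dictionary labels and the `shares` helper are names chosen here for illustration and are not part of the study.

```python
# Counts of team conclusions as reported in the text above.
eucalyptus = {"negative": 18, "no effect": 31, "positive": 6, "mixed": 8}
blue_tit = {"negative": 64, "no effect": 5, "mixed": 5}


def shares(counts):
    """Return each category's share of all responses, in percent (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


eucalyptus_shares = shares(eucalyptus)
blue_tit_shares = shares(blue_tit)

# Totals should match the 63 and 74 responses quoted in the text.
print(sum(eucalyptus.values()), eucalyptus_shares)
print(sum(blue_tit.values()), blue_tit_shares)
```

The two totals add up to the 137 written descriptions mentioned earlier. Roughly half of the eucalyptus responses described no effect, while the large majority of blue tit responses described a negative effect.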

What the results mean

Perhaps not surprisingly, we and our co-authors had different views on how these results should be interpreted.

We asked three of our contributors to comment on what impressed them the most.

Peter Vesk, who was the source of the eucalyptus data, said:

Looking at the average of all the analyses, it makes sense. Grass essentially has a negligible effect on [the number of] eucalyptus tree seedlings, compared to the distance from the nearest mother tree. But the range of estimated effects is striking. In my own experience, very small differences in the analysis workflow can contribute to large variations [in results].

Simon Griffith collected the blue tit data more than 20 years ago and it had not been analyzed before because of the complexity of decisions about the right analytical path. He said:

This study shows that there is no single answer from any data set. There is a wide range of different outcomes, and an understanding of the underlying biology must account for this diversity.

Meta-researcher Fiona Fidler, who studies research itself, said:

The purpose of these studies is not to scare people or create a crisis. It is to help build our understanding of heterogeneity and what it means for the practice of science. Through meta-research projects like this, we can develop better intuitions about uncertainty and draw better-calibrated conclusions from our research.

What should we do about it?

In our view, the results suggest three courses of action for researchers, publishers, funders and the wider scientific community.

First, we must avoid treating published research as fact. A single scientific paper is just one piece of evidence existing within a larger context of limitations and biases.

The pursuit of “new” science means that studying something that has already been studied is discouraged, and as a result we inflate the value of individual studies. We need to take a step back and look at each article in context rather than treating it as the final word on the matter.

Second, we need to run more analyses per article and report all of them. If the results depend on which analytical choices are made, it makes sense to present multiple analyses to build a more complete picture of the outcome.
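To make the idea of reporting several analyses concrete, here is a minimal sketch of fitting the same simple model under a few different analytical choices and printing every result. Everything in it is an assumption made for illustration: the data are synthetic (not the study's data), and the exclusion rules and function names are hypothetical stand-ins for the kinds of decisions analysts make.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, made-up data: a weak negative trend plus noise.
rng = random.Random(1)
grass = [rng.uniform(0, 100) for _ in range(200)]
seedlings = [5 - 0.01 * g + rng.gauss(0, 2) for g in grass]
data = list(zip(grass, seedlings))


def all_plots(rows):
    # Choice 1: use every plot.
    return rows


def drop_dense_grass(rows):
    # Choice 2 (hypothetical): exclude plots with more than 90% grass cover.
    return [(g, s) for g, s in rows if g <= 90]


def drop_extreme_counts(rows):
    # Choice 3 (hypothetical): exclude responses more than 2 SD from the mean.
    ys = [s for _, s in rows]
    mean, sd = statistics.fmean(ys), statistics.stdev(ys)
    return [(g, s) for g, s in rows if abs(s - mean) <= 2 * sd]


choices = {
    "all plots": all_plots,
    "drop dense grass": drop_dense_grass,
    "drop extreme counts": drop_extreme_counts,
}

# Fit the same model under each choice and keep every estimate.
results = {}
for name, select in choices.items():
    xs, ys = zip(*select(data))
    results[name] = ols_slope(xs, ys)

for name, slope in results.items():
    print(f"{name}: slope = {slope:+.4f}")
```

Printing the full set of estimates, rather than only the one from a preferred analysis, lets readers see how much the answer moves with each decision.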

And third, every study should include a description of how its results depend on data analysis decisions. Research publications tend to focus on discussing the ecological implications of their findings, but they should also discuss how different analysis choices affected the results and what this means for the interpretation of the findings.
