Preprint (Review), Version 2. Preserved in Portico. This version is not peer-reviewed.

Why and How We Should Join the Shift From Significance Testing to Estimation

Version 1 : Received: 11 December 2021 / Approved: 14 December 2021 / Online: 14 December 2021 (12:47:07 CET)
Version 2 : Received: 22 March 2022 / Approved: 23 March 2022 / Online: 23 March 2022 (09:30:28 CET)

How to cite: Berner, D.; Amrhein, V. Why and How We Should Join the Shift From Significance Testing to Estimation. Preprints 2021, 2021120235 (doi: 10.20944/preprints202112.0235.v2).

Abstract

A paradigm shift away from null hypothesis significance testing seems to be in progress. Based on simulations, we illustrate some of the underlying motivations. First, P-values vary strongly from study to study, hence dichotomous inference using significance thresholds is usually unjustified. Second, statistically significant results have overestimated effect sizes, a bias that declines with increasing statistical power. Third, statistically non-significant results have underestimated effect sizes, and this bias gets stronger with higher statistical power. Fourth, the tested statistical hypotheses usually lack biological justification and are often uninformative. Despite these problems, a screen of 48 papers from the 2020 volume of the Journal of Evolutionary Biology shows that significance testing is still used almost universally in evolutionary biology. All screened studies tested default null hypotheses of zero effect with the default significance threshold of p = 0.05; none presented a pre-specified alternative hypothesis, a pre-study power calculation, or the probability of ‘false negatives’ (beta error). The results sections of the papers presented 49 significance tests on average (median 23, range 0–390). Of 41 studies that contained verbal descriptions of a ‘statistically non-significant’ result, 26 (63%) falsely claimed the absence of an effect. We conclude that studies in ecology and evolutionary biology are mostly exploratory and descriptive. We should thus shift from claiming to ‘test’ specific hypotheses statistically to describing and discussing the many hypotheses (possible true effect sizes) that are most compatible with our data, given our statistical model. We already have the means for doing so, because we routinely present compatibility (‘confidence’) intervals covering these hypotheses.
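
The first three points can be reproduced with a small simulation. The following Python sketch is only an illustration and is not the authors' simulation code: the function name `simulate`, the true effect of 0.5 SD, the group sizes of 20 and 80, and the use of a z-test with known SD = 1 are all assumptions chosen to keep the example short and dependent on the standard library only.

```python
import random
from statistics import NormalDist, fmean, quantiles


def simulate(true_effect, n_per_group, n_studies=2000, alpha=0.05, seed=1):
    """Replicate a two-group study many times and summarise the outcomes.

    Hypothetical setup: both groups have known SD = 1, so a two-sided
    z-test is used instead of the t-test a real analysis would need.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    p_values, significant, non_significant = [], [], []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        estimate = fmean(treated) - fmean(control)
        p = 2 * (1 - std_normal.cdf(abs(estimate) / se))
        p_values.append(p)
        # Sort each estimate by whether its study crossed the threshold.
        (significant if p < alpha else non_significant).append(estimate)
    deciles = quantiles(p_values, n=10)  # nine cut points: 10th to 90th percentile
    return {
        "power": len(significant) / n_studies,  # empirical share of p < alpha
        "p_10th": deciles[0],
        "p_90th": deciles[-1],
        "mean_sig": fmean(significant),
        "mean_nonsig": fmean(non_significant),
    }


TRUE_EFFECT = 0.5  # assumed true difference in means, in SD units
low_power = simulate(TRUE_EFFECT, n_per_group=20)
high_power = simulate(TRUE_EFFECT, n_per_group=80)

for label, res in (("n = 20", low_power), ("n = 80", high_power)):
    print(label, {key: round(value, 3) for key, value in res.items()})
```

Under these assumptions, the spread between the 10th and 90th percentile of the P-values shows how much they vary across replicate studies of the same true effect. The significant estimates should average above 0.5, more so at n = 20, while the non-significant estimates should average below 0.5, more so at n = 80.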

Keywords

Compatibility interval; effect size; null hypothesis; p-value; statistical inference

Subject

BIOLOGY, Other

Comments (1)

Comment 1
Received: 23 March 2022
Commenter: Valentin Amrhein
Commenter's Conflict of Interests: Author
Comment: This is a revision for the Journal of Evolutionary Biology.