Inappropriate use of statistical power.
Journal
Bone marrow transplantation
ISSN: 1476-5365
Abbreviated title: Bone Marrow Transplant
Country: England
NLM ID: 8702459
Publication information
Publication date: May 2023
History:
Received: 15 June 2021
Revised: 25 April 2022
Accepted: 13 June 2022
MEDLINE: 8 May 2023
PubMed: 4 March 2023
Entrez: 3 March 2023
Status: ppublish
Abstract
We are pleased to add this typescript, "Inappropriate use of statistical power" by Raphael Fraser, to the BONE MARROW TRANSPLANTATION Statistics Series. The author discusses how we sometimes misuse statistical analyses after a study is completed and analyzed to explain the results. The most egregious example is post hoc power calculations.
When the conclusion of an observational study or clinical trial is negative, namely, the observed data (or more extreme data) fail to reject the null hypothesis, people often argue for calculating the observed statistical power. This is especially true of clinical trialists who believe in a new therapy and who wished and hoped for a favorable outcome (rejecting the null hypothesis). One is reminded of the saying from Benjamin Franklin: a man convinced against his will is of the same opinion still.
As the author notes, when we face a negative conclusion of a clinical trial there are two possibilities: (1) there is no treatment effect; or (2) we made a mistake. By calculating the observed power after the study, people (incorrectly) believe that if the observed power is high there is strong support for the null hypothesis. However, the problem is usually the opposite: if the observed power is low, the claim is that the null hypothesis was not rejected because there were too few subjects. This is usually couched in terms such as "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" or the like. Observed power should not be used to interpret the results of a negative study. Put more strongly, observed power should not be calculated after a study is completed and analyzed. The power of the study to reject, or not, the null hypothesis is already incorporated in the calculation of the p value.
The author uses interesting analogies to make important points about hypothesis testing. Testing the null hypothesis is like a jury trial. The jury can find the defendant guilty or not guilty. They cannot find him innocent. It is always important to recall that failure to reject the null hypothesis does not mean the null hypothesis is true, simply that there is insufficient evidence (data) to reject it. As the author notes: in a sense, hypothesis testing is like world championship boxing, where the null hypothesis is the champion until defeated by the challenger, the alternative hypothesis, to become the new world champion.
The author includes a discussion of what a p-value is, a topic we have discussed before in this series and elsewhere [1, 2]. Finally, there is a nice discussion of confidence intervals (frequentist) and credibility limits (Bayesian). A frequentist interpretation views probability as the limit of the relative frequency of an event after many trials. In contrast, a Bayesian interpretation views probability in the context of a degree of belief in an event. This belief could be based on prior knowledge such as the results of previous trials, biological plausibility, or personal beliefs (my drug is better than your drug). The important point is the common misinterpretation of confidence intervals. For example, many researchers interpret a 95 percent confidence interval to mean there is a 95 percent chance that this interval contains the parameter value. This is wrong. It means that, if we repeated the identical study many times, 95 percent of the intervals would contain the true but unknown parameter in the population.
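A minimal simulation sketch of this repeated-sampling interpretation follows. It is our illustration rather than part of Fraser's typescript; the true mean, standard deviation, sample size, and number of replications are arbitrary assumptions, and a known-sigma z interval is used for simplicity.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sigma, n, n_studies = 10.0, 2.0, 50, 10_000  # assumed values, for illustration only

z = stats.norm.ppf(0.975)                  # two-sided 95% critical value
covered = 0
for _ in range(n_studies):
    sample = rng.normal(true_mean, sigma, n)
    half_width = z * sigma / np.sqrt(n)    # known-sigma z interval
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mean <= hi)

print(f"Empirical coverage: {covered / n_studies:.3f}")  # close to 0.95
# Any single interval either contains true_mean or it does not; the "95 percent"
# describes the long-run frequency of coverage across repeated studies.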
This will seem strange to many people because we are interested only in the study we are analyzing, not in repeating the same study design many times.
We hope readers will enjoy this well-written summary of common statistical errors, especially post hoc calculations of observed power. Going forward, we hope to ban statements such as "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" from the Journal. Reviewers have been advised. Proceed at your own risk.
Robert Peter Gale MD, PhD, DSc(hc), FACP, FRCP, FRCPI(hon), FRSM, Imperial College London; Mei-Jie Zhang PhD, Medical College of Wisconsin.
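As a companion to the editorial's point that observed power adds nothing beyond the p value, here is a minimal sketch of our own. It assumes a hypothetical two-sided z test and, as the usual post hoc power recipe does, plugs the observed effect in as if it were the true effect; it is not drawn from the typescript itself.

from scipy import stats

def observed_power(p_value, alpha=0.05):
    # Post hoc ("observed") power of a two-sided z test, treating the observed
    # z statistic as if it were the true standardized effect.
    z_obs = stats.norm.isf(p_value / 2)    # |z| implied by the two-sided p value
    z_crit = stats.norm.isf(alpha / 2)     # two-sided critical value
    return stats.norm.cdf(z_obs - z_crit) + stats.norm.cdf(-z_obs - z_crit)

for p in (0.80, 0.30, 0.10, 0.05, 0.01):
    print(f"p = {p:.2f}  ->  observed power = {observed_power(p):.2f}")
# p = 0.05 maps to an observed power of about 0.50, and larger p values always
# map to lower observed power: "low observed power" merely restates the
# non-significant p value, it does not explain it.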
Identifiers
pubmed: 36869191
doi: 10.1038/s41409-023-01935-3
pii: 10.1038/s41409-023-01935-3
Publication types
Observational Study
Editorial
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
474-477
Grants
Agency: NCI NIH HHS
ID: U24 CA076518
Country: United States
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature Limited.
References
Gale RP, Zhang MJ. What is the P-value anyway? Bone Marrow Transplant. 2016;51:1439–40.
doi: 10.1038/bmt.2016.184
pubmed: 27400067
pmcid: 5093047
Gale RP, Hochhaus A, Zhang MJ. What is the (P-) value of the P-value? Leukemia. 2016;30:1965–7.
doi: 10.1038/leu.2016.193
pubmed: 27562408
Neyman J, Pearson ES. IX. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond Ser A Contain Pap Math Phys Character. 1933;231:694–706.
Cohen J. The statistical power of abnormal-social psychological research: a review. J Abnorm Soc Psychol. 1962;65:145.
doi: 10.1037/h0045186
pubmed: 13880271
Cox DR. Some problems connected with statistical inference. Ann Math Statist. 1958;29:357–72.
doi: 10.1214/aoms/1177706618
Zumbo BD, Hubley AM. A note on misconceptions concerning prospective and retrospective power. J R Stat Soc Ser D. 1998;47:385–88.
Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat. 2001;55:19–24.
doi: 10.1198/000313001300339897
Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663–94.
doi: 10.7326/0003-4819-134-8-200104170-00012
Senn SJ. Power is indeed irrelevant in interpreting completed studies. BMJ. 2002;325:1304.
doi: 10.1136/bmj.325.7375.1304
pubmed: 12458264
pmcid: 1124761
Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50.
doi: 10.1007/s10654-016-0149-3
pubmed: 27209009
pmcid: 4877414
Gale RP, Zhang MJ. What’s the p-value anyway? Bone Marrow Transplant. 2016;51:1439.
doi: 10.1038/bmt.2016.184
pubmed: 27400067
pmcid: 5093047
Greenland S. Nonsignificance plus high power does not imply support for the null over the alternative. Ann Epidemiol. 2012;22:364–68.
doi: 10.1016/j.annepidem.2012.02.007
pubmed: 22391267
Zhang Y, Hedo R, Rivera A, Rull R, Richardson S, Tu XM. Post hoc power analysis: is it an informative and meaningful analysis? Gen Psychiatry. 2019;32:4.