Systematic misestimation of machine learning performance in neuroimaging studies of depression.


Journal

Neuropsychopharmacology : official publication of the American College of Neuropsychopharmacology
ISSN: 1740-634X
Abbreviated title: Neuropsychopharmacology
Country: England
NLM ID: 8904907

Publication information

Publication date:
07 2021
History:
received: 18 01 2021
accepted: 09 04 2021
revised: 01 04 2021
pubmed: 8 5 2021
medline: 29 6 2021
entrez: 7 5 2021
Status: ppublish

Abstract

We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) and healthy controls based on neuroimaging data. Drawing upon structural MRI data from a balanced sample of N = 1868 MDD patients and healthy controls from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observe accuracies of up to 95%. For medium sample sizes (N = 100) accuracies up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
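The link between test-set size and misestimation can be illustrated with a small binomial calculation. This is a minimal sketch, not the authors' procedure (the study repeatedly subsampled real structural MRI data and retrained a classifier). It assumes a fixed classifier whose true accuracy equals the 61% reported for the full dataset and assumes independent test cases; the function names `prob_accuracy_at_least` and `accuracy_std` are hypothetical.

```python
from math import comb, ceil

# Assumption: a fixed classifier with true accuracy 0.61 (the full-dataset
# figure from the abstract) evaluated on independent test cases.
TRUE_ACC = 0.61


def prob_accuracy_at_least(n_test, threshold, true_acc=TRUE_ACC):
    """Exact binomial probability that the observed accuracy on a test set
    of n_test cases is at least `threshold`."""
    # Smallest number of correct predictions that reaches the threshold;
    # the small epsilon guards against floating-point round-up.
    k_min = ceil(threshold * n_test - 1e-9)
    return sum(
        comb(n_test, k) * true_acc**k * (1 - true_acc) ** (n_test - k)
        for k in range(k_min, n_test + 1)
    )


def accuracy_std(n_test, true_acc=TRUE_ACC):
    """Standard deviation of the observed accuracy under the binomial model."""
    return (true_acc * (1 - true_acc) / n_test) ** 0.5


# How likely is an observed accuracy of 75% or more purely by sampling luck?
for n in (20, 100, 500):
    print(n, round(accuracy_std(n), 3), prob_accuracy_at_least(n, 0.75))
```

Under these assumptions the spread of the observed accuracy shrinks with the square root of the test-set size, so an inflated estimate becomes progressively less likely as the test set grows. Across many studies, the highest reported accuracy would sit further out in the tail than any single draw.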

Identifiers

pubmed: 33958703
doi: 10.1038/s41386-021-01020-7
pii: 10.1038/s41386-021-01020-7
pmc: PMC8209109

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1510-1517

Authors

Claas Flint (C)

Department of Psychiatry, University of Münster, Münster, Germany.
Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany.

Micah Cearns (M)

Discipline of Psychiatry, School of Medicine, University of Adelaide, Adelaide, SA, Australia.
Department of Psychiatry, Melbourne Medical School, The University of Melbourne, Parkville, VIC, Australia.

Nils Opel (N)

Department of Psychiatry, University of Münster, Münster, Germany.

Ronny Redlich (R)

Department of Psychiatry, University of Münster, Münster, Germany.

David M A Mehler (DMA)

Department of Psychiatry, University of Münster, Münster, Germany.

Daniel Emden (D)

Department of Psychiatry, University of Münster, Münster, Germany.

Nils R Winter (NR)

Department of Psychiatry, University of Münster, Münster, Germany.

Ramona Leenings (R)

Department of Psychiatry, University of Münster, Münster, Germany.

Simon B Eickhoff (SB)

Institute of Neuroscience and Medicine (INM-7) Research Center Jülich, Jülich, Germany.
Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

Tilo Kircher (T)

Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany.

Axel Krug (A)

Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany.

Igor Nenadic (I)

Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany.

Volker Arolt (V)

Department of Psychiatry, University of Münster, Münster, Germany.

Scott Clark (S)

Discipline of Psychiatry, School of Medicine, University of Adelaide, Adelaide, SA, Australia.

Bernhard T Baune (BT)

Department of Psychiatry, University of Münster, Münster, Germany.
Department of Psychiatry, Melbourne Medical School, The University of Melbourne, Parkville, VIC, Australia.
The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Parkville, VIC, Australia.

Xiaoyi Jiang (X)

Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany.

Udo Dannlowski (U)

Department of Psychiatry, University of Münster, Münster, Germany. dannlow@uni-muenster.de.

Tim Hahn (T)

Department of Psychiatry, University of Münster, Münster, Germany.
