Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls.

Convolutional Neural Network (CNN); Deep Learning; Diagnosis; Generalizability; Machine Learning; Medical Image Analysis; Model Evaluation; Prognosis; Random Forest

Journal

Radiology. Artificial intelligence

ISSN: 2638-6100

Abbreviated title: Radiol Artif Intell

Country: United States

NLM ID: 101746556

Publication information

Publication date:
Jan 2023

History:

received: 13 02 2022

revised: 10 10 2022

accepted: 24 10 2022

entrez: 1 2 2023

pubmed: 2 2 2023

medline: 2 2 2023

Status: epublish

Abstract

To investigate the impact of the following three methodological pitfalls on model generalizability: (a) violation of the independence assumption, (b) model evaluation with an inappropriate performance indicator or baseline for comparison, and (c) batch effect.

The authors used retrospective CT, histopathologic analysis, and radiography datasets to develop machine learning models with and without the three methodological pitfalls to quantitatively illustrate their effect on model performance and generalizability. F1 score was used to measure performance, and differences in performance between models developed with and without errors were assessed using the Wilcoxon rank sum test when applicable.

Violation of the independence assumption by applying oversampling, feature selection, and data augmentation before splitting data into training, validation, and test sets seemingly improved model F1 scores by 71.2% for predicting local recurrence and 5.0% for predicting 3-year overall survival in head and neck cancer and by 46.0% for distinguishing histopathologic patterns in lung cancer. Randomly distributing data points for a patient across datasets superficially improved the F1 score by 21.8%. High model performance metrics did not indicate high-quality lung segmentation. In the presence of a batch effect, a model built for pneumonia detection had an F1 score of 98.7% but correctly classified only 3.86% of samples from a new dataset of healthy patients.

Machine learning models developed with these methodological pitfalls, which are undetectable during internal evaluation, produce inaccurate predictions; thus, understanding and avoiding these pitfalls is necessary for developing generalizable models.
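The independence pitfall described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the record layout (`patient_id`, `label`), the function names, and the toy data are all assumptions made for illustration. The sketch splits by patient so that no patient contributes samples to both sets, applies random oversampling to the training set only after the split, and computes an F1 score from scratch.

```python
import random
from collections import Counter


def split_by_patient(records, test_fraction=0.25, seed=0):
    """Split so that all samples of a given patient land in exactly one set."""
    patients = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r["patient_id"] not in test_patients]
    test = [r for r in records if r["patient_id"] in test_patients]
    return train, test


def oversample_minority(train, seed=0):
    """Random oversampling, applied to the training set only (after the split)."""
    rng = random.Random(seed)
    by_label = {}
    for r in train:
        by_label.setdefault(r["label"], []).append(r)
    target = max(len(rows) for rows in by_label.values())
    balanced = []
    for rows in by_label.values():
        balanced.extend(rows)
        # Duplicate randomly chosen rows until this class reaches the target size.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 8 patients, 3 samples each; patients 0 and 1 are positive.
records = [
    {"patient_id": pid, "sample": k, "label": 1 if pid < 2 else 0}
    for pid in range(8)
    for k in range(3)
]

train, test = split_by_patient(records)
balanced_train = oversample_minority(train)

train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
class_counts = Counter(r["label"] for r in balanced_train)
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 0, 1])
```

Reversing the order (oversampling first, then splitting) would place copies of the same sample in both training and test sets, which is the kind of leakage the abstract reports as inflating F1 scores. In practice a library utility such as scikit-learn's `GroupShuffleSplit` can replace the hand-written patient-level split.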

Identifiers

DOI: 10.1148/ryai.220028 PMID: 36721408 PMC: PMC9885377

Publication types

Journal Article

Languages

eng

Pagination

e220028

Copyright information

Conflict of interest statement

Disclosures of conflicts of interest: F.M. No relevant relationships. K.O. No relevant relationships. R.G. No relevant relationships. C.R. Grants or contracts from Imagia. A.S. No relevant relationships. R.F. Research grant from McGill University Health Centre Foundation, TD Bank, GE Healthcare, and Intel for Artificial Intelligence for Urgent Care: Providing Better Care to Remote Communities, payments made to institution (Research Institute of the McGill University Health Centre); Canadian Cancer Society/Canadian Institutes of Health Research/Brain Canada Foundation Spark Grant: Novel Technology Applications in Cancer Prevention and Early Detection (SPARK-21), research grant, payments made to institution (McGill); Fonds de recherche en santé du Québec and Fondation de l’Association des radiologistes du Québec grant providing research salary support and time (payments to author) and general research operating grant (payments to institution), not directly funding this work.

Authors

Farhad Maleki (F)

Katie Ovens (K)

Rajiv Gupta (R)

Caroline Reinhold (C)

Alan Spatz (A)

Reza Forghani (R)

MeSH classifications