Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls.
Convolutional Neural Network (CNN)
Deep Learning
Diagnosis
Generalizability
Machine Learning
Medical Image Analysis
Model Evaluation
Prognosis
Random Forest
Journal
Radiology. Artificial intelligence
ISSN: 2638-6100
Abbreviated title: Radiol Artif Intell
Country: United States
NLM ID: 101746556
Publication information
Publication date: Jan 2023
History:
received: 2022-02-13
revised: 2022-10-10
accepted: 2022-10-24
entrez: 2023-02-01
pubmed: 2023-02-02
medline: 2023-02-02
Status:
epublish
Abstract
To investigate the impact of three methodological pitfalls on model generalizability.
The authors used retrospective CT, histopathologic analysis, and radiography datasets to develop machine learning models with and without the three methodological pitfalls, quantitatively illustrating their effect on model performance and generalizability. F1 score was used to measure performance, and differences in performance between models developed with and without errors were assessed using the Wilcoxon rank sum test when applicable.
Violation of the independence assumption by applying oversampling, feature selection, and data augmentation before splitting data into training, validation, and test sets seemingly improved model F1 scores by 71.2% for predicting local recurrence and 5.0% for predicting 3-year overall survival in head and neck cancer, and by 46.0% for distinguishing histopathologic patterns in lung cancer. Randomly distributing data points for a patient across datasets superficially improved the F1 score by 21.8%. High model performance metrics did not indicate high-quality lung segmentation. In the presence of a batch effect, a model built for pneumonia detection had an F1 score of 98.7% but correctly classified only 3.86% of samples from a new dataset of healthy patients.
Machine learning models developed with these methodological pitfalls, which are undetectable during internal evaluation, produce inaccurate predictions; thus, understanding and avoiding these pitfalls is necessary for developing generalizable models.
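The independence violations described in the abstract (oversampling before splitting, and scattering one patient's data points across datasets) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_by_patient`, `oversample_minority`, and `f1_score` are hypothetical, random duplication stands in for whatever oversampling method the study used, and only the Python standard library is assumed. The point is the order of operations: split by patient first, then oversample the training indices only.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no patient appears in both sets."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    # Hold out whole patients, never individual samples.
    n_test = max(1, round(test_fraction * len(patients)))
    test_idx = [i for p in patients[:n_test] for i in by_patient[p]]
    train_idx = [i for p in patients[n_test:] for i in by_patient[p]]
    return sorted(train_idx), sorted(test_idx)


def oversample_minority(indices, labels, seed=0):
    """Randomly duplicate samples until every class present is equally frequent.

    Call this on training indices only, after the split, so that copies of a
    sample can never land in the validation or test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up for illustration): several samples per patient.
patient_ids = ["a", "a", "b", "b", "b", "c", "d", "d"]
labels = [1, 1, 0, 0, 0, 0, 1, 0]

train_idx, test_idx = split_by_patient(patient_ids)
balanced_train_idx = oversample_minority(train_idx, labels)
```

A model would then be fitted on `balanced_train_idx` and scored with `f1_score` on the untouched `test_idx`. Reversing the two steps (oversampling first, splitting second) is the kind of leakage the abstract reports as inflating F1 scores.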
Identifiers
pubmed: 36721408
doi: 10.1148/ryai.220028
pmc: PMC9885377
Publication types
Journal Article
Languages
eng
Pagination
e220028
Copyright information
© 2022 by the Radiological Society of North America, Inc.
Conflict of interest statement
Disclosures of conflicts of interest: F.M. No relevant relationships. K.O. No relevant relationships. R.G. No relevant relationships. C.R. Grants or contracts from Imagia. A.S. No relevant relationships. R.F. Research grant from McGill University Health Centre Foundation, TD Bank, GE Healthcare, and Intel for Artificial Intelligence for Urgent Care: Providing Better Care to Remote Communities, payments made to institution (Research Institute of the McGill University Health Centre); Canadian Cancer Society/Canadian Institutes of Health Research/Brain Canada Foundation Spark Grant: Novel Technology Applications in Cancer Prevention and Early Detection (SPARK-21), research grant, payments made to institution (McGill); Fonds de recherche en santé du Québec and Fondation de l’Association des radiologistes du Québec grant providing research salary support and time (payments to author) and general research operating grant (payments to institution), not directly funding this work.