Efficient automated error detection in medical data using deep-learning and label-clustering.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
09 11 2023
History:
received: 15 06 2023
accepted: 26 10 2023
medline: 13 11 2023
pubmed: 11 11 2023
entrez: 10 11 2023
Status: epublish

Abstract

Medical datasets inherently contain errors arising from subjective or inaccurate test results, or from confounding biological complexities. It is difficult for medical experts to detect these elusive errors manually, owing to a lack of contextual information, restrictive data privacy regulations, and the sheer scale of data to be reviewed. Current methods for training robust artificial intelligence (AI) models on data containing mislabeled examples generally fall into one of several categories: improving the robustness of the model architecture, the regularization techniques, or the loss function used during training, or selecting a subset of data that contains cleaner labels. This last category requires the ability to detect errors efficiently, either prior to or during training, and then either relabel them or remove them completely. More recent progress in error detection has focused on using multi-network learning to minimize the deleterious effects of errors on training. However, using many neural networks to reach a consensus on which data should be removed can be computationally intensive and inefficient. In this work, a deep-learning-based algorithm was used in conjunction with a label-clustering approach to automate error detection. For datasets with synthetic label flips added, these errors were identified with an accuracy of up to 85%, while requiring up to 93% less computing resources than a previously developed model consensus approach. The resulting trained AI models exhibited greater training stability and up to a 45% improvement in accuracy, from 69% to over 99%, compared to the consensus approach; at least a 10% improvement over using noise-robust loss functions in a binary classification problem; and a 51% improvement for multi-class classification. These results indicate that practical, automated a priori detection of errors in medical data is possible without human oversight.
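
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of clustering per-sample label scores to flag likely label errors; it is not the authors' method. It assumes that each sample already has a score in [0, 1] expressing how strongly a trained model supports its assigned label (how such scores are obtained is outside this sketch). The function names `two_means_1d` and `flag_suspect_labels`, and the example scores, are hypothetical.

```python
def two_means_1d(values, iters=50):
    """Split 1-D scores into a low and a high cluster using plain 2-means.

    Returns the midpoint between the two cluster centres, which serves
    as the decision threshold between the clusters.
    """
    lo, hi = min(values), max(values)
    for _ in range(iters):
        threshold = (lo + hi) / 2.0
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        if not low or not high:
            break
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster centres stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def flag_suspect_labels(label_scores):
    """Return indices of samples that fall in the low-score cluster.

    Assumption: a low score means the model disagrees with the assigned
    label, so the sample is a candidate for review or removal.
    """
    if max(label_scores) == min(label_scores):
        return []  # no spread in the scores, nothing to separate
    threshold = two_means_1d(label_scores)
    return [i for i, s in enumerate(label_scores) if s <= threshold]


# Hypothetical scores: most samples agree with their label, three do not.
scores = [0.95, 0.91, 0.08, 0.88, 0.12, 0.97, 0.90, 0.05]
suspects = flag_suspect_labels(scores)
print(suspects)
```

In this toy setting a single pass over the scores replaces a vote among many networks, which is the kind of saving in computation that the abstract reports; the real method may differ substantially in how scores are produced and clustered.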

Identifiers

pubmed: 37949906
doi: 10.1038/s41598-023-45946-y
pii: 10.1038/s41598-023-45946-y
pmc: PMC10638377

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

19587

Copyright information

© 2023. The Author(s).

Authors

T V Nguyen (TV)

Presagen, Adelaide, SA, 5000, Australia. tuc@presagen.com.
School of Computing and Information Technology, University of Wollongong, Wollongong, NSW, 2522, Australia. tuc@presagen.com.

S M Diakiw (SM)

Presagen, Adelaide, SA, 5000, Australia.

M D VerMilyea (MD)

Ovation Fertility, Austin, TX, 78731, USA.
Texas Fertility Center, Austin, TX, 78731, USA.

A W Dinsmore (AW)

California Fertility Partners, Los Angeles, CA, 90025, USA.

M Perugini (M)

Presagen, Adelaide, SA, 5000, Australia.
Adelaide Medical School, The University of Adelaide, Adelaide, SA, 5000, Australia.

D Perugini (D)

Presagen, Adelaide, SA, 5000, Australia.

J M M Hall (JMM)

Presagen, Adelaide, SA, 5000, Australia.
Australian Research Council Centre of Excellence for Nanoscale BioPhotonics, Adelaide, SA, 5005, Australia.
School of Physical Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia.
