Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports.
BERT
COVID-19
Classification
Natural language processing (NLP)
Radiology
Transformer
Journal
Journal of digital imaging
ISSN: 1618-727X
Abbreviated title: J Digit Imaging
Country: United States
NLM ID: 9100529
Publication information
Publication date:
February 2023
History:
received: 2022-09-05
accepted: 2022-10-03
revised: 2022-09-05
pubmed: 2022-11-04
medline: 2023-03-08
entrez: 2022-11-03
Status:
ppublish
Abstract
Building a document-level COVID-19 classifier for radiology reports could assist providers in their daily clinical routine and generate large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a continuation of BERT's pre-training on radiology reports that can be used for any radiology-related task. RadBERT outperforms all the biomedical pre-trainings we evaluated on this COVID-19 task (P<0.01) and helps our fine-tuned model achieve a macro-averaged F1-score of 88.9 when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset that was re-sampled and enriched with concurrent lung diseases, which helps the model resist distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the model's performance and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied in the same way to other radiology report classification tasks.
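The macro-averaged F1-score cited in the abstract is the unweighted mean of the per-class F1 scores, so every class counts equally no matter how many reports it contains. The following is a minimal Python sketch of that metric. The function name `macro_f1` and the three-way label set (`positive`, `negative`, `uncertain`) are hypothetical illustrations and are not the study's actual label scheme or evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted class p, but the report was not p
            fn[t] += 1  # missed a report whose true class is t
    f1s = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical report-level labels, for illustration only.
y_true = ["positive", "positive", "negative", "negative", "uncertain"]
y_pred = ["positive", "negative", "negative", "negative", "uncertain"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.8222
```

Here the per-class F1 scores are 2/3, 4/5, and 1, whose mean is 37/45. For the same inputs, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should return the same value. Multiplying by 100 gives the percentage scale on which the 88.9 figure is reported.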
Identifiers
pubmed: 36323915
doi: 10.1007/s10278-022-00714-8
pii: 10.1007/s10278-022-00714-8
pmc: PMC9629758
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
164-177
Grants
Agency: NIBIB NIH HHS
ID: 75N92020C00008
Country: United States
Agency: NIBIB NIH HHS
ID: 75N92020C00021
Country: United States
Copyright information
© 2022. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.