Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports.

BERT; COVID-19; Classification; Natural language processing (NLP); Radiology; Transformer

Journal

Journal of digital imaging
ISSN: 1618-727X
Abbreviated title: J Digit Imaging
Country: United States
NLM ID: 9100529

Publication information

Publication date:
02 2023
History:
received: 05 09 2022
accepted: 03 10 2022
revised: 05 09 2022
pubmed: 4 11 2022
medline: 8 3 2023
entrez: 3 11 2022
Status: ppublish

Abstract

Building a document-level classifier for COVID-19 on radiology reports could assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a model obtained by continued pre-training on radiology reports that can be used for all radiology-related tasks. RadBERT outperforms all biomedical pre-trained models on this COVID-19 task (P<0.01) and helps our fine-tuned model achieve a macro-averaged F1-score of 88.9 when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset that is re-sampled and enriched with concurrent lung diseases, which helps the model resist distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model and to support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.
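
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a macro-averaged F1-score can be computed: the F1-score is calculated separately for each class and the per-class scores are then averaged with equal weight. The function name `macro_f1` and the label names in the usage note are hypothetical and chosen only for illustration; the abstract does not specify the label set or the evaluation code the authors used.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores, on a 0-1 scale."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Called as `macro_f1(["pos", "neg", "neg"], ["pos", "neg", "pos"])` with hypothetical labels, the function returns a value between 0 and 1; the 88.9 reported in the abstract is on a 0-100 scale. Because every class counts equally, a rare class weighs as much as a frequent one, which is why this metric is often preferred on imbalanced report datasets.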

Identifiers

pubmed: 36323915
doi: 10.1007/s10278-022-00714-8
pii: 10.1007/s10278-022-00714-8
pmc: PMC9629758

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

164-177

Grants

Agency: NIBIB NIH HHS
ID: 75N92020C00008
Country: United States
Agency: NIBIB NIH HHS
ID: 75N92020C00021
Country: United States

Copyright information

© 2022. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.

Authors

Pierre Chambon (P)

Stanford University, Paris-Saclay University, École Centrale Paris, Stanford, USA. pchambon@stanford.edu.

Tessa S Cook (TS)

University of Pennsylvania, Philadelphia, USA.

Curtis P Langlotz (CP)

Stanford University, Stanford, USA.
