Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.

MeSH terms: Aged; Cohort Studies; Electronic Health Records; Female; Heart Failure / diagnostic imaging; Humans; Image Processing, Computer-Assisted / methods; Machine Learning; Male; Natural Language Processing; Neural Networks, Computer; Prognosis; Radiography, Abdominal / methods; Radiography, Thoracic / methods; Survival Rate; Tomography, X-Ray Computed / methods

Journal

PloS one

ISSN: 1932-6203

Abbreviated title: PLoS One

Country: United States

NLM ID: 101285081

Publication information

Publication date:
2020

History:

received: 6 March 2020

accepted: 14 July 2020

entrez: 31 July 2020

pubmed: 31 July 2020

medline: 25 September 2020

Status: epublish

Abstract

Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured in the form of radiology reports, while the process of data collection and curation is arduous and time-consuming. We utilized a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients. This observational cohort study utilized 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. A total of 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. Thereafter, a Convolutional Neural Network (CNN) was trained, validated and tested to determine the presence or absence of these features. Further, the ability of the CNN-extracted features to predict ACM was evaluated using Cox regression analysis. A total of 11,808 CT reports were analyzed from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% (6,217/11,808) male), of whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy for retrieval of the 14 radiographic findings, with area-under-the-curve (AUC) ranging from 0.83 to 1.00 (F1 score 0.84-0.97). The Cox model showed a time-dependent AUC for predicting ACM of 0.747 (95% confidence interval [CI] 0.704-0.790) at 30 days. An ML-based NLP approach to unstructured CT reports demonstrates excellent accuracy for the extraction of predetermined radiographic findings, and provides prognostic value in HF patients.
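The abstract reports per-finding performance as AUC and F1 score. As a minimal sketch of how those two metrics are computed for a single binary finding, the Python below uses only the standard library. The labels, scores, the 0.5 decision threshold, and the function names are illustrative assumptions; they are not data or code from the study.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical annotations and model scores for one finding in 8 reports.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

# Assumed decision threshold of 0.5; the study's threshold is not stated here.
preds = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)   # 15 of 16 positive/negative pairs ranked correctly
f1 = f1_score(labels, preds)    # TP = 3, FP = 1, FN = 1

print(f"AUC = {auc:.4f}, F1 = {f1:.2f}")  # AUC = 0.9375, F1 = 0.75
```

AUC is threshold-free and depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen cut-off, which is one reason the two ranges quoted in the abstract need not coincide.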


Identifiers

DOI: 10.1371/journal.pone.0236827

PMID: 32730362

PMC: PMC7392233

PII: PONE-D-20-06619

Publication types

Journal Article; Observational Study; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e0236827

Grants

Agency: NCATS NIH HHS

ID: UL1 TR002384

Country: United States

Agency: NCATS NIH HHS

ID: UL1 TR000457

Country: United States

Conflict of interest statement

The authors have declared that no competing interests exist. Gurpreet Singh is currently employed at GlaxoSmithKline but was not a part of GlaxoSmithKline during the conduct of this study. Gabriel Maliakal and James K. Min are currently employed at Cleerly Inc. but were not a part of Cleerly Inc. during the conduct of this study. Mohit Pandey is currently employed at Ipsos but was not a part of Ipsos during the conduct of this study. These commercial affiliations do not alter our adherence to PLOS ONE policies on sharing data and materials.



Authors

Mohit Pandey (M)

Zhuoran Xu (Z)

Evan Sholle (E)

Gabriel Maliakal (G)

Gurpreet Singh (G)

Zahra Fatima (Z)

Daria Larine (D)

Benjamin C Lee (BC)

Jing Wang (J)

Alexander R van Rosendael (AR)

Lohendran Baskaran (L)

Leslee J Shaw (LJ)

James K Min (JK)

Subhi J Al'Aref (SJ)
