Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.
MeSH terms
Aged
Cohort Studies
Electronic Health Records
Female
Heart Failure / diagnostic imaging
Humans
Image Processing, Computer-Assisted / methods
Machine Learning
Male
Natural Language Processing
Neural Networks, Computer
Prognosis
Radiography, Abdominal / methods
Radiography, Thoracic / methods
Survival Rate
Tomography, X-Ray Computed / methods
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2020
History:
received: 2020-03-06
accepted: 2020-07-14
entrez: 2020-07-31
pubmed: 2020-07-31
medline: 2020-09-25
Status: epublish
Abstract
BACKGROUND
Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is stored as unstructured free text in radiology reports, and collecting and curating these data manually is arduous and time-consuming.
PURPOSE
We used a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data for predicting all-cause mortality (ACM) in HF patients.
MATERIALS AND METHODS
This observational cohort study used 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients, obtained between 2008 and 2018. A total of 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. A convolutional neural network (CNN) was then trained, validated and tested to determine the presence or absence of these findings. Finally, the prognostic value of the CNN-extracted features for ACM was evaluated using Cox regression analysis.
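The record does not describe the network's architecture. Purely as an illustration, the sketch below shows the forward pass of a generic word-level CNN text classifier of the kind commonly applied to report classification: word embeddings, convolution filters spanning consecutive tokens, ReLU, max-over-time pooling, and a sigmoid output. Everything in it is hypothetical: the toy vocabulary, the 2-dimensional embeddings, the two width-2 filters and the hand-set weights stand in for parameters a real model would learn from the annotated reports. A multi-label model for the 14 findings would have one sigmoid output per finding; only one output is shown.

```python
import math

# Hypothetical toy embeddings (2 dimensions); a trained model would learn these
# or load pretrained vectors. Unknown tokens map to a zero vector.
EMBEDDINGS = {
    "no": [-1.0, 0.0],
    "pleural": [0.0, 1.0],
    "effusion": [0.5, 1.0],
    "small": [0.2, 0.1],
    "bilateral": [0.1, 0.3],
}
UNKNOWN = [0.0, 0.0]


def embed(tokens):
    """Map each token to its embedding vector."""
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]


def conv_max_pool(vectors, kernel, bias):
    """Slide one filter (a list of per-position weight vectors) over the
    token sequence, apply ReLU, and keep the largest activation."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is the floor
    for start in range(len(vectors) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(w * x for w, x in zip(kernel[offset], vectors[start + offset]))
        best = max(best, max(0.0, s))
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-set parameters for ONE finding ("pleural effusion").
# Filter 1 responds to "pleural effusion"; filter 2 responds to "no pleural".
FILTERS = [
    ([[0.0, 1.0], [0.5, 1.0]], -1.5),
    ([[-1.0, 0.0], [0.0, 1.0]], -1.5),
]
OUT_WEIGHTS = [4.0, -6.0]
OUT_BIAS = -1.0


def predict_finding(report):
    """Return the probability that the finding is present in the report."""
    vectors = embed(report.lower().split())
    features = [conv_max_pool(vectors, k, b) for k, b in FILTERS]
    z = OUT_BIAS + sum(w * f for w, f in zip(OUT_WEIGHTS, features))
    return sigmoid(z)


print(round(predict_finding("Small bilateral pleural effusion"), 3))  # 0.881
print(round(predict_finding("No pleural effusion"), 3))               # 0.269
```

With these hand-set weights, the negation filter pushes the second report below 0.5; in a trained model, that behaviour would have to be learned from labelled examples rather than written in by hand.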
RESULTS
A total of 11,808 CT reports were analyzed, from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% [6,217/11,808] male), of whom 3,107 died during the 10.6-year follow-up period. The CNN demonstrated excellent accuracy in retrieving the 14 radiographic findings, with area under the curve (AUC) values ranging from 0.83 to 1.00 (F1 scores 0.84 to 0.97). In the Cox model, the time-dependent AUC for predicting ACM at 30 days was 0.747 (95% confidence interval [CI] 0.704-0.790).
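The per-finding AUC and F1 values and the 30-day time-dependent AUC measure different things. A minimal pure-Python sketch of how such numbers can be computed follows, on made-up toy data rather than the study's. `roc_auc` uses the rank (Mann-Whitney) formulation and `f1_score` works on thresholded predictions. `naive_auc_at_horizon` is a simplified cumulative/dynamic time-dependent AUC: deaths by the horizon are cases, patients still under observation after it are controls, and patients censored before it are simply dropped. Published estimators usually correct for that censoring (for example with inverse-probability-of-censoring weights), so this is an approximation. The risk scores would come from something like the Cox model's linear predictor. All function names and the 0.5 threshold are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def naive_auc_at_horizon(times, events, risk_scores, horizon):
    """Simplified cumulative/dynamic AUC at `horizon` (no censoring weights)."""
    labels, scores = [], []
    for t, e, r in zip(times, events, risk_scores):
        if t <= horizon and e == 1:   # case: died by the horizon
            labels.append(1)
            scores.append(r)
        elif t > horizon:             # control: still under observation after the horizon
            labels.append(0)
            scores.append(r)
        # otherwise censored before the horizon: status unknown, dropped
    return roc_auc(labels, scores)


# Toy annotation labels and CNN probabilities for one finding.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]
print(round(roc_auc(y_true, probs), 3))    # 0.889
print(round(f1_score(y_true, y_pred), 3))  # 0.667

# Toy survival data: follow-up in days, death indicator, model risk score.
days = [10, 20, 40, 50, 15]
died = [1, 1, 0, 1, 0]
risk = [2.0, 1.5, 1.0, 1.8, 0.5]
print(naive_auc_at_horizon(days, died, risk, horizon=30))  # 0.75
```

AUC is threshold-free and depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen cut-off. This is one reason a paper can report both for the same classifier.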
CONCLUSION
An ML-based NLP approach applied to unstructured CT reports demonstrates excellent accuracy in extracting predetermined radiographic findings and provides prognostic value in HF patients.
Identifiers
pubmed: 32730362
doi: 10.1371/journal.pone.0236827
pii: PONE-D-20-06619
pmc: PMC7392233
Publication types
Journal Article
Observational Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0236827
Grants
Agency: NCATS NIH HHS
ID: UL1 TR002384
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR000457
Country: United States
Conflict of interest statement
The authors have declared that no competing interests exist. Gurpreet Singh is currently employed at GlaxoSmithKline but was not a part of GlaxoSmithKline during the conduct of this study. Gabriel Maliakal and James K. Min are currently employed at Cleerly Inc. but were not a part of Cleerly Inc. during the conduct of this study. Mohit Pandey is currently employed at Ipsos but was not a part of Ipsos during the conduct of this study. These commercial affiliations do not alter our adherence to PLOS ONE policies on sharing data and materials.