Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features.
Journal
Pancreas
ISSN: 1536-4828
Abbreviated title: Pancreas
Country: United States
NLM ID: 8608542
Publication information
Publication date:
01 Apr 2023
History:
medline: 20 Nov 2023
pubmed: 16 Sep 2023
entrez: 16 Sep 2023
Status:
ppublish
Abstract
Natural language processing (NLP) algorithms can interpret unstructured text for commonly used terms and phrases. Pancreatic pathologies are diverse and include benign and malignant entities with associated histologic features. Creating a pancreas NLP algorithm can aid in electronic health record coding as well as large database creation and curation. Text-based pancreatic anatomic and cytopathologic reports for pancreatic cancer, pancreatic ductal adenocarcinoma, neuroendocrine tumor, intraductal papillary neoplasm, tumor dysplasia, and suspicious findings were collected. This dataset was split 80/20 for model training and development. A separate set was held out for testing purposes. We trained a convolutional neural network to predict each heading. Over 14,000 reports were obtained from the Mass General Brigham Healthcare System electronic record. Of these, 1252 reports were used for algorithm development. Final accuracy and F1 scores on the test set ranged from 95% to 98% for each queried pathology. To understand the dependence of our results on training set size, we also generated learning curves. Scoring metrics improved as more reports were submitted for training; however, some queries had high index performance. Natural language processing algorithms can be used for pancreatic pathologies. Increased training volume, nonoverlapping terminology, and conserved text structure improve NLP algorithm performance.
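The abstract names an 80/20 training/development split and reports accuracy and F1 per queried pathology. As a minimal illustration of what those terms mean, and not the authors' implementation, the sketch below uses only the Python standard library. The function names, the fixed seed, and the binary present/absent labelling are assumptions made for this example.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and split it 80/20 (training / development).

    Hypothetical helper: the seed and shuffling strategy are assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for one query (1 = finding present, 0 = absent); not study data.
    reference = [1, 0, 1, 1]
    predicted = [1, 0, 0, 1]
    train, dev = split_80_20(range(10))
    print(len(train), len(dev))
    print(accuracy(reference, predicted), f1_score(reference, predicted))
```

Re-running the same scoring on models trained with progressively larger subsets of the training portion would give the points of a learning curve of the kind the abstract mentions.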
Identifiers
pubmed: 37716007
doi: 10.1097/MPA.0000000000002242
pii: 00006676-990000000-00050
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e219-e223
Copyright information
Copyright © 2023 Wolters Kluwer Health, Inc. All rights reserved.
Conflict of interest statement
The authors have no conflicts of interest to disclose.