Scalable incident detection via natural language processing and probabilistic language models.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
08 10 2024
History:
received: 21 12 2023
accepted: 10 09 2024
medline: 9 10 2024
pubmed: 9 10 2024
entrez: 8 10 2024
Status: epublish

Abstract

Postmarketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid in scalable phenotyping across healthcare records in multiple clinical domains. In this study, we developed and validated a novel incident phenotyping approach using unstructured clinical textual data, agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, we validated this approach on two separate phenotypes that share common challenges with respect to accurate ascertainment: (1) suicide attempt; (2) sleep-related behaviors. With samples of 89,428 records and 35,863 records for suicide attempt and sleep-related behaviors, respectively, we conducted silver standard (diagnostic coding) and gold standard (manual chart review) validation. We showed an Area Under the Precision-Recall Curve (AUPR) of ~0.77 (95% CI 0.75-0.78) for suicide attempt and an AUPR of ~0.31 (95% CI 0.28-0.34) for sleep-related behaviors. We also evaluated performance by coded race and demonstrated that differences in performance by race varied across phenotypes. Scalable phenotyping models, like most healthcare AI, require algorithmovigilance and debiasing prior to implementation.
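
The abstract reports AUPR with 95% confidence intervals, overall and by coded race, but does not say how these were computed. As a rough illustration only, the sketch below estimates AUPR as average precision and attaches a percentile bootstrap interval; the bootstrap, the function names (average_precision, bootstrap_ci, aupr_by_group), and the toy data are all assumptions for illustration, not the authors' method.

```python
import random
from collections import defaultdict


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of true positives."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank records from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / n_pos


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if sum(ys) == 0:
            continue  # a resample without positives has no defined AUPR
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def aupr_by_group(labels, scores, groups):
    """Average precision computed separately within each subgroup."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: average_precision(ys, ss) for g, (ys, ss) in buckets.items()}


if __name__ == "__main__":
    # Toy data: 1 = chart-review positive, score = hypothetical model output.
    y = [1, 0, 1, 0, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
    print(average_precision(y, s), bootstrap_ci(y, s, n_boot=200))
```

Comparing the per-group values from aupr_by_group, each with its own interval, is one simple way to look for the kind of subgroup performance differences the abstract describes. With small subgroups the intervals will be wide, so apparent gaps should be read cautiously.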

Identifiers

pubmed: 39379449
doi: 10.1038/s41598-024-72756-7
pii: 10.1038/s41598-024-72756-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

23429

Grants

Agency: U.S. Food and Drug Administration
ID: WO2006
Agency: National Institute of Mental Health, United States
ID: R01MH121455
Agency: NIMH NIH HHS
ID: R01MH116269
Country: United States
Agency: Wellcome Leap
ID: MCPsych

Copyright information

© 2024. The Author(s).

Authors

Colin G Walsh (CG)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA. Colin.walsh@vumc.org.
Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA. Colin.walsh@vumc.org.
Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA. Colin.walsh@vumc.org.
Vanderbilt University Medical Center, Nashville, USA. Colin.walsh@vumc.org.

Drew Wilimitis (D)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Qingxia Chen (Q)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.

Aileen Wright (A)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Jhansi Kolli (J)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Katelyn Robinson (K)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Michael A Ripperger (MA)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Kevin B Johnson (KB)

Department of Biostatistics, Epidemiology and Informatics, and Pediatrics, University of Pennsylvania, Pennsylvania, USA.
Department of Computer and Information Science, Bioengineering, University of Pennsylvania, Pennsylvania, USA.
Department of Science Communication, University of Pennsylvania, Pennsylvania, USA.

David Carrell (D)

Washington Health Research Institute, Kaiser Permanente Washington, Washington, USA.

Rishi J Desai (RJ)

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA.

Andrew Mosholder (A)

Center for Drug Evaluation and Research, United States Food and Drug Administration, Maryland, USA.
Office of Surveillance and Epidemiology, United States Food and Drug Administration, Maryland, USA.

Sai Dharmarajan (S)

Center for Drug Evaluation and Research, United States Food and Drug Administration, Maryland, USA.
Office of Translational Science, United States Food and Drug Administration, Maryland, USA.

Sruthi Adimadhyam (S)

Department of Population Medicine, Harvard Medical School, Harvard Pilgrim Health Care Institute, Boston, USA.

Daniel Fabbri (D)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Danijela Stojanovic (D)

Center for Drug Evaluation and Research, United States Food and Drug Administration, Maryland, USA.
Office of Surveillance and Epidemiology, United States Food and Drug Administration, Maryland, USA.

Michael E Matheny (ME)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Cosmin A Bejan (CA)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
