Natural language processing to identify lupus nephritis phenotype in electronic health records.

Computational phenotyping; Electronic health records; Lupus nephritis; Natural language processing

Journal

BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682

Publication information

Publication date:
03 Mar 2024
History:
received: 09 04 2021
accepted: 09 01 2024
medline: 4 3 2024
pubmed: 4 3 2024
entrez: 3 3 2024
Status: epublish

Abstract

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major SLE manifestations contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).
We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied a simple regular-expression keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (i.e. a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
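The three feature sets named in the methods can be illustrated with a minimal Python sketch. It assumes that concept extraction (for example with MetaMap) has already produced a list of non-negated CUIs per patient. The CUI strings, the keyword pattern, and the structured field names are placeholders chosen for illustration, not the features used in the study, and the regularized logistic regression step itself is omitted.

```python
import re
from collections import Counter
from typing import Dict, List

# Placeholder identifiers, NOT real UMLS CUIs and not the study's curated list.
CURATED_CUIS = ["CUI_A", "CUI_B"]

# Hypothetical keyword pattern; the study's expressions are not given in the abstract.
LN_PATTERN = re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE)


def presence_features(cuis: List[str], vocabulary: List[str]) -> List[int]:
    """Binary features: 1 if a CUI is mentioned at least once, else 0."""
    seen = set(cuis)
    return [1 if cui in seen else 0 for cui in vocabulary]


def count_features(cuis: List[str], vocabulary: List[str]) -> List[int]:
    """Count features: number of appearances of each CUI."""
    counts = Counter(cuis)
    return [counts[cui] for cui in vocabulary]


def mixed_features(cuis: List[str], notes: List[str],
                   structured: Dict[str, bool]) -> List[int]:
    """Mixed features: curated CUI presence + regex keyword hit + structured flags."""
    regex_hit = int(any(LN_PATTERN.search(note) for note in notes))
    # Sort the keys so that the feature order is stable across patients.
    structured_values = [int(structured[key]) for key in sorted(structured)]
    return presence_features(cuis, CURATED_CUIS) + [regex_hit] + structured_values


# Made-up patient, for illustration only.
patient_cuis = ["CUI_A", "CUI_C", "CUI_A"]
vocabulary = ["CUI_A", "CUI_B", "CUI_C"]
patient_notes = ["Biopsy consistent with Lupus nephritis."]
patient_structured = {"abnormal_urine_protein": True, "nephritis_code": False}

print(presence_features(patient_cuis, vocabulary))
print(count_features(patient_cuis, vocabulary))
print(mixed_features(patient_cuis, patient_notes, patient_structured))
```

Each function returns one feature vector per patient. Stacking these vectors row by row would give the design matrix for a regularized classifier.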
Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs). Compared to the baseline lupus nephritis algorithm, it improved the F-measure from 0.41 to 0.79 in the NMEDW dataset and from 0.52 to 0.93 in the VUMC dataset.
Our NLP MetaMap mixed model substantially improved the F-measure compared to the structured-data-only algorithm in both the internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better targeted therapies.
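For reference, the F-measure reported above combines precision and recall into a single score. The following is a minimal sketch, assuming the balanced F1 variant (the abstract does not state the weighting) and using made-up counts rather than study data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    beta=1.0 gives F1, the harmonic mean of precision and recall.
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 8 true positives, 2 false positives, 2 false negatives.
print(round(f_measure(tp=8, fp=2, fn=2), 2))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, so a higher score requires gains in both.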

Identifiers

pubmed: 38433189
doi: 10.1186/s12911-024-02420-7
pii: 10.1186/s12911-024-02420-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

348

Grants

Agency: NIAMS NIH HHS
ID: 5R21AR072262
Country: United States
Agency: NIAMS NIH HHS
ID: 1K08 AR072757-01
Country: United States
Agency: NIAMS NIH HHS
ID: R61 AR076824
Country: United States
Agency: NHGRI NIH HHS
ID: U01HG008672
Country: United States
Agency: NHGRI NIH HHS
ID: U01HG008673
Country: United States
Agency: NHGRI NIH HHS
ID: U01HG008680
Country: United States

Copyright information

© 2024. The Author(s).

Authors

Yu Deng (Y)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Jennifer A Pacheco (JA)

Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Anika Ghosh (A)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Anh Chung (A)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Chengsheng Mao (C)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Joshua C Smith (JC)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.

Juan Zhao (J)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.

Wei-Qi Wei (WQ)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.

April Barnado (A)

Department of Medicine, Vanderbilt University Medical Center, Nashville, USA.

Chad Dorn (C)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.

Chunhua Weng (C)

Department of Biomedical Informatics, Columbia University, New York City, USA.

Cong Liu (C)

Department of Biomedical Informatics, Columbia University, New York City, USA.

Adam Cordon (A)

Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Jingzhi Yu (J)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Yacob Tedla (Y)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Abel Kho (A)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Rosalind Ramsey-Goldman (R)

Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Theresa Walunas (T)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA. t-walunas@northwestern.edu.

Yuan Luo (Y)

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA. yuan.luo@northwestern.edu.

MeSH classifications