Leveraging multi-site electronic health data for characterization of subtypes: a pilot study of dementia in the N3C Clinical Tenant.

comorbidity patterns; dementia subtypes; electronic health records; machine learning algorithms; multi-institutional studies

Journal

JAMIA open
ISSN: 2574-2531
Abbreviated title: JAMIA Open
Country: United States
NLM ID: 101730643

Publication information

Publication date:
Oct 2024
History:
received: 17 05 2024
revised: 19 07 2024
accepted: 01 08 2024
medline: 12 8 2024
pubmed: 12 8 2024
entrez: 12 8 2024
Status: epublish

Abstract

This study aims to provide a foundational methodology for differentiating comorbidity patterns in subphenotypes through investigation of a multi-site dementia patient dataset.

Employing the National Clinical Cohort Collaborative Tenant Pilot (N3C Clinical) dataset, our approach integrates two machine learning algorithms, logistic regression and eXtreme Gradient Boosting (XGBoost), with a diagnostic hierarchical model for nuanced classification of dementia subtypes based on comorbidities and gender. The methodology draws on multi-site EHR data and addresses class imbalance with a hybrid sampling strategy combining 65% Synthetic Minority Over-sampling Technique (SMOTE), 35% Random Under-Sampling (RUS), and Tomek Links. The hierarchical model further refines the analysis, allowing for a layered understanding of disease patterns.

The study identified significant comorbidity patterns associated with the diagnosis of Alzheimer's, Vascular, and Lewy Body dementia subtypes. The classification models achieved accuracies of up to 69% for Alzheimer's/Vascular dementia and highlighted challenges in distinguishing Dementia with Lewy Bodies. The hierarchical model elucidates the complexity of diagnosing Dementia with Lewy Bodies and reveals the potential impact of regional clinical practices on dementia classification.

Our methodology underscores the importance of leveraging multi-site datasets and tailored sampling techniques for dementia research. This framework holds promise for extension to other disease subtypes, offering a pathway to more nuanced and generalizable insights into dementia and its complex interplay with comorbid conditions.

This study underscores the critical role of multi-site data analyses in understanding the relationship between comorbidities and disease subtypes. By utilizing diverse healthcare data, we emphasize the need to consider site-specific differences in clinical practices and patient demographics. Despite challenges like class imbalance and variability in EHR data, our findings highlight the essential contribution of multi-site data to developing accurate and generalizable models for disease classification.
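
The abstract names the hybrid sampling strategy (65% SMOTE, 35% RUS, then Tomek Links) without spelling out how the percentages are applied. The following is a minimal, standard-library-only sketch under one possible reading, which is an assumption and not the authors' confirmed procedure: the gap between the majority and minority class sizes is closed 65% by synthesizing minority points and 35% by randomly dropping majority points, after which majority points that form Tomek links are removed. The function names (`smote`, `remove_tomek_links`, `hybrid_resample`) and the parameters `smote_share`, `k`, and `seed` are hypothetical and chosen for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def remove_tomek_links(X, y, majority_label):
    """Drop majority points that are in a Tomek link: a pair of points from
    different classes that are each other's nearest neighbour."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = {i for i, j in enumerate(nn)
            if nn[j] == i and y[i] != y[j] and y[i] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid_resample(X, y, minority_label, majority_label,
                    smote_share=0.65, k=5, seed=0):
    """Assumed reading of the 65%/35% split: close smote_share of the class
    size gap by oversampling, the rest by random undersampling."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    majority = [x for x, lab in zip(X, y) if lab == majority_label]

    gap = len(majority) - len(minority)
    n_new = round(smote_share * gap)   # synthetic minority points to add
    n_drop = gap - n_new               # majority points to discard

    minority = minority + smote(minority, n_new, k=k, rng=rng)
    majority = rng.sample(majority, len(majority) - n_drop)

    X_bal = minority + majority
    y_bal = [minority_label] * len(minority) + [majority_label] * len(majority)
    return remove_tomek_links(X_bal, y_bal, majority_label)
```

With 100 majority and 20 minority records, this reading would add 52 synthetic minority records and drop 28 majority records before the Tomek Links cleaning step. In practice the same components are available in the imbalanced-learn package (`SMOTE`, `RandomUnderSampler`, `TomekLinks`); whether the study used that package or these exact settings is not stated in the abstract.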

Identifiers

pubmed: 39132679
doi: 10.1093/jamiaopen/ooae076
pii: ooae076
pmc: PMC11316614

Publication types

Journal Article

Languages

eng

Pagination

ooae076

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Conflict of interest statement

None declared.

Authors

Suchetha Sharma (S)

School of Data Science, University of Virginia, Charlottesville, VA 22903, United States.

Jiebei Liu (J)

Department of Systems Engineering, University of Virginia, Charlottesville, VA 22904, United States.

Amy Caroline Abramowitz (AC)

Department of Psychiatry, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, United States.

Carol Reynolds Geary (CR)

Department of Pathology, Microbiology & Immunology, University of Nebraska Medical Center, Omaha, NE 68198-5900, United States.

Karen C Johnston (KC)

Department of Neurology, University of Virginia, Charlottesville, VA 22903, United States.

Carol Manning (C)

Department of Neurology, University of Virginia, Charlottesville, VA 22903, United States.

John Darrell Van Horn (JD)

School of Data Science, University of Virginia, Charlottesville, VA 22903, United States.

Andrea Zhou (A)

School of Medicine, University of Virginia, Charlottesville, VA 22903, United States.

Alfred J Anzalone (AJ)

Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE 68198, United States.

Johanna Loomba (J)

School of Medicine, University of Virginia, Charlottesville, VA 22903, United States.

Emily Pfaff (E)

Department of Medicine, North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.

Don Brown (D)

School of Data Science, Co-Director integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA 22903, United States.

MeSH classifications