Deep representation learning of electronic health records to unlock patient stratification at scale.

Data processing; Machine learning

Journal

NPJ digital medicine
ISSN: 2398-6352
Abbreviated title: NPJ Digit Med
Country: England
NLM ID: 101731738

Publication information

Publication date:
2020
History:
received: 9 April 2020
accepted: 17 June 2020
entrez: 24 July 2020
pubmed: 24 July 2020
medline: 24 July 2020
Status: epublish

Abstract

Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease, and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine.
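The abstract reports clustering quality as entropy and purity scores. As a minimal sketch of how such scores are commonly computed, the following assumes the standard definitions (purity as the fraction of items in the majority class of their cluster; entropy as the size-weighted average of each cluster's class-label entropy, in bits). The paper's exact formulas may differ, and the function name and toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def cluster_purity_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) for a clustering against reference labels.

    Assumed definitions (not taken from the paper):
      purity  = (1/N) * sum over clusters of the largest class count
      entropy = sum over clusters of (n_k/N) * H(class distribution in k)
    """
    if len(cluster_ids) != len(true_labels) or len(cluster_ids) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count reference labels inside each cluster.
    counts_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        counts_by_cluster[cluster][label] += 1

    n = len(cluster_ids)
    purity = sum(max(c.values()) for c in counts_by_cluster.values()) / n

    entropy = 0.0
    for counts in counts_by_cluster.values():
        size = sum(counts.values())
        # Shannon entropy (base 2) of the label mix within this cluster.
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical toy data: two clusters, each mixing two conditions.
clusters = [0, 0, 0, 1, 1, 1]
conditions = ["T2D", "T2D", "PD", "PD", "PD", "T2D"]
purity, entropy = cluster_purity_entropy(clusters, conditions)
print(f"purity={purity:.2f} entropy={entropy:.2f}")
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the reference conditions.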

Identifiers

pubmed: 32699826
doi: 10.1038/s41746-020-0301-z
pii: 301
pmc: PMC7367859

Publication types

Journal Article

Languages

eng

Pagination

96

Copyright information

© The Author(s) 2020.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Authors

Isotta Landi (I)

Bruno Kessler Institute, Povo, TN Italy.
Department of Psychology and Cognitive Science, University of Trento, Rovereto, TN Italy.

Benjamin S Glicksberg (BS)

Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

Hao-Chih Lee (HC)

Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

Sarah Cherng (S)

Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

Giulia Landi (G)

Department of Mental Health and Pathological Addiction, Azienda USL Centro "Santi", Parma, Italy.

Matteo Danieletto (M)

Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

Joel T Dudley (JT)

Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

Cesare Furlanello (C)

Bruno Kessler Institute, Povo, TN Italy.
HK3 Lab, Milan, Italy.

Riccardo Miotto (R)

Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA.

MeSH classifications