Finding commonalities in rare diseases through the undiagnosed diseases network.

cluster analysis; rare diseases; supervised machine learning; undiagnosed diseases; unsupervised machine learning

Journal

Journal of the American Medical Informatics Association : JAMIA
ISSN: 1527-974X
Abbreviated title: J Am Med Inform Assoc
Country: England
NLM ID: 9430800

Publication information

Publication date:
2021-07-30
History:
received: 2020-12-18
accepted: 2021-03-05
pubmed: 2021-05-20
medline: 2021-11-25
entrez: 2021-05-19
Status: ppublish

Abstract

When studying any specific rare disease, the heterogeneity and scarcity of affected individuals have historically hindered investigators from discerning what to focus on to understand and diagnose the disease. New nongenomic methodologies must be developed that identify similarities in seemingly dissimilar conditions. This observational study analyzes 1042 patients from the Undiagnosed Diseases Network (2015-2019), a multicenter, nationwide research study, using phenotypic data annotated by specialized staff with Human Phenotype Ontology terms. We used Louvain community detection to cluster patients linked by Jaccard pairwise similarity and 2 support vector classifiers to assign new cases. We further validated the clusters' most representative comorbidities using a national claims database (67 million patients). Patients were divided into 2 groups: those with symptom onset before 18 years of age (n = 810) and at 18 years of age or older (n = 232) (average symptom onset age: 10 [interquartile range, 0-14] years). For 810 pediatric patients, we identified 4 statistically significant clusters. Two clusters were characterized by growth disorders, and developmental delay enriched for hypotonia presented a higher likelihood of diagnosis. Support vector classifier showed 0.89 balanced accuracy (0.83 for Human Phenotype Ontology terms only) on test data. To set the framework for future discovery, we chose as our endpoint the successful grouping of patients by phenotypic similarity and provided a classification tool to assign new patients to those clusters. This study shows that despite the scarcity and heterogeneity of patients, we can still find commonalities that can potentially be harnessed to uncover new insights and targets for therapy.
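To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes Jaccard pairwise similarity between patients' sets of phenotype terms, turns the pairs into weighted graph edges, and computes balanced accuracy as the mean of per-class recall. The patient IDs, term identifiers, cluster labels, and the function names `jaccard`, `similarity_edges`, and `balanced_accuracy` are all illustrative assumptions; the abstract does not say how ties, empty term sets, or edge thresholds were handled.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption: two empty term sets are treated as having similarity 0.
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(patients, threshold=0.0):
    """Return weighted edges (id_a, id_b, similarity) for pairs above threshold."""
    edges = []
    for (id_a, terms_a), (id_b, terms_b) in combinations(sorted(patients.items()), 2):
        weight = jaccard(terms_a, terms_b)
        if weight > threshold:
            edges.append((id_a, id_b, weight))
    return edges


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so small clusters weigh as much as large ones."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data: hypothetical patients with placeholder term identifiers.
patients = {
    "p1": {"HP:0001252", "HP:0001263", "HP:0004322"},
    "p2": {"HP:0001252", "HP:0001263"},
    "p3": {"HP:0001250"},
}
edges = similarity_edges(patients)  # only p1 and p2 share any terms

# Toy cluster labels: true assignment versus a classifier's prediction.
score = balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

A weighted edge list of this shape is what a community detection routine would consume. For example, NetworkX provides `louvain_communities`, which accepts a graph with a `weight` edge attribute; the abstract does not state which Louvain implementation was used.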

Identifiers

pubmed: 34009343
pii: 6262054
doi: 10.1093/jamia/ocab050
pmc: PMC8324228

Publication types

Journal Article; Multicenter Study; Observational Study; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

1694-1702

Grants

Agency: NHGRI NIH HHS
ID: U01 HG007530
Country: United States

Copyright information

© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Authors

Josephine Yates (J)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Alba Gutiérrez-Sacristán (A)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Vianney Jouhet (V)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Kimberly LeBlanc (K)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Cecilia Esteves (C)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Thomas N DeSain (TN)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Nick Benik (N)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Jason Stedman (J)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Nathan Palmer (N)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Guillaume Mellon (G)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Isaac Kohane (I)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

Paul Avillach (P)

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.
