Feature extraction for phenotyping from semantic and knowledge resources.

Distributional semantics; Electronic health records; Machine learning; Phenotyping

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
March 2019
History:
pubmed: 11 Feb 2019
medline: 20 Jun 2020
entrez: 11 Feb 2019
Status: ppublish

Abstract

Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health record (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tune the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data. SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and gives them vector-form distributional semantic representations derived with neural word embedding and the Unified Medical Language System Metathesaurus. The features that are semantically closest to the target phenotype, in a number sufficient to characterize it, are determined by a linear decomposition criterion and selected for the final classification algorithm. SEDFE was compared with the EHR-based SAFE algorithm and with domain experts on feature selection for the classification of five phenotypes (coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension) using both supervised and unsupervised approaches. Algorithms yielded by SEDFE achieved accuracy comparable to that of algorithms built on SAFE-selected and expert-curated features. SEDFE is also robust to the input semantic vectors. SEDFE attains satisfactory performance in unsupervised feature selection for EHR phenotyping. Being both fully automated and EHR-independent, the method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.
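
The idea of selecting the candidate concepts that are semantically closest to a target phenotype can be illustrated with a minimal sketch. This is not the SEDFE implementation: the abstract does not specify the linear decomposition criterion, so the sketch substitutes a plain cosine-similarity ranking with a fixed cutoff `k`. The function names, the toy three-dimensional vectors, and the concept labels are all hypothetical and stand in for real embedding vectors of medical concepts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(target_vec, candidates):
    """Return (name, similarity) pairs sorted from most to least similar to the target."""
    scored = [(name, cosine(target_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_top_k(target_vec, candidates, k):
    """Keep the k candidates closest to the target.

    Assumption: a fixed k replaces the paper's linear decomposition
    criterion, which the abstract does not describe in detail.
    """
    return [name for name, _ in rank_candidates(target_vec, candidates)[:k]]


# Toy data: hypothetical embedding of the target phenotype and three candidates.
target = [1.0, 0.0, 0.0]
candidates = {
    "concept_a": [0.9, 0.1, 0.0],
    "concept_b": [0.0, 1.0, 0.0],
    "concept_c": [0.7, 0.7, 0.0],
}

selected = select_top_k(target, candidates, 2)
print(selected)
```

In this sketch the cutoff is chosen by hand; the method described in the abstract instead determines how many features to keep automatically, which is what removes the need for per-phenotype tuning.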

Identifiers

pubmed: 30738949
pii: S1532-0464(19)30040-1
doi: 10.1016/j.jbi.2019.103122
pmc: PMC6424621
mid: NIHMS1012269

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

103122

Grants

Agency: NHLBI NIH HHS
ID: L40 HL133929
Country: United States
Agency: NIAMS NIH HHS
ID: P30 AR072577
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL121518
Country: United States
Agency: NICHD NIH HHS
ID: T32 HD040128
Country: United States
Agency: NICHD NIH HHS
ID: K12 HD047349
Country: United States
Agency: NIMH NIH HHS
ID: P50 MH106933
Country: United States

Copyright information

Copyright © 2019 Elsevier Inc. All rights reserved.

Authors

Wenxin Ning (W)

Department of Industrial Engineering, Tsinghua University, Beijing, China.

Stephanie Chan (S)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Andrew Beam (A)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Ming Yu (M)

Department of Industrial Engineering, Tsinghua University, Beijing, China.

Alon Geva (A)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesia, Harvard Medical School, Boston, MA, USA.

Katherine Liao (K)

Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Mary Mullen (M)

Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA.

Kenneth D Mandl (KD)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Isaac Kohane (I)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Tianxi Cai (T)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Sheng Yu (S)

Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China; Institute for Data Science, Tsinghua University, Beijing, China. Electronic address: syu@tsinghua.edu.cn.
