Feature extraction for phenotyping from semantic and knowledge resources.
Distributional semantics
Electronic health records
Machine learning
Phenotyping
Journal
Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413
Publication information
Publication date:
03 2019
History:
pubmed: 2019-02-11
medline: 2020-06-20
entrez: 2019-02-11
Status:
ppublish
Abstract
Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health records (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tweak the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data. SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and gives them vector-form distributional semantic representations derived with neural word embedding and the Unified Medical Language System Metathesaurus. A number of features that are semantically closest and that sufficiently characterize the target phenotype are determined by a linear decomposition criterion and are selected for the final classification algorithm. SEDFE was compared with the EHR-based SAFE algorithm and domain experts on feature selection for the classification of five phenotypes including coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension using both supervised and unsupervised approaches. Algorithms yielded by SEDFE achieved comparable accuracy to those yielded by SAFE and expert-curated features. SEDFE is also robust to the input semantic vectors. SEDFE attains satisfying performance in unsupervised feature selection for EHR phenotyping. Both fully automated and EHR-independent, this method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.
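The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the linear decomposition criterion, so the sketch assumes a cosine-similarity ranking followed by a least-squares residual threshold. The function name `select_features`, the `max_residual` parameter, and the toy vectors in the usage note are all hypothetical.

```python
import numpy as np


def select_features(target, candidates, max_residual=0.1):
    """Pick the candidates semantically closest to `target` (hypothetical helper).

    target: 1-D embedding of the phenotype concept.
    candidates: non-empty dict mapping concept name -> 1-D embedding.
    max_residual: assumed stopping threshold on the relative residual.
    """
    names = list(candidates)
    vecs = np.array([candidates[n] for n in names], dtype=float)
    target = np.asarray(target, dtype=float)

    # Rank candidates by cosine similarity to the target, closest first.
    sims = vecs @ target / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(target))
    order = np.argsort(-sims)

    # Grow the feature set until a least-squares combination of the chosen
    # vectors reconstructs the target closely enough (assumed criterion).
    for k in range(1, len(names) + 1):
        basis = vecs[order[:k]].T  # shape (dim, k)
        coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = np.linalg.norm(target - basis @ coef) / np.linalg.norm(target)
        if residual <= max_residual:
            break
    return [names[i] for i in order[:k]]
```

With toy vectors `{"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}` and target `[1, 1, 0]`, the sketch keeps `a` and `b` and stops before adding `c`, since the first two already span the target. In practice the vectors would come from a trained embedding model, and the threshold would need tuning.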
Identifiers
pubmed: 30738949
pii: S1532-0464(19)30040-1
doi: 10.1016/j.jbi.2019.103122
pmc: PMC6424621
mid: NIHMS1012269
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
103122
Grants
Agency: NHLBI NIH HHS
ID: L40 HL133929
Country: United States
Agency: NIAMS NIH HHS
ID: P30 AR072577
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL121518
Country: United States
Agency: NICHD NIH HHS
ID: T32 HD040128
Country: United States
Agency: NICHD NIH HHS
ID: K12 HD047349
Country: United States
Agency: NIMH NIH HHS
ID: P50 MH106933
Country: United States
Copyright information
Copyright © 2019 Elsevier Inc. All rights reserved.