Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework.

artificial intelligence; big data; cohort identification; electronic health records; feature selection; phenotyping; rheumatology; text mining; transparent machine learning

Journal

Diagnostics (Basel, Switzerland)
ISSN: 2075-4418
Abbreviated title: Diagnostics (Basel)
Country: Switzerland
NLM ID: 101658402

Publication information

Publication date:
15 Oct 2021
History:
received: 16 09 2021
revised: 06 10 2021
accepted: 13 10 2021
entrez: 23 10 2021
pubmed: 24 10 2021
medline: 24 10 2021
Status: epublish

Abstract

(1) Background: We aimed to develop a transparent machine-learning (ML) framework that automatically identifies patients with a given condition from electronic health records (EHRs) using a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, comprising 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. We then treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. The framework comprises patient representation generation, feature selection, and development of an optimal phenotyping algorithm that tackles the imbalanced nature of the data. It was extensively evaluated by identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients with 1484 cases of RA and 204 cases of AS, the framework achieved an accuracy and positive predictive value of 86.19% and 88.46%, respectively, for RA and 99.23% and 97.75% for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.
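
The three stages named in the abstract (patient representation, feature selection, and evaluation by accuracy and positive predictive value) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the bag-of-codes representation, the prevalence-difference score, the function names, and the code labels such as `RA_DX` are all hypothetical and chosen only to keep the example transparent and self-contained.

```python
from collections import Counter

def bag_of_codes(record):
    """Represent one patient as counts of the codes in their record (assumed representation)."""
    return Counter(record)

def select_features(bags, labels, k):
    """Keep the k codes whose prevalence differs most between cases and non-cases.

    Illustrative score only; assumes both classes are present in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for bag, y in zip(bags, labels):
        # Count each code once per patient (document frequency).
        (pos_df if y else neg_df).update(bag.keys())
    codes = set(pos_df) | set(neg_df)
    score = {c: abs(pos_df[c] / n_pos - neg_df[c] / n_neg) for c in codes}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(codes, key=lambda c: (-score[c], c))[:k]

def accuracy_and_ppv(y_true, y_pred):
    """Accuracy = (TP + TN) / N; positive predictive value = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    accuracy = (tp + tn) / len(y_true)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv

# Toy, imbalanced cohort: 2 cases among 6 patients (hypothetical codes).
records = [
    ["RA_DX", "DMARD", "NSAID"],
    ["RA_DX", "DMARD"],
    ["NSAID", "COUGH"],
    ["COUGH"],
    ["NSAID"],
    ["COUGH", "DMARD"],
]
labels = [1, 1, 0, 0, 0, 0]

bags = [bag_of_codes(r) for r in records]
selected = select_features(bags, labels, k=2)
print(selected)

# A deliberately simple, transparent rule: flag a patient if the top-ranked code is present.
predictions = [int(selected[0] in bag) for bag in bags]
print(accuracy_and_ppv(labels, predictions))
```

On an imbalanced cohort, accuracy alone can look favourable even when few flagged patients are true cases, which is one reason to report the positive predictive value alongside it.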

Identifiers

pubmed: 34679609
pii: diagnostics11101908
doi: 10.3390/diagnostics11101908
pmc: PMC8534858

Publication types

Journal Article

Languages

eng

Grants

Agency: Medical Research Council
ID: MR/S004084/1
Country: United Kingdom
Agency: Health Data Research UK
ID: NIWA1
Agency: Major Project of National Social Science Foundation of China
ID: 16ZDA0092

Authors

Fabiola Fernández-Gutiérrez (F)

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Jonathan I Kennedy (JI)

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Roxanne Cooksey (R)

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Mark Atkinson (M)

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Ernest Choy (E)

Arthritis Research UK CREATE Centre, Division Infection and Immunity, Cardiff University, Cardiff CF10 3NB, UK.
Welsh Arthritis Research Network, School of Medicine, Cardiff University, Cardiff CF10 3NB, UK.

Sinead Brophy (S)

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Lin Huo (L)

China-ASEAN Research Institute, Guangxi University, Nanning 530004, China.

Shang-Ming Zhou (SM)

Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth PL4 8AA, UK.

MeSH classifications