Identifying undetected dementia in UK primary care patients: a retrospective case-control study comparing machine-learning and standard epidemiological approaches.
Aged
Algorithms
Bayes Theorem
Case-Control Studies
Computational Biology
Dementia / diagnosis
Electronic Health Records
Female
Humans
Logistic Models
Machine Learning
Male
Neural Networks, Computer
Primary Health Care
Retrospective Studies
Risk Assessment
State Medicine
Support Vector Machine
United Kingdom
Dementia
Diagnosis
Early detection
Electronic health records
General practice
Machine learning
Prediction
Primary care
Journal
BMC Medical Informatics and Decision Making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682
Publication information
Publication date: 02 12 2019
History:
received: 05 06 2019
accepted: 21 11 2019
entrez: 4 12 2019
pubmed: 4 12 2019
medline: 14 4 2020
Status: epublish
Abstract
BACKGROUND
Identifying dementia early using real-world data is a public health challenge. Only two-thirds of people with dementia in United Kingdom health systems currently receive a formal diagnosis, and many receive it late in the disease process, so there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or are developing dementia but whose condition has not yet been detected by the GP.
METHODS
We used electronic patient records from the Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged over 65 years with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine-learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination.
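The matching and feature-creation steps described above can be illustrated with a minimal sketch. It is not the authors' code: the record keys (`id`, `sex`, `age`), the exact-age matching rule and the function names are assumptions made for illustration, since the abstract states only that cases were matched 1:1 by sex and age and that binary features were created from the clinical entities.

```python
from collections import defaultdict


def match_one_to_one(cases, controls):
    """Pair each case with one unused control of the same sex and age.

    Patients are dicts with hypothetical keys 'id', 'sex' and 'age'.
    Exact-age matching is an assumption; cases without an available
    control are left unmatched and omitted from the result.
    """
    pool = defaultdict(list)
    for control in controls:
        pool[(control["sex"], control["age"])].append(control["id"])

    pairs = []
    for case in cases:
        candidates = pool[(case["sex"], case["age"])]
        if candidates:
            # Each control is used at most once (1:1 matching).
            pairs.append((case["id"], candidates.pop(0)))
    return pairs


def binary_features(recorded_entities, entity_list):
    """Return 1 for each listed entity recorded in the look-back window, else 0."""
    return [1 if entity in recorded_entities else 0 for entity in entity_list]
```

For example, `binary_features({"wandering"}, ["wandering", "self-neglect"])` yields one indicator per listed entity, which is the kind of input the classifiers named above would take.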
RESULTS
The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest models performed very similarly, with an area under the receiver operating characteristic curve (AUROC) of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing.
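To make the reported metric concrete, the following is a minimal sketch of how an AUROC can be computed from predicted scores. It uses the standard pairwise definition: the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half. The function name and inputs are illustrative and are not taken from the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: chance that a random case outscores a random control.

    labels: 1 for cases, 0 for controls; scores: model outputs.
    Tied scores contribute 0.5 to the count.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Under this definition, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The quadratic loop is kept for clarity; a rank-based computation would be preferable for a dataset of the size reported here.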
CONCLUSIONS
Our model could aid GPs or health service planners with the early detection of dementia. Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
Identifiers
pubmed: 31791325
doi: 10.1186/s12911-019-0991-9
pii: 10.1186/s12911-019-0991-9
pmc: PMC6889642
Publication types
Comparative Study
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
248
Grants
Agency: Wellcome Trust (GB)
ID: 202133/Z/16/Z
Country: International