Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population.
Decision Tree
Diabetes Mellitus
Logistic Models
Neural Network
Statistical Models
Journal
Healthcare informatics research
ISSN: 2093-3681
Abbreviated title: Healthc Inform Res
Country: Korea (South)
NLM ID: 101534553
Publication information
Publication date:
Jan 2022
History:
received: 2020-08-25
accepted: 2021-08-11
entrez: 2022-02-16
pubmed: 2022-02-17
medline: 2022-02-17
Status: ppublish
Abstract
This study developed and compared the performance of three widely used predictive models, namely logistic regression (LR), artificial neural network (ANN), and decision tree (DT), for predicting diabetes mellitus from the socio-demographic, lifestyle, and physical attributes of a population of Nigerians. The three models were developed using 10 input variables. Data preprocessing steps included the removal of missing values and outliers, min-max normalization, and feature extraction using principal component analysis. Model training and validation were accomplished using 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1. The mean age of the participants was 50.52 ± 16.14 years. The classification accuracy, sensitivity, specificity, PPV, and NPV for LR were, respectively, 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%. Those for ANN were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83%, and those for DT were 99.05%, 99.76%, 98.08%, 98.77%, and 99.82%, respectively. The best-performing and poorest-performing classifiers were DT and LR, with 99.05% and 81.31% accuracy, respectively. Similarly, the DT algorithm achieved the highest AUROC (0.992), compared with ANN (0.976) and LR (0.892). Our study demonstrated that DT, LR, and ANN models can be used effectively for the prediction of diabetes mellitus in the Nigerian population based on certain risk factors. An overall comparative analysis of the models showed that the DT model performed better than LR and ANN.
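The abstract names min-max normalization, 10-fold cross-validation, and five confusion-matrix metrics without showing how they are computed. The following is a minimal illustrative sketch in plain Python, not the authors' R code: the function names are hypothetical, the fold assignment is a simple unshuffled interleaving (an assumption; the study does not say how folds were formed), and the positive class is assumed to be coded as 1.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to zeros (a choice made here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index pairs covering 0..n_samples-1.

    Assumption: folds are formed by interleaving indices without shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "specificity": ratio(tn, tn + fp),  # recall on the negative class
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }
```

In a cross-validation setting, the metrics would typically be computed on each validation fold and then averaged, or computed once on the pooled out-of-fold predictions; the abstract does not state which approach was used.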
Identifiers
pubmed: 35172091
pii: hir.2022.28.1.58
doi: 10.4258/hir.2022.28.1.58
pmc: PMC8850175
Publication types
Journal Article
Languages
eng
Pagination
58-67
Grants
Agency: FIC NIH HHS
ID: D43 TW010134
Country: United States
Agency: Fogarty International Center of the National Institutes of Health
ID: D43TW010134