Gradient boosting decision tree becomes more reliable than logistic regression in predicting probability for diabetes with big data.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 11 October 2022
History:
received: 23 March 2022
accepted: 09 September 2022
entrez: 11 October 2022
pubmed: 12 October 2022
medline: 14 October 2022
Status: epublish
Abstract
We sought to verify the reliability of machine learning (ML) in developing diabetes prediction models from big data. To this end, we compared the reliability of gradient boosting decision tree (GBDT) and logistic regression (LR) models using data obtained from the Kokuho-database of Osaka Prefecture, Japan. To develop the models, we focused on 16 predictors from health checkup data collected from April 2013 to December 2014. A total of 277,651 eligible participants were studied. The prediction models were developed using a light gradient boosting machine (LightGBM), an effective GBDT implementation, and LR. Their reliability was measured by expected calibration error (ECE), negative log-likelihood (Logloss), and reliability diagrams. Their classification accuracy was measured by the area under the curve (AUC). We further analyzed reliability while varying the training sample size. Among the 277,651 participants, 15,900 (7978 males and 7922 females) were newly diagnosed with diabetes within 3 years. LightGBM (LR) achieved an ECE of 0.0018 ± 0.00033 (0.0048 ± 0.00058), a Logloss of 0.167 ± 0.00062 (0.172 ± 0.00090), and an AUC of 0.844 ± 0.0025 (0.826 ± 0.0035). In the sample size analysis, LightGBM became more reliable than LR once the sample size exceeded [Formula: see text]. Thus, we confirmed that GBDT provides a more reliable model than LR for developing diabetes prediction models from big data. ML could potentially produce a highly reliable diabetes prediction model, a helpful tool for improving lifestyle and preventing diabetes.
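The three metrics named in the abstract (ECE, Logloss, AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice of 10 equal-width probability bins for ECE, and the toy labels and probabilities are all assumptions made for illustration, since the abstract does not state the binning scheme or the evaluation procedure.

```python
import math


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Sample-weighted mean gap between observed event rate and mean
    predicted probability, over equal-width bins (assumed binning)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in b) / len(b)
        predicted = sum(p for _, p in b) / len(b)
        ece += len(b) / n * abs(observed - predicted)
    return ece


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels (Logloss)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random
    negative; ties count as one half. O(n^2), fine for small inputs."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_probs = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]

if __name__ == "__main__":
    print("ECE:", expected_calibration_error(toy_labels, toy_probs))
    print("Logloss:", log_loss(toy_labels, toy_probs))
    print("AUC:", auc(toy_labels, toy_probs))
```

Lower ECE and Logloss indicate better-calibrated probabilities, while a higher AUC indicates better ranking of cases above non-cases. The two properties are distinct, which is why the abstract reports them separately.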
Identifiers
pubmed: 36220875
doi: 10.1038/s41598-022-20149-z
pii: 10.1038/s41598-022-20149-z
pmc: PMC9553945
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
15889
Comments and corrections
Type: ErratumIn
Copyright information
© 2022. The Author(s).