Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2022
History:
received: 2021-10-30
accepted: 2022-05-26
entrez: 2022-06-24
pubmed: 2022-06-25
medline: 2022-06-29
Status:
epublish
Abstract
Familial Hypercholesterolemia (FH) is an inherited disorder of cholesterol metabolism. Current criteria for FH diagnosis, like Simon Broome (SB) criteria, lead to high false positive rates. The aim of this work was to explore alternative classification procedures for FH diagnosis, based on different biological and biochemical indicators. For this purpose, logistic regression (LR), naive Bayes classifier (NB), random forest (RF) and extreme gradient boosting (XGB) algorithms were combined with Synthetic Minority Oversampling Technique (SMOTE), or threshold adjustment by maximizing Youden index (YI), and compared. The data were tested using a 10 × 10 repeated k-fold cross-validation design. The LR model presented an overall better performance, as assessed by the areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves, and several operating characteristics (OC), regardless of the strategy to cope with class imbalance. When adopting either data processing technique, significantly higher accuracy (Acc), G-mean and F1 score values were found for all classification algorithms, compared to SB criteria (p < 0.01), revealing a more balanced predictive ability for both classes, and higher effectiveness in classifying FH patients. Adjustment of the cut-off values through pre- or post-processing methods revealed a considerable gain in sensitivity (Sens) values (p < 0.01). Although the performance of pre- and post-processing strategies was similar, SMOTE does not cause the model's parameters to lose interpretability. These results suggest an LR model combined with SMOTE can be an optimal approach to be used as a widespread screening tool.
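As an illustration of the post-processing strategy named in the abstract, the sketch below picks the cut-off that maximizes the Youden index (Sens + Spec - 1) and reports Sens, Spec, Acc, G-mean and F1 before and after the adjustment. It is a minimal sketch under stated assumptions, not the authors' pipeline: the labels and scores are made-up toy values, the function names (`operating_characteristics`, `youden_threshold`) are hypothetical, and candidate cut-offs are simply the observed scores.

```python
from math import sqrt


def operating_characteristics(y_true, y_pred):
    """Return Sens, Spec, Acc, G-mean and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {
        "Sens": sens,
        "Spec": spec,
        "Acc": (tp + tn) / len(y_true),
        "G-mean": sqrt(sens * spec),
        "F1": f1,
    }


def classify(scores, threshold):
    """Label a case positive when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def youden_threshold(y_true, scores):
    """Scan the observed scores and return (cut-off, J) with maximal Youden index."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        oc = operating_characteristics(y_true, classify(scores, t))
        j = oc["Sens"] + oc["Spec"] - 1.0
        if j > best_j:  # strict '>' keeps the lowest cut-off on ties
            best_t, best_j = t, j
    return best_t, best_j


# Toy, imbalanced data (assumed for illustration only): 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.60]

default_oc = operating_characteristics(y_true, classify(scores, 0.5))
best_t, best_j = youden_threshold(y_true, scores)
tuned_oc = operating_characteristics(y_true, classify(scores, best_t))

print("default cut-off 0.5:", default_oc)
print("Youden cut-off", best_t, ":", tuned_oc)
```

On this toy data the adjusted cut-off trades a little specificity for higher sensitivity, which is the kind of shift the abstract describes; in practice the cut-off would be chosen on training folds only and evaluated on held-out folds.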
Identifiers
pubmed: 35749402
doi: 10.1371/journal.pone.0269713
pii: PONE-D-21-34690
pmc: PMC9231719
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0269713
Conflict of interest statement
The authors have declared that no competing interests exist.