Machine learning-based classification of valvular heart disease using cardiovascular risk factors.
Bioinformatics
Cardiovascular
Machine learning
Majority voting
Risk factors
Valvular heart disease
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
17 Oct 2024
History:
received: 06 Nov 2023
accepted: 18 Jul 2024
medline: 18 Oct 2024
pubmed: 18 Oct 2024
entrez: 17 Oct 2024
Status:
epublish
Abstract
Valvular heart disease (VHD) is a globally significant cause of mortality, particularly among aging populations. Despite advances in percutaneous and surgical interventions, it remains uncertain which cardiovascular risk factors contribute most to this condition. This study investigates these uncertainties and the role of machine learning in classifying VHD based on cardiovascular risk factors. The investigation has two phases: feature extraction and classification. Feature extraction is first performed with a wrapper approach and then refined with binary logistic regression. The second phase employs five classifiers, Artificial Neural Network (ANN), XGBoost, Random Forest (RF), Naïve Bayes, and Support Vector Machine (SVM), along with advanced methods such as SVM combined with Principal Component Analysis (PCA) and a majority-voting ensemble method (MV5). Data on VHD cases were collected from DHQ Hospital Faisalabad using simple random sampling. The results are assessed with the ROC curve, F-measure, sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and Kappa. The findings show that SVM combined with PCA achieves the highest overall performance, while the MV5 ensemble also achieves high accuracy with balanced sensitivity and specificity. The variation in VHD prevalence linked to specific risk factors highlights the importance of a comprehensive approach to reducing the burden of this disease. The strong performance of SVM + PCA and MV5 highlights their value for diagnosing VHD and for advancing biomedical knowledge.
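The abstract names a majority-voting ensemble and a set of evaluation measures without giving an implementation. The following is a minimal sketch, not the authors' code. It assumes the ensemble is a hard (unweighted) vote over the binary labels predicted by five base classifiers, and that "Kappa" refers to Cohen's kappa. The function names (`majority_vote`, `ratio`, `binary_metrics`) and all labels are made up for illustration and are not data from the study.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions):
    """Hard vote: predictions[i][j] is classifier i's label for sample j."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def ratio(num, den):
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Confusion-matrix measures for labels coded 1 (VHD) and 0 (no VHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    n = tp + tn + fp + fn

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, n)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - chance, 1 - chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical labels for eight patients and five base classifiers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
base_predictions = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 1],
]

ensemble = majority_vote(base_predictions)
metrics = binary_metrics(y_true, ensemble)
print(ensemble)
print(metrics)
```

With an odd number of voters and two classes, a tie cannot occur, so no tie-breaking rule is needed in this sketch. A weighted or probability-based (soft) vote would need a different combination step.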
Identifiers
pubmed: 39420025
doi: 10.1038/s41598-024-67973-z
pii: 10.1038/s41598-024-67973-z
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
24396
Copyright information
© 2024. The Author(s).