An integrative machine learning framework for classifying SEER breast cancer.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
01 04 2023
History:
received: 23 01 2023
accepted: 21 03 2023
medline: 4 4 2023
entrez: 3 4 2023
pubmed: 4 4 2023
Status:
epublish
Abstract
Breast cancer is the most common type of cancer in women worldwide and the leading cause of mortality among women. The aim of this research is to classify the survival status (alive or dead) of breast cancer patients using the Surveillance, Epidemiology, and End Results (SEER) dataset. Because they can handle very large data sets systematically, machine learning and deep learning have been widely employed in biomedical research to address diverse classification problems. Pre-processing the data enables its visualization and analysis for use in making important decisions. This research presents a feasible machine learning-based approach for classifying the SEER breast cancer dataset. A two-step feature selection method based on Variance Threshold and Principal Component Analysis was employed to select features from the dataset. After feature selection, the dataset is classified using supervised and ensemble learning techniques: AdaBoost, XGBoost, Gradient Boosting, Naive Bayes and Decision Tree. The performance of these algorithms is examined using both a train-test split and k-fold cross-validation. The Decision Tree achieved an accuracy of 98% under both the train-test split and cross-validation. In this study, the Decision Tree algorithm outperformed the other supervised and ensemble learning approaches on the SEER breast cancer dataset.
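The workflow described in the abstract (variance filtering, then Principal Component Analysis, then a Decision Tree evaluated by a train-test split and by k-fold cross-validation) might be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the synthetic data stand in for the SEER table, and the variance threshold, the number of components, the 80/20 split, the five folds and the `build_model` helper are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the SEER table (assumption: the real data are not used here).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=0
)
# Append a constant column so the variance filter has something to remove.
X = np.hstack([X, np.zeros((X.shape[0], 1))])


def build_model():
    """Hypothetical helper: two-step feature selection followed by a Decision Tree."""
    return make_pipeline(
        VarianceThreshold(threshold=0.0),      # step 1: drop zero-variance features
        PCA(n_components=5, random_state=0),   # step 2: project onto 5 components (assumed)
        DecisionTreeClassifier(random_state=0),
    )


# Evaluation 1: a single stratified 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = build_model().fit(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)

# Evaluation 2: 5-fold cross-validation; the pipeline is refit inside every fold,
# so feature selection never sees the held-out fold.
cv_scores = cross_val_score(build_model(), X, y, cv=5, scoring="accuracy")

print(f"hold-out accuracy: {holdout_accuracy:.3f}")
print(f"mean 5-fold accuracy: {cv_scores.mean():.3f}")
```

Wrapping the selection steps and the classifier in one pipeline is a design choice of this sketch: it keeps the filter and the PCA projection fitted on training data only, which avoids leaking test information into the reported accuracy. The accuracies printed for synthetic data say nothing about the 98% reported for SEER.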
Identifiers
pubmed: 37005484
doi: 10.1038/s41598-023-32029-1
pii: 10.1038/s41598-023-32029-1
pmc: PMC10067827
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
5362
Copyright information
© 2023. The Author(s).