Breast cancer prediction using different machine learning methods applying multi factors.


Journal

Journal of cancer research and clinical oncology
ISSN: 1432-1335
Abbreviated title: J Cancer Res Clin Oncol
Country: Germany
NLM ID: 7902060

Publication information

Publication date:
Dec 2023
History:
received: 08 07 2023
accepted: 01 09 2023
medline: 20 11 2023
pubmed: 29 9 2023
entrez: 29 9 2023
Status: ppublish

Abstract

Breast cancer (BC) is a multifactorial disease and one of the most common cancers globally. This study aimed to compare different machine learning (ML) techniques in order to develop a comprehensive breast cancer risk prediction model based on features drawn from various factors.

The population sample contained 810 records (115 cancer patients and 695 healthy individuals). Of 85 attributes, 45 were selected based on expert opinion. The selected attributes span genetic, biochemical, biomarker, gender, demographic, and pathological factors. Thirteen machine learning models were trained on the proposed attributes, and the attribute coefficients and their internal relationships were calculated.

Random forest (RF) outperformed the other methods (accuracy 99.26%, precision 99%, and area under the curve (AUC) 99%). An assessment of variable impact and correlation using the RF method based on PCA indicated that pathology, biomarker, biochemistry, gene, and demographic factors, with coefficients of 0.35, 0.23, 0.15, 0.14, and 0.13, respectively, affected the risk of BC (r [...]).

Breast cancer has several risk factors, which medical experts use for early diagnosis. Identifying the related risk factors and their effects can therefore increase the accuracy of diagnosis. Considering a broad range of features for predicting breast cancer supports the development of a comprehensive prediction model. In this study, a breast cancer prediction model with 99.3% accuracy was developed using the RF technique, based on multifactorial features.
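As a rough illustration of the metrics reported above, the following sketch computes accuracy, precision, and a rank-based AUC in plain Python. The labels and scores are synthetic toy values, not the study's data, and the helper names (`accuracy`, `precision`, `auc`) are hypothetical. The only figures taken from the abstract are the class counts (115 and 695), which are used to show the accuracy of a trivial majority-class baseline.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract.
N_CANCER, N_HEALTHY = 115, 695
# Accuracy of a model that always predicts "healthy".
baseline_accuracy = N_HEALTHY / (N_CANCER + N_HEALTHY)

# Synthetic toy example (assumption: 1 = cancer, 0 = healthy).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

if __name__ == "__main__":
    print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
    print(f"toy accuracy:  {accuracy(y_true, y_pred):.3f}")
    print(f"toy precision: {precision(y_true, y_pred):.3f}")
    print(f"toy AUC:       {auc(y_true, scores):.3f}")
```

Because only about 14% of the records are cancer cases, a model that always predicts "healthy" already reaches roughly 86% accuracy, so precision and AUC are informative complements to the accuracy figure.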

Identifiers

pubmed: 37773467
doi: 10.1007/s00432-023-05388-5
pii: 10.1007/s00432-023-05388-5

Chemical substances

Biomarkers 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

17133-17146

Copyright information

© 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Authors

Elham Nazari (E)

Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran.
Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Hamid Naderi (H)

Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran.

Mahla Tabadkani (M)

Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.

Reza ArefNezhad (R)

Halal Research Center of IRI, FDA, Tehran, Iran.
Department of Anatomy, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.

Amir Hossein Farzin (AH)

Department of Computer Engineering, Ferdowsi University, Mashhad, Iran.

Mohammad Dashtiahangar (M)

School of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran.

Majid Khazaei (M)

Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.

Gordon A Ferns (GA)

Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, BN1 9PH, Sussex, UK.

Amin Mehrabian (A)

Warwick Medical School, University of Warwick, Coventry, UK.

Hamed Tabesh (H)

Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran. Tabeshh@mums.ac.Ir.

Amir Avan (A)

Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran. avana@mums.ac.ir.
Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia. avana@mums.ac.ir.
College of Medicine, University of Warith Al-Anbiyaa, Karbala, Iraq. avana@mums.ac.ir.
