Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction.


Journal

International journal of environmental research and public health
ISSN: 1660-4601
Abbreviated title: Int J Environ Res Public Health
Country: Switzerland
NLM ID: 101238455

Publication information

Publication date:
12 10 2021
History:
received: 25 05 2021
revised: 25 09 2021
accepted: 27 09 2021
entrez: 23 10 2021
pubmed: 24 10 2021
medline: 3 11 2021
Status: epublish

Abstract

This study evaluates several feature ranking techniques in combination with machine learning classifiers, with two goals: identifying the factors most relevant to the probability of developing breast cancer, and improving the performance of breast cancer risk prediction models in a healthy population. Breast cancer is a major public health problem. The dataset, from the MCC-Spain study, comprises 919 cases and 946 controls and includes only environmental and genetic features. Because feature rankings are used to gain insight into the data, the stability of the feature selection methods must be quantified before the rankings are interpreted. The paper therefore assesses several feature selection algorithms in terms of the performance of a set of predictive models, and quantifies their robustness by analyzing both the similarity between the rankings produced by different methods and the stability of each method. The ranking produced by SVM-RFE gives the best performance in terms of the area under the ROC curve (AUC): the top 47 features of this ranking, fed to a Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. SVM-RFE and Random Forest turned out to be highly stable, whereas Relief and the wrapper approaches are quite unstable. These results show that stability and model performance should be studied together: Random Forest and SVM-RFE are the most stable algorithms, but SVM-RFE outperforms Random Forest in terms of model performance.
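
The abstract does not say which stability measure the authors use, so the following is only a minimal sketch of one common way to quantify the stability of a ranking method: rerun the method on several resamples of the data, keep the top-k features of each ranking, and average the Jaccard index over all pairs of top-k subsets. The function names, the feature names, and the example rankings are all hypothetical and are not taken from the paper or from the MCC-Spain data.

```python
from itertools import combinations


def top_k(ranking, k):
    """Return the set of the k best-ranked feature names (ranking is best-first)."""
    return set(ranking[:k])


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(rankings, k):
    """Average Jaccard index of the top-k subsets over all pairs of rankings.

    Values near 1 mean the method keeps selecting the same features across
    resamples (stable); values near 0 mean the selection changes a lot.
    """
    subsets = [top_k(r, k) for r in rankings]
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two rankings to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings produced by one method on three resamples of the data.
stable_method = [
    ["age", "bmi", "alcohol", "snp1"],
    ["bmi", "age", "alcohol", "snp2"],
    ["age", "alcohol", "bmi", "snp1"],
]
unstable_method = [
    ["age", "bmi", "alcohol", "snp1"],
    ["snp2", "snp1", "age", "bmi"],
    ["alcohol", "snp2", "snp1", "age"],
]

if __name__ == "__main__":
    print("stable:  ", mean_pairwise_jaccard(stable_method, k=3))
    print("unstable:", mean_pairwise_jaccard(unstable_method, k=3))
```

The same function can also compare rankings produced by different methods on the same data, which corresponds to the "similarity between the feature selection rankings" mentioned in the abstract. A set-based index ignores the order within the top k; a rank-aware measure would be needed if that order matters.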

Identifiers

pubmed: 34682416
pii: ijerph182010670
doi: 10.3390/ijerph182010670
pmc: PMC8535206

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Authors

Nahúm Cueto López (NC)

Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain.

María Teresa García-Ordás (MT)

Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain.

Facundo Vitelli-Storelli (F)

Centro de Investigación Biomédica en Red (CIBER), Grupo Investigación Interacciones Gen-Ambiente y Salud (GIIGAS), Instituto de Biomedicina (IBIOMED), Universidad de León, 24071 León, Spain.

Pablo Fernández-Navarro (P)

Cancer and Environmental Epidemiology Unit, National Center for Epidemiology, Carlos III Institute of Health, 28903 Madrid, Spain.
Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain.

Camilo Palazuelos (C)

Department of Mathematics, Statistics, and Computing, University of Cantabria-IDIVAL, 39005 Santander, Spain.

Rocío Alaiz-Rodríguez (R)

Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain.
