Enhancing breast cancer screening with urinary biomarkers and Random Forest supervised classification: A comprehensive investigation.
breast cancer
machine-learning
sex hormones
steroids
supervised classification
urine biomarkers
Journal
Journal of pharmaceutical and biomedical analysis
ISSN: 1873-264X
Abbreviated title: J Pharm Biomed Anal
Country: England
NLM ID: 8309336
Publication information
Publication date:
20 Mar 2024
History:
received: 27 Jan 2024
revised: 10 Mar 2024
accepted: 15 Mar 2024
medline: 31 Mar 2024
pubmed: 31 Mar 2024
entrez: 30 Mar 2024
Status:
aheadofprint
Abstract
Urinary sex hormones are investigated as potential biomarkers for the early detection of breast cancer. The aim is to evaluate their relevance and applicability, in combination with supervised machine-learning data analysis, toward the ultimate goal of extensive screening. Sex hormones were determined in urine samples collected from 250 post-menopausal women (65 healthy and 185 with breast cancer), recruited among the clinical patients of Candiolo Cancer Institute FPO-IRCCS (Torino, Italy). Two analytical procedures based on UHPLC-MS/HRMS were developed and comprehensively validated to quantify 20 free and conjugated sex hormones in urine samples. The quantitative data were processed by seven machine learning algorithms, and the efficiency of the resulting models was compared. Among the tested models relating urinary estrogen and androgen levels to the occurrence of breast cancer, Random Forest (RF) outperformed all the other supervised classification approaches, including Partial Least Squares-Discriminant Analysis (PLS-DA), in terms of effectiveness and robustness. The final optimized model, built on only five biomarkers (testosterone-sulphate, alpha-estradiol, 4-methoxyestradiol, DHEA-sulphate, and epitestosterone-sulphate), achieved approximately 98% diagnostic accuracy on replicated validation sets. To balance the less-represented population of healthy women, a Synthetic Minority Oversampling TEchnique (SMOTE) data oversampling approach was applied. After optimization of its tunable hyperparameters, the RF algorithm showed great potential for early breast cancer detection, as it provides a clear ranking of the biomarkers and their relative efficiency. This allows the final diagnostic model to be grounded on a restricted selection of only five steroid biomarkers, as is desirable for noninvasive tests intended for wide screening.
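The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which software was used, and the function name `smote`, the neighbour count `k=5`, and the randomly generated "healthy" rows below are all assumptions made for illustration. The sketch only shows the core idea of SMOTE, which is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class neighbours.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base row (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Made-up toy data: 65 "healthy" rows with 5 invented feature values each.
# These are NOT measurements from the study.
rng = random.Random(42)
healthy = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(65)]

# 185 - 65 = 120 synthetic rows bring the minority class up to the majority size.
extra = smote(healthy, n_new=185 - 65)
balanced_healthy = healthy + extra
print(len(balanced_healthy))  # 185
```

Because each synthetic row lies on a segment between two real minority rows, the new samples stay inside the range already spanned by the healthy group rather than duplicating existing rows.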
Identifiers
pubmed: 38554554
pii: S0731-7085(24)00153-5
doi: 10.1016/j.jpba.2024.116113
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
116113
Copyright information
Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Marco Vincenti reports financial support was provided by CRT Foundation. Marco Vincenti reports financial support was provided by Italian Ministry of Education, Universities, and Research. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.