Enhancing breast cancer screening with urinary biomarkers and Random Forest supervised classification: A comprehensive investigation.

breast cancer; machine-learning; sex hormones; steroids; supervised classification; urine biomarkers

Journal

Journal of pharmaceutical and biomedical analysis
ISSN: 1873-264X
Abbreviated title: J Pharm Biomed Anal
Country: England
NLM ID: 8309336

Publication information

Publication date:
20 Mar 2024
History:
received: 27 01 2024
revised: 10 03 2024
accepted: 15 03 2024
medline: 31 3 2024
pubmed: 31 3 2024
entrez: 30 3 2024
Status: aheadofprint

Abstract

Urinary sex hormones are investigated as potential biomarkers for the early detection of breast cancer, with the aim of evaluating their relevance and applicability, in combination with supervised machine-learning data analysis, toward the ultimate goal of extensive screening. Sex hormones were determined in urine samples collected from 250 post-menopausal women (65 healthy and 185 with breast cancer), recruited among the clinical patients of Candiolo Cancer Institute FPO-IRCCS (Torino, Italy). Two analytical procedures based on UHPLC-MS/HRMS were developed and comprehensively validated to quantify 20 free and conjugated sex hormones in urine samples. The quantitative data were processed by seven machine-learning algorithms, and the efficiency of the resulting models was compared. Among the models tested to relate urinary estrogen and androgen levels to the occurrence of breast cancer, Random Forest (RF) outperformed all the other supervised classification approaches, including Partial Least Squares - Discriminant Analysis (PLS-DA), in terms of effectiveness and robustness. The final optimized model, built on only five biomarkers (testosterone-sulphate, alpha-estradiol, 4-methoxyestradiol, DHEA-sulphate, and epitestosterone-sulphate), achieved approximately 98% diagnostic accuracy on replicated validation sets. To balance the less-represented population of healthy women, a Synthetic Minority Oversampling TEchnique (SMOTE) data oversampling approach was applied. With optimization of its tunable hyperparameters, the RF algorithm showed great potential for early breast cancer detection, as it provides a clear ranking of the biomarkers and their relative efficiency, making it possible to base the final diagnostic model on a restricted selection of only five steroid biomarkers, as is desirable for noninvasive tests intended for wide screening.
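As an illustration of the oversampling step mentioned in the abstract, the following is a minimal sketch of the general SMOTE idea: a synthetic minority sample is placed at a random position on the segment between an existing minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, the parameter values, and the toy two-feature data are hypothetical and chosen only for demonstration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features, three minority ("healthy") samples.
healthy = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4]]
n_cancer = 8  # hypothetical majority-class size to match
new_points = smote(healthy, n_new=n_cancer - len(healthy), k=2)
balanced_healthy = healthy + new_points
print(len(balanced_healthy))
```

In a real workflow, oversampling of this kind is normally applied to the training split only, so that the validation sets contain measured samples exclusively.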

Identifiers

pubmed: 38554554
pii: S0731-7085(24)00153-5
doi: 10.1016/j.jpba.2024.116113

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

116113

Copyright information

Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Marco Vincenti reports financial support was provided by CRT Foundation. Marco Vincenti reports financial support was provided by Italian Ministry of Education, Universities, and Research. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Eugenio Alladio (E)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.

Fulvia Trapani (F)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.

Lorenzo Castellino (L)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.

Marta Massano (M)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.

Daniele Di Corcia (D)

Centro Regionale Antidoping, Orbassano, TO, Italy.

Alberto Salomone (A)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.

Enrico Berrino (E)

Department of Medical Sciences, University of Turin, Turin, Italy; Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy.

Riccardo Ponzone (R)

Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy.

Caterina Marchiò (C)

Department of Medical Sciences, University of Turin, Turin, Italy; Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy.

Anna Sapino (A)

Department of Medical Sciences, University of Turin, Turin, Italy; Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy.

Marco Vincenti (M)

Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy. Electronic address: marco.vincenti@unito.it.

MeSH classifications