Union with Recursive Feature Elimination: A feature selection framework to improve the classification performance of multi-category causes of death in colorectal cancer.

Colorectal Cancer; Feature Selection; Machine Learning; Multi-category Death Causes; U-RFE

Journal

Laboratory investigation; a journal of technical methods and pathology
ISSN: 1530-0307
Abbreviated title: Lab Invest
Country: United States
NLM ID: 0376617

Publication information

Publication date:
27 Dec 2023
History:
received: 26 Mar 2023
revised: 05 Dec 2023
accepted: 20 Dec 2023
medline: 2 Jan 2024
pubmed: 2 Jan 2024
entrez: 29 Dec 2023
Status: aheadofprint

Abstract

Despite the use of machine learning tools, it remains challenging to model cause-specific deaths in colorectal cancer (CRC) patients accurately and to choose appropriate treatments. Here, we propose a feature selection framework, union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial in CRC progression-specific mortality using the TCGA dataset. Based on the union feature sets, the performance of 5 classification algorithms, namely logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost) and Stacking, was compared to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM and RF were used as base estimators to obtain subsets that contained the same number of features but not exactly the same features. Union analysis of the subsets was then performed to determine the final union feature set, which can effectively combine the advantages of the different algorithms. We found that the U-RFE framework could improve the performance of various models. Stacking outperformed LR, SVM, RF and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy and Matthews Correlation Coefficient of 0.851, 0.864, 0.854, 0.864 and 0.717, respectively. The performance on the minority categories was also significantly improved. Therefore, this recursive-feature-elimination-based approach to feature selection improves the performance of classifying CRC deaths using clinical and 'omic data, or of classification tasks using other data with high feature redundancy and imbalance.
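The two-stage idea described in the abstract (one RFE run per base estimator, then a union of the selected subsets, then classification on the union set) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the synthetic data from `make_classification`, the helper name `union_rfe`, the `step` value, all hyperparameters, and the composition of the stacking model are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def union_rfe(X, y, base_estimators, n_features_to_select, step=0.1):
    """Hypothetical helper: run RFE once per base estimator and
    return the sorted union of the selected column indices."""
    selected = set()
    for estimator in base_estimators:
        rfe = RFE(estimator, n_features_to_select=n_features_to_select, step=step)
        rfe.fit(X, y)
        selected.update(np.flatnonzero(rfe.support_).tolist())
    return sorted(selected)


# Synthetic 4-class data standing in for the clinical and 'omic features (assumption).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: LR, SVM and RF as RFE base estimators, each keeping 10 features.
base_estimators = [
    LogisticRegression(max_iter=1000),
    SVC(kernel="linear"),  # a linear kernel exposes coef_, which RFE needs
    RandomForestClassifier(n_estimators=50, random_state=0),
]
union_idx = union_rfe(X_train, y_train, base_estimators, n_features_to_select=10)

# Stage 2: train a stacking model on the union feature set (composition assumed).
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(kernel="linear")),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train[:, union_idx], y_train)
y_pred = stack.predict(X_test[:, union_idx])

f1_weighted = f1_score(y_test, y_pred, average="weighted")
mcc = matthews_corrcoef(y_test, y_pred)
print(f"union size: {len(union_idx)}, F1_weighted: {f1_weighted:.3f}, MCC: {mcc:.3f}")
```

Because each base estimator ranks features differently, the union is usually larger than any single RFE subset. Feature selection is fitted on the training split only, so the held-out metrics are not influenced by the test labels.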

Identifiers

pubmed: 38158124
pii: S0023-6837(23)00263-5
doi: 10.1016/j.labinv.2023.100320

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

100320

Copyright information

Copyright © 2023. Published by Elsevier Inc.

Authors

Fei Deng (F)

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China. Electronic address: 2606897447@qq.com.

Lin Zhao (L)

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.

Ning Yu (N)

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.

Yuxiang Lin (Y)

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.

Lanjing Zhang (L)

Department of Biological Sciences, Rutgers University, Newark, NJ, USA; Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA. Electronic address: lanjing.zhang@rutgers.edu.

MeSH classifications