Statistical and Machine Learning Methods for Discovering Prognostic Biomarkers for Survival Outcomes.


Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2023
History:
entrez: 17 Mar 2023
pubmed: 18 Mar 2023
medline: 22 Mar 2023
Status: ppublish

Abstract

Discovering molecular biomarkers for predicting patient survival outcomes is an essential step toward improving prognosis and therapeutic decision-making in the treatment of severe diseases such as cancer. Because of the high-dimensional nature of omics datasets, statistical methods such as the least absolute shrinkage and selection operator (Lasso) have been widely applied for cancer biomarker discovery. Owing to their scalability and demonstrated prediction performance, machine learning methods such as XGBoost and neural network models have also been gaining popularity in the community. However, compared with more traditional survival methods such as Kaplan-Meier estimation and Cox regression, high-dimensional methods for survival outcomes remain less familiar to biomedical researchers. In this chapter, we discuss the key analytical procedures for employing these methods to identify biomarkers associated with survival data. We also highlight important considerations that emerged from the analysis of real omics data, along with typical instances of misapplication and misinterpretation of machine learning methods. Using lung cancer and head and neck cancer datasets as demonstrations, we provide step-by-step instructions and sample R code for prioritizing prognostic biomarkers.
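The abstract above outlines a Lasso-based workflow for prioritizing prognostic biomarkers from high-dimensional omics data. As a concrete illustration, the R sketch below (a minimal, hypothetical example, not the chapter's sample code) fits a cross-validated Lasso-penalized Cox model with glmnet, extracts the genes with nonzero coefficients, and checks one selected gene with a Kaplan-Meier plot and a univariable Cox fit. The simulated objects expr, time, and status are placeholders standing in for the chapter's lung cancer and head and neck cancer datasets.

    ## Minimal sketch of Lasso-Cox biomarker prioritization (assumed workflow,
    ## not the chapter's exact code).
    library(glmnet)
    library(survival)

    ## Simulated stand-in data: an n x p expression matrix `expr`, follow-up
    ## times `time`, and an event indicator `status` (1 = event, 0 = censored).
    set.seed(123)
    n <- 200; p <- 50
    expr <- matrix(rnorm(n * p), n, p,
                   dimnames = list(NULL, paste0("gene", 1:p)))
    time   <- rexp(n, rate = exp(0.5 * expr[, 1] - 0.5 * expr[, 2]) / 50)
    status <- rbinom(n, 1, 0.7)

    ## glmnet >= 4.1 accepts a Surv object for family = "cox"; older versions
    ## expect a two-column matrix: cbind(time = time, status = status).
    y <- Surv(time, status)

    ## Cross-validated Lasso-penalized Cox regression (alpha = 1 gives the Lasso).
    cv_fit <- cv.glmnet(x = expr, y = y, family = "cox", alpha = 1, nfolds = 10)

    ## Genes with nonzero coefficients at lambda.min form the prioritized list.
    coefs    <- as.matrix(coef(cv_fit, s = "lambda.min"))
    selected <- rownames(coefs)[coefs[, 1] != 0]
    print(selected)

    ## Sanity check for one selected gene: dichotomize at the median expression
    ## and compare groups with a Kaplan-Meier plot and a univariable Cox model.
    if (length(selected) > 0) {
      g   <- selected[1]
      grp <- ifelse(expr[, g] > median(expr[, g]), "high", "low")
      km  <- survfit(y ~ grp)
      plot(km, col = c(1, 2), xlab = "Time", ylab = "Survival probability")
      print(summary(coxph(y ~ expr[, g])))
    }

Because the selected gene list can change with the cross-validation split, a common follow-up is to repeat the procedure across resamples or to relax the penalty toward the elastic net (0 < alpha < 1) and retain only markers that are selected consistently.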

Identifiers

pubmed: 36929071
doi: 10.1007/978-1-0716-2986-4_2

Chemical substances

Biomarkers 0
Biomarkers, Tumor 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

11-21

Copyright information

© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.

References

Van Belle V, Pelckmans K, Suykens JA, Van Huffel S (2008) Survival SVM: a practical scalable algorithm. In: ESANN 2008. Citeseer, p 94
Wilson CM, Li K, Sun Q, Kuan PF, Wang X (2021) Fenchel duality of Cox partial likelihood with an application in survival kernel learning. Artif Intell Med 116:102077
doi: 10.1016/j.artmed.2021.102077 pubmed: 34020756 pmcid: 8159024
Li K, Yao S, Zhang Z, Cao B, Wilson CM, Kalos D, Kuan PF, Zhu R, Wang X (2022) Efficient gradient boosting for prognostic biomarker discovery. Bioinformatics 38(6):1631–1638
doi: 10.1093/bioinformatics/btab869 pubmed: 34978570
Steingrimsson JA, Morrison S (2020) Deep learning for survival outcomes. Stat Med 39(17):2339–2349
doi: 10.1002/sim.8542 pubmed: 32281672 pmcid: 7334068
Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y (2018) DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 18(1):1–12
doi: 10.1186/s12874-018-0482-1
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 67(2):301–320
doi: 10.1111/j.1467-9868.2005.00503.x
Simon N, Friedman J, Hastie T, Tibshirani R (2011) Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw 39(5):1
doi: 10.18637/jss.v039.i05 pubmed: 27065756 pmcid: 4824408
Sill M, Hielscher T, Becker N, Zucknick M (2015) c060: extended inference with lasso and elastic-net regularized Cox and generalized linear models. J Stat Softw 62:1–22
Wang S, Nan B, Rosset S, Zhu J (2011) Random lasso. Ann Appl Stat 5(1):468
doi: 10.1214/10-AOAS377 pubmed: 22997542 pmcid: 3445423
Kim S, Baladandayuthapani V, Lee JJ (2017) Prediction-oriented marker selection (PROMISE): with application to high-dimensional regression. Stat Biosci 9(1):217–245
doi: 10.1007/s12561-016-9169-5 pubmed: 28785367
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
doi: 10.1214/aos/1013203451
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD'16), San Francisco, CA, pp 785–794
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146–3154
Masters T (1993) Practical neural network recipes in C++. Academic Press Professional, Inc., Boston

Authors

Sijie Yao (S)

Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.

Xuefeng Wang (X)

Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA. xuefeng.wang@moffitt.org.
