A hierarchical spike-and-slab model for pan-cancer survival using pan-omic data.

Keywords: Bayesian hierarchical modeling; Bidimensionally-linked matrices; Pan-omics; Spike-and-slab priors; The Cancer Genome Atlas (TCGA); pan-cancer survival analysis

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
17 Jun 2022
History:
received: 19 Jul 2021
accepted: 3 Jun 2022
entrez: 16 Jun 2022
pubmed: 17 Jun 2022
medline: 22 Jun 2022
Status: epublish

Abstract

BACKGROUND
Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer. However, such analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict clinical outcomes. We address the issue of prediction across multiple high-dimensional sources of data and sample sets by using molecular patterns identified by BIDIFAC+, a method for integrative dimension reduction of bidimensionally-linked matrices, in a Bayesian hierarchical model. Our model performs variable selection through spike-and-slab priors that borrow information across clustered data. We use this model to predict overall patient survival from the Cancer Genome Atlas with data from 29 cancer types and 4 omics sources and use simulations to characterize the performance of the hierarchical spike-and-slab prior.
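The paragraph above describes the hierarchical spike-and-slab prior only in words. As a hedged illustration, the Python sketch below draws coefficients from one generic hierarchical spike-and-slab form: each molecular pattern j has an inclusion probability pi_j that is shared by all cancer types k, so whether a pattern is selected in one cancer informs its selection in the others. The Beta(a, b) hyperprior, the point-mass spike, the Normal slab, and the function names are assumptions chosen for illustration; this is not the authors' exact specification or code.

```python
import random


def sample_hierarchical_spike_slab(n_patterns, n_cancers, a=1.0, b=1.0,
                                   slab_sd=1.0, seed=None):
    """Draw one set of coefficients from a generic hierarchical spike-and-slab prior.

    Illustrative assumption, not the paper's exact specification:
      pi_j     ~ Beta(a, b)            inclusion probability of pattern j, shared by all cancers
      gamma_jk ~ Bernoulli(pi_j)       1 if pattern j is active in cancer k
      beta_jk  = 0                     if gamma_jk == 0 (point-mass spike at zero)
      beta_jk  ~ Normal(0, slab_sd^2)  if gamma_jk == 1 (slab)
    Returns (betas, gammas), each a list of n_patterns rows of length n_cancers.
    """
    rng = random.Random(seed)
    betas, gammas = [], []
    for _ in range(n_patterns):
        # One inclusion probability per pattern, shared across cancer types;
        # this shared parameter is what links selection across cancers.
        pi_j = rng.betavariate(a, b)
        gamma_row = [1 if rng.random() < pi_j else 0 for _ in range(n_cancers)]
        # Spike: exactly zero when excluded; slab: Normal draw when included.
        beta_row = [rng.gauss(0.0, slab_sd) if g == 1 else 0.0 for g in gamma_row]
        gammas.append(gamma_row)
        betas.append(beta_row)
    return betas, gammas


def posterior_inclusion_mean(gamma_row, a=1.0, b=1.0):
    """Conditional posterior mean of pi_j given indicators gamma_j1..gamma_jK.

    By Beta-Bernoulli conjugacy, pi_j | gamma ~ Beta(a + s, b + K - s), where s is
    the number of cancers in which pattern j is active, so the mean is
    (a + s) / (a + b + K).
    """
    s = sum(gamma_row)
    k = len(gamma_row)
    return (a + s) / (a + b + k)


if __name__ == "__main__":
    # 29 cancer types, as in the TCGA analysis described above; 5 patterns is arbitrary.
    betas, gammas = sample_hierarchical_spike_slab(n_patterns=5, n_cancers=29, seed=1)
    for j, row in enumerate(gammas):
        print(f"pattern {j}: active in {sum(row)} of {len(row)} cancers, "
              f"posterior mean inclusion = {posterior_inclusion_mean(row):.3f}")
```

Because pi_j is shared, a pattern that is active in many cancers pulls the conditional posterior of pi_j upward, raising its prior inclusion probability in the remaining cancers. In a full MCMC fit the indicators are latent and would be resampled alongside pi_j (for example, by Gibbs sampling), with the conjugate update above as one step.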
RESULTS
We found that molecular patterns shared across all or most cancers were largely not predictive of survival. However, our model selected patterns unique to subsets of cancers that differentiate clinical tumor subtypes with markedly different survival outcomes. Some of these subtypes were previously established, such as subtypes of uterine corpus endometrial carcinoma, while others may be novel, such as subtypes within a set of kidney carcinomas. Through simulations, we found that the hierarchical spike-and-slab prior performs best in terms of variable selection accuracy and predictive power when borrowing information is advantageous, but also offers competitive performance when it is not.
CONCLUSIONS
We address the issue of prediction across multiple sources of data by using results from BIDIFAC+ in a Bayesian hierarchical model for overall patient survival. By incorporating spike-and-slab priors that borrow information across cancers, we identified molecular patterns that distinguish clinical tumor subtypes within a single cancer and within a group of cancers. We also corroborate the flexibility and performance of using spike-and-slab priors as a Bayesian variable selection approach.

Identifiers

pubmed: 35710340
doi: 10.1186/s12859-022-04770-3
pii: 10.1186/s12859-022-04770-3
pmc: PMC9204947

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

235

Grants

Agency: NIGMS NIH HHS
ID: R01 GM130622
Country: United States
Agency: NCI NIH HHS
ID: R21 CA231214
Country: United States

Copyright information

© 2022. The Author(s).

Authors

Sarah Samorodnitsky (S)

Division of Biostatistics, University of Minnesota, Minneapolis, USA.

Katherine A Hoadley (KA)

Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA.

Eric F Lock (EF)

Division of Biostatistics, University of Minnesota, Minneapolis, USA. elock@umn.edu.
