Benchmarking variational AutoEncoders on cancer transcriptomics data.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2023
History:
received: 15 02 2023
accepted: 13 09 2023
medline: 1 11 2023
pubmed: 5 10 2023
entrez: 5 10 2023
Status: epublish

Abstract

Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology because they can capture complex data manifolds, which can subsequently be used to improve performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train because of the large number of hyperparameters that need to be tuned. To better understand the importance of the different hyperparameters, we examined six VAE models trained on TCGA transcriptomics data and evaluated them on two downstream tasks: cluster agreement with cancer subtypes and survival analysis. We studied the effect of latent space dimensionality, learning rate, optimizer, initialization, and activation function on the quality of these downstream tasks on the TCGA samples. We found that β-TCVAE and DIP-VAE perform well on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. To assess generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets, which highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general neither correlate uniquely with any one data characteristic nor capture separable information, even for models specifically designed for disentanglement.
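The cross-dataset check described in the abstract (correlating hyperparameter effects on TCGA with those on GTEx) can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so the Spearman rank correlation here is an assumption, and the effect values are hypothetical placeholders rather than results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical effect of each hyperparameter setting on clustering
# performance, measured once per dataset (illustrative numbers only).
effects_tcga = [0.10, 0.25, -0.05, 0.40, 0.15]
effects_gtex = [0.08, 0.30, 0.02, 0.35, 0.05]

rho = spearman(effects_tcga, effects_gtex)
print(f"rho = {rho:.2f}")
```

A high coefficient under this kind of comparison would suggest that settings that help on one dataset tend to help on the other, which is the sense in which the abstract speaks of generalizable recommendations.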

Identifiers

pubmed: 37796856
doi: 10.1371/journal.pone.0292126
pii: PONE-D-23-04497
pmc: PMC10553230

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0292126

Copyright information

Copyright: © 2023 Eltager et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Mostafa Eltager (M)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.

Tamim Abdelaal (T)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.
Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands.

Mohammed Charrout (M)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.

Ahmed Mahfouz (A)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands.

Marcel J T Reinders (MJT)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands.

Stavros Makrodimitris (S)

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.
Department of Medical Oncology, Erasmus Medical Center, Rotterdam, The Netherlands.
