Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers.
Keywords: autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE
Journal
International journal of molecular sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791
Publication information
Publication date: 03 Nov 2021
History:
received: 01 Oct 2021
revised: 28 Oct 2021
accepted: 30 Oct 2021
entrez: 13 Nov 2021
pubmed: 14 Nov 2021
medline: 15 Dec 2021
Status:
epublish
Abstract
BACKGROUND
Long non-coding RNAs (lncRNAs) play a vital role in altering the expression profiles of various target genes that lead to cancer development. Identifying prognostic lncRNAs associated with different cancers may therefore help in developing cancer therapies.
METHODS
To discover the critical lncRNAs that can identify the origin of different cancers, we propose using a state-of-the-art deep learning algorithm, the concrete autoencoder (CAE), in an unsupervised setting; the CAE efficiently identifies a subset of the most informative features. However, because of its stochastic nature, the CAE does not select reproducible features across different runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The underlying assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. Genome-wide lncRNA expression profiles of 12 cancer types, comprising a total of 4768 samples from The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified across multiple CAE runs were added to a final list of key lncRNAs capable of identifying the 12 cancers.
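The two ideas in the methods paragraph can be illustrated with a minimal sketch in standard-library Python: a concrete (Gumbel-softmax) selector, whose random sampling during training (together with random initialisation) is one reason separate CAE runs can disagree, and a multi-run step that ranks features by how many independent runs selected them. The function names, the `min_runs` cut-off, and the toy feature names are hypothetical; the authors' network architecture, temperature schedule, and exact aggregation rule are not given in this record.

```python
import math
import random
from collections import Counter


def concrete_sample(logits, temperature, rng):
    """Draw one relaxed one-hot vector from a Concrete (Gumbel-softmax) distribution.

    In a concrete selector layer, each of the k selector neurons holds one logit
    per input feature; as the temperature is annealed towards 0, its samples
    approach a one-hot vector that picks a single input feature.
    """
    # Gumbel noise g = -log(-log(u)) with u ~ Uniform(0, 1), kept away from 0 and 1
    gumbels = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in logits]
    scaled = [(a + g) / temperature for a, g in zip(logits, gumbels)]
    # Numerically stable softmax over the perturbed, temperature-scaled logits
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selected_features(selector_logits):
    """After training, each selector neuron keeps the feature with its largest logit."""
    return {max(range(len(row)), key=row.__getitem__) for row in selector_logits}


def aggregate_runs(runs, min_runs=2):
    """Rank features by the number of independent CAE runs that selected them.

    runs: one collection of selected feature names per run.
    min_runs: hypothetical cut-off; features chosen in fewer runs are dropped.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(f for run in runs for f in set(run))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, c) for f, c in ranked if c >= min_runs]


if __name__ == "__main__":
    rng = random.Random(0)
    # At a low temperature the sample is typically close to one-hot
    print([round(p, 3) for p in concrete_sample([0.1, 3.0, 0.2], 0.01, rng)])
    print(selected_features([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]))  # {0, 1}
    # Toy outputs of three hypothetical runs (placeholder names, not real lncRNAs)
    runs = [{"lnc_A", "lnc_B", "lnc_C"}, {"lnc_A", "lnc_C", "lnc_D"}, {"lnc_A", "lnc_C", "lnc_E"}]
    print(aggregate_runs(runs, min_runs=2))  # [('lnc_A', 3), ('lnc_C', 3)]
```

Counting selections across runs is the simplest way to express "a feature appearing in multiple runs carries more information"; truncating the ranked list to a fixed size (for example, the top 128) is an equally plausible variant of the final step.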
RESULTS
Our results showed that mrCAE performs better at feature selection than single-run CAE, a standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of the 12 cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers.
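The survival analysis above separates patients into high- and low-risk groups. A common convention, assumed here rather than taken from the paper, is to split patients at the median expression of a single lncRNA and compare the two groups with a log-rank test. The sketch below uses only the Python standard library; the function names and the median cut-off are hypothetical, and the authors' actual grouping and statistical tooling may differ.

```python
import math
from statistics import median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (e.g. death) was observed,
    0 if censored. Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the group-A event count at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def median_split_logrank(expression, times, events):
    """Assumed grouping: high (> median) vs low (<= median) expression of one lncRNA."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return logrank_test([times[i] for i in high], [events[i] for i in high],
                        [times[i] for i in low], [events[i] for i in low])


if __name__ == "__main__":
    # Toy data: in this made-up example, high expressers have shorter survival
    expression = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
    times = [1, 2, 3, 10, 11, 12]
    events = [1, 1, 1, 1, 1, 1]
    chi2, p = median_split_logrank(expression, times, events)
    print(round(chi2, 3), round(p, 4))
```

The quadratic loop keeps the statistic easy to read; for thousands of TCGA samples, an established survival package (for example, `logrank_test` in the Python `lifelines` library) would be the more practical choice.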
CONCLUSIONS
The proposed mrCAE, which selects actual features, outperformed the AE, even though the AE uses latent (pseudo-) features. Because it selects actual features rather than pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be studied further to develop therapies for different cancers.
Identifiers
pubmed: 34769351
pii: ijms222111919
doi: 10.3390/ijms222111919
pmc: PMC8584911
Chemical substances
Biomarkers, Tumor
RNA, Long Noncoding
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: National Science Foundation
ID: 1901628