Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers.


Journal

International Journal of Molecular Sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791

Publication information

Publication date:
03 Nov 2021
History:
received: 01 10 2021
revised: 28 10 2021
accepted: 30 10 2021
entrez: 13 11 2021
pubmed: 14 11 2021
medline: 15 12 2021
Status: epublish

Abstract

Long non-coding RNAs (lncRNAs) play a vital role in changing the expression profiles of various target genes, which can lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapies. To discover the critical lncRNAs that can identify the origin of different cancers, we propose using the concrete autoencoder (CAE), a state-of-the-art deep learning algorithm that efficiently identifies a subset of the most informative features, in an unsupervised setting. However, because of its stochastic nature, CAE does not identify reproducible features across runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancer, comprising 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs capable of identifying the 12 cancers. Our results showed that mrCAE performs better in feature selection than single-run CAE, a standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. The proposed mrCAE, which selects actual features, outperformed the AE, even though the AE selects latent (pseudo-)features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
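
The multi-run idea described in the abstract (keep the features that recur across several stochastic runs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `multi_run_selection`, the `min_runs` threshold, and the toy feature indices are all assumptions made for illustration, and the hard-coded lists stand in for the features a trained CAE would return in each run.

```python
from collections import Counter

def multi_run_selection(run_selector, n_runs, min_runs):
    """Keep features that were selected in at least `min_runs` of `n_runs` runs.

    `run_selector(run_index)` is a hypothetical callable returning the feature
    indices chosen in one run; in practice it would train one CAE and return
    its selected features.
    """
    counts = Counter()
    for run in range(n_runs):
        # Use a set so a feature is counted at most once per run.
        counts.update(set(run_selector(run)))
    stable = [feature for feature, c in counts.items() if c >= min_runs]
    # Rank by how often a feature was selected, breaking ties by index.
    stable.sort(key=lambda feature: (-counts[feature], feature))
    return stable, counts

# Toy data (assumed, not from the paper): features selected in three runs.
toy_runs = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
stable, counts = multi_run_selection(lambda run: toy_runs[run], n_runs=3, min_runs=2)
print(stable)  # → [3, 2, 4]
```

The frequency threshold is the main design choice here: a higher `min_runs` yields a smaller, more stable feature set, while a lower one admits features that may have been selected by chance in a single run. The abstract does not state which threshold or number of runs the authors used.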


Identifiers

pubmed: 34769351
pii: ijms222111919
doi: 10.3390/ijms222111919
pmc: PMC8584911

Chemical substances

Biomarkers, Tumor 0
RNA, Long Noncoding 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: National Science Foundation
ID: 1901628


Authors

Abdullah Al Mamun (A)

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.

Raihanul Bari Tanvir (RB)

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.

Masrur Sobhan (M)

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.

Kalai Mathee (K)

Department of Human and Molecular Genetics, Herbert Wertheim College of Medicine, Florida International University, Miami, FL 33199, USA.
Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA.

Giri Narasimhan (G)

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.
Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA.

Gregory E Holt (GE)

Department of Medicine, Miami VA Healthcare System, Miami, FL 33125, USA.
Department of Medicine, University of Miami, Miami, FL 33146, USA.

Ananda Mohan Mondal (AM)

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.
Department of Human and Molecular Genetics, Herbert Wertheim College of Medicine, Florida International University, Miami, FL 33199, USA.
Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA.
