BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA-miRNA interaction prediction.

Bag of tricks; Deep learning; Deep learning strategies; Lightweight neural network; Long non-coding RNA; Micro-RNA; Robust interaction predictor; lncRNA–miRNA interaction prediction

Journal

Interdisciplinary sciences, computational life sciences
ISSN: 1867-1462
Abbreviated title: Interdiscip Sci
Country: Germany
NLM ID: 101515919

Publication information

Publication date:
Dec 2022
History:
received: 2021-11-17
accepted: 2022-07-12
revised: 2022-06-16
pubmed: 2022-08-11
medline: 2022-10-22
entrez: 2022-08-10
Status: ppublish

Abstract

BACKGROUND AND OBJECTIVE
Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation, cellular metabolism, and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly because lncRNA sequence lengths vary so widely. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length.
METHODS
The paper performs an in-depth exploration of diverse copy padding and sequence truncation approaches, and presents a novel idea of using only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, "BoT-Net", which combines a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradients, learning rate decay, and dropout into a regularized, precise neural network for lncRNA-miRNA interaction prediction.
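The three fixed-length generation strategies named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and "copy padding" is assumed here to mean repeating a sequence's own residues until the target length is reached.

```python
def copy_pad(seq: str, target_len: int) -> str:
    """Copy padding (assumed meaning): repeat the sequence's own residues
    until target_len is reached, then cut off the overshoot."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def truncate(seq: str, target_len: int) -> str:
    """Truncation: keep only the first target_len residues."""
    return seq[:target_len]


def subregion(seq: str, start: int, length: int) -> str:
    """Subregion selection: keep a window of at most `length` residues."""
    return seq[start:start + length]


def fixed_length(seq: str, target_len: int, start: int = 0) -> str:
    """Hypothetical combination: take a subregion, then copy pad it
    if the sequence was too short to fill the window."""
    window = subregion(seq, start, target_len)
    return copy_pad(window, target_len)
```

With this sketch, a long sequence is cut to its chosen window while a short one is filled with copies of itself, so every output has exactly `target_len` residues.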
RESULTS
BoT-Net outperforms the state-of-the-art lncRNA-miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient, respectively. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA-protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient.
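For readers unfamiliar with the reported measures, the sketch below computes them from a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative assumptions and are not taken from the paper's results.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Matthews correlation coefficient; defined as 0.0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "mcc": mcc,
    }


# Made-up counts, for illustration only.
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it informative when the interacting and non-interacting classes are imbalanced.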
CONCLUSIONS
In the benchmark lncRNA-miRNA interaction prediction dataset, lncRNA sequence length varies from 213 to 22,743 residues, and in the benchmark lncRNA-protein interaction prediction dataset, it varies from 15 to 1,504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of long lncRNA sequences alone contain a highly informative distribution for lncRNA-miRNA interaction prediction, a crucial finding that the proposed BoT-Net approach exploits to optimize fixed-length lncRNA generation.
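A rough sense of the copy padding bias can be had from the length ranges quoted above. The sketch below assumes, purely for illustration, that the shortest sequence is copy padded up to the longest length in the same dataset; the abstract does not state which target length the authors used, and the function name is hypothetical.

```python
def copy_padding_profile(seq_len: int, target_len: int) -> dict:
    """How much of a copy-padded sequence is repetition of the original."""
    if seq_len <= 0 or target_len <= 0:
        raise ValueError("lengths must be positive")
    full_copies, remainder = divmod(target_len, seq_len)
    return {
        "full_copies": full_copies,        # whole repetitions of the sequence
        "remainder_residues": remainder,   # residues of one extra partial copy
        # Share of the padded sequence made up by the original residues.
        "original_fraction": min(1.0, seq_len / target_len),
    }


# Length extremes quoted in the abstract; the padding target is an assumption.
mirna_dataset = copy_padding_profile(213, 22743)
protein_dataset = copy_padding_profile(15, 1504)
```

Under this assumption, the shortest sequence in either dataset would be repeated about a hundred times, so the original residues make up less than 1% of the padded input.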
AVAILABILITY
The BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.

Identifiers

pubmed: 35947255
doi: 10.1007/s12539-022-00535-x
pii: 10.1007/s12539-022-00535-x
pmc: PMC9581873

Chemical substances

RNA, Long Noncoding 0
MicroRNAs 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

841-862

Copyright information

© 2022. The Author(s).

Authors

Muhammad Nabeel Asim (MN)

Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany. Muhammad_Nabeel.Asim@dfki.de.
German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany. Muhammad_Nabeel.Asim@dfki.de.

Muhammad Ali Ibrahim (MA)

Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.
German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.

Christoph Zehe (C)

Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany.

Johan Trygg (J)

Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany.
Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden.

Andreas Dengel (A)

Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.
German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.

Sheraz Ahmed (S)

German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.
Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden.
