Common Features in lncRNA Annotation and Classification: A Survey.

Keywords: classification problems; coding sequence; feature extraction; lncRNA; machine learning

Journal

Non-coding RNA
ISSN: 2311-553X
Abbreviated title: Noncoding RNA
Country: Switzerland
NLM ID: 101652294

Publication information

Publication date:
13 Dec 2021
History:
received: 12 Nov 2021
revised: 03 Dec 2021
accepted: 06 Dec 2021
entrez: 23 Dec 2021
pubmed: 24 Dec 2021
medline: 24 Dec 2021
Status: epublish

Abstract

Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, with effects on disease progression that establish them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority are poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well at distinguishing coding sequences from other RNAs, we find that current methods are not well suited to distinguishing lncRNAs, or parts thereof, from other non-protein-coding input sequences. We conclude that distinguishing lncRNAs from intronic sequences and from untranslated regions of coding mRNAs remains a pressing research gap.
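As an illustration only, and not the authors' method, the sketch below computes two widely used kinds of sequence features for separating coding from non-coding transcripts: the length and transcript coverage of the longest open reading frame (ORF), and normalized k-mer frequencies. The resulting vector could be passed to any standard classifier. All function names are hypothetical; the sketch assumes that only the forward strand is scanned, that an ORF starts at ATG and ends at the next in-frame stop codon (TAA, TAG, TGA), and that ORFs lacking a stop codon are ignored.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF on the forward strand.

    Hypothetical helper: ORFs without an in-frame stop codon are not counted.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        # i + 3 <= len(seq), so every codon examined is complete
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequencies of all 4**k DNA k-mers; k-mers containing non-ACGT characters are skipped."""
    seq = seq.upper().replace("U", "T")
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(km for km in kmers if set(km) <= set("ACGT"))
    total = sum(counts.values()) or 1  # avoid division by zero for short sequences
    return {"".join(p): counts["".join(p)] / total for p in product("ACGT", repeat=k)}


def feature_vector(seq: str, k: int = 3) -> list[float]:
    """Hypothetical feature vector: [ORF length, ORF coverage, k-mer frequencies in sorted order]."""
    seq = seq.upper().replace("U", "T")
    orf_len = longest_orf_length(seq)
    coverage = orf_len / len(seq) if seq else 0.0
    freqs = kmer_frequencies(seq, k)
    return [float(orf_len), coverage] + [freqs[km] for km in sorted(freqs)]
```

For example, `feature_vector("GGATGAAATTTTAGCC")` finds the 12-nt ORF `ATGAAATTTTAG` (coverage 0.75) followed by 64 trinucleotide frequencies. Real coding-potential tools combine such features with further signals (for instance conservation), and, as the abstract notes, features of this kind separate coding from non-coding sequence far better than they separate lncRNAs from introns or UTRs.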

Identifiers

pubmed: 34940758
pii: ncrna7040077
doi: 10.3390/ncrna7040077
pmc: PMC8708962

Publication types

Journal Article; Review

Languages

eng

Grants

Agency: Federal Ministry of Education and Research
ID: BMBF 031A538B


Authors

Christopher Klapproth (C)

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.

Rituparno Sen (R)

Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany.

Peter F Stadler (PF)

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany.
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany.
Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria.
Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia.
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA.

Sven Findeiß (S)

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.

Jörg Fallmann (J)

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.

MeSH classifications