DNABERT-based explainable lncRNA identification in plant genome assemblies.
Keywords: Cross-species prediction; Deep learning; Genomic motif; LncRNAs; Natural language processing
Journal: Computational and structural biotechnology journal
ISSN: 2001-0370
Abbreviated title: Comput Struct Biotechnol J
Country: Netherlands
NLM ID: 101585369
Publication information
Publication date: 2023
History:
received: 23 November 2022
revised: 13 November 2023
accepted: 13 November 2023
medline: 7 December 2023
pubmed: 7 December 2023
entrez: 7 December 2023
Status: epublish
Abstract
Long non-coding ribonucleic acids (lncRNAs) have been shown to play an important role in plant gene regulation, involving both epigenetic and transcript regulation. LncRNAs are transcripts longer than 200 nucleotides that are not translated into functional proteins but can be translated into small peptides. Machine learning models have predominantly used transcriptome data with manually defined features to detect lncRNAs; however, they often underrepresent the abundance of lncRNAs and can be biased in their detection. Here we present a study using Natural Language Processing (NLP) models to identify plant lncRNAs from genomic sequences rather than transcriptomic data. The NLP models were trained to predict lncRNAs for seven model and crop species (
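The abstract's central idea is that NLP models can read genomic DNA directly. DNABERT, named in the title, does this by splitting a sequence into overlapping k-mers (stride 1) that act as the model's "words". The Python sketch below illustrates only that tokenization step, plus the length criterion from the lncRNA definition above (longer than 200 nucleotides). The function names are hypothetical, and this is not the authors' pipeline, which this record does not describe.

```python
from typing import List


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    This mirrors DNABERT-style tokenization. The default k=6 is an
    assumption; DNABERT was released in k=3..6 variants.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()  # normalise soft-masked (lower-case) bases
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def longer_than_200_nt(seq: str) -> bool:
    """Length criterion from the lncRNA definition: more than 200 nucleotides."""
    return len(seq) > 200
```

For example, `kmer_tokenize("ATGCGT", k=3)` yields `["ATG", "TGC", "GCG", "CGT"]`. Overlapping tokens keep every base position represented in several neighbouring tokens, which is the property DNABERT relies on. The trade-off is that a sequence of length n produces n - k + 1 tokens rather than about n / k.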
Identifiers
pubmed: 38058296
doi: 10.1016/j.csbj.2023.11.025
pii: S2001-0370(23)00439-7
pmc: PMC10696397
Publication types: Journal Article
Languages: eng
Pagination: 5676-5685
Copyright information: © 2023 The Authors.
Conflict of interest statement: None.