BrumiR: A toolkit for de novo discovery of microRNAs from sRNA-seq data.
Algorithms
de novo
miRNA
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
25 10 2022
History:
received: 27 08 2020
revised: 08 11 2021
accepted: 15 09 2022
entrez: 25 10 2022
pubmed: 26 10 2022
medline: 28 10 2022
Status:
ppublish
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that are key players in the regulation of gene expression. In the past decade, with the increasing accessibility of high-throughput sequencing technologies, different methods have been developed to identify miRNAs, most of which rely on preexisting reference genomes. However, when a reference genome is absent or of low quality, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that discovers miRNAs directly and exclusively from small RNA sequencing (sRNA-seq) data. We benchmarked BrumiR on datasets encompassing animal and plant species, using real and simulated sRNA-seq experiments. The results demonstrate that BrumiR reaches the highest recall for miRNA discovery while being much faster and more efficient than the state-of-the-art tools evaluated. This speed and efficiency allow BrumiR to analyze a large number of sRNA-seq experiments from plant or animal species. Moreover, BrumiR detects additional information regarding other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. Additionally, when a reference genome is available, BrumiR provides a new mapping tool (BrumiR2reference) that performs an a posteriori exhaustive search to identify the precursor sequences. Finally, we also provide a machine learning classifier based on a random forest model that evaluates sequence-derived features to further refine the predictions obtained from the BrumiR-core. The code of BrumiR and all the algorithms that compose the BrumiR toolkit are freely available at https://github.com/camoragaq/BrumiR.
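To illustrate what "sequence-derived features" for a classifier might look like, here is a minimal Python sketch. The feature set (length, GC content, dinucleotide frequencies) and the function name `sequence_features` are assumptions chosen for illustration; the abstract does not specify which features BrumiR's random forest model actually uses.

```python
from collections import Counter
from itertools import product


def sequence_features(seq: str) -> dict:
    """Compute simple features for one candidate sequence.

    Hypothetical feature set for illustration only; not BrumiR's own.
    """
    # Normalize to the RNA alphabet so DNA-style reads (with T) are accepted.
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    features = {
        "length": n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
    }

    # Frequency of each of the 16 possible dinucleotides (overlapping windows).
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = max(n - 1, 1)
    for a, b in product("ACGU", repeat=2):
        features[f"freq_{a}{b}"] = pairs[a + b] / total
    return features


# Example input: a 22-nt candidate written in DNA letters, as reads usually are.
example = sequence_features("TGAGGTAGTAGGTTGTATAGTT")
```

A feature dictionary like this, computed per candidate, could be stacked into a matrix and passed to any random forest implementation; that downstream step is left out here because the abstract gives no details about it.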
Identifiers
pubmed: 36283679
pii: 6773084
doi: 10.1093/gigascience/giac093
pmc: PMC9596168
Chemical substances
MicroRNAs
0
RNA, Small Untranslated
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2022. Published by Oxford University Press GigaScience.