SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data.
5′ UTR
Chimeric reads
Eukaryotic operons
Genome annotation
Polycistronic RNA processing
RNA-seq
Spliced-leader trans-splicing
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
22 Mar 2021
History:
received: 18 Dec 2020
accepted: 08 Feb 2021
entrez: 23 Mar 2021
pubmed: 24 Mar 2021
medline: 13 Apr 2021
Status: epublish
Abstract
BACKGROUND
Spliced leader (SL) trans-splicing replaces the 5' end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes.
RESULTS
Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5' read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5' SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons.
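The tail-based discovery step described above can be illustrated with a minimal Python sketch. It is not SLIDR's code (the pipelines themselves are implemented in Bash and R). It assumes only that the aligner reports unaligned read ends as soft clips (`S`) in SAM CIGAR strings and that the read's own 5' end is the end of interest. The function names, the `min_len` threshold and the toy sequences are hypothetical.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration; not part of SLIDR or SLOPPR.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_tail(seq, cigar, is_reverse, min_len=8):
    """Return the soft-clipped tail at the read's 5' end, or None.

    seq and cigar are the SEQ and CIGAR fields of one SAM record;
    is_reverse is True when the read aligned to the reverse strand.
    """
    # Hard clips (H) are not present in SEQ, so ignore them.
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return None
    if is_reverse:
        # SEQ is stored reverse-complemented, so the read's 5' end
        # corresponds to the last CIGAR operation.
        length, op = ops[-1]
        if op != "S":
            return None
        tail = reverse_complement(seq[-length:])
    else:
        length, op = ops[0]
        if op != "S":
            return None
        tail = seq[:length]
    return tail if len(tail) >= min_len else None


def tally_tails(records, min_len=8):
    """Count identical 5' tails over (seq, cigar, is_reverse) tuples."""
    counts = Counter()
    for seq, cigar, is_reverse in records:
        tail = five_prime_tail(seq, cigar, is_reverse, min_len)
        if tail is not None:
            counts[tail] += 1
    return counts


# Toy records with an invented 10-nt tail followed by 8 aligned bases.
toy_records = [
    ("GGTTTAATTAACGTACGT", "10S8M", False),
    ("GGTTTAATTACCCCAAAA", "10S8M", False),
    ("ACGTACGT", "8M", False),  # fully aligned, no tail
]
tail_counts = tally_tails(toy_records)
```

Tails that recur across many reads mapping to different genes would be candidate SLs in this simplified picture. Per the description above, the actual pipeline additionally verifies candidates against the corresponding SL RNA genes and their expected sequence motifs, which this sketch does not attempt.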
CONCLUSIONS
SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.
Identifiers
pubmed: 33752599
doi: 10.1186/s12859-021-04009-7
pii: 10.1186/s12859-021-04009-7
pmc: PMC7986045
Chemical substances
RNA, Spliced Leader
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
140
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/J007137/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/T002859/1
Country: United Kingdom