Comprehensive genome-wide identification of angiosperm upstream ORFs with peptide sequences conserved in various taxonomic ranges using a novel pipeline, ESUCA.
Bioinformatics
Nascent peptide
Translational regulation
Upstream ORF
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
30 Mar 2020
History:
received: 23 Jul 2019
accepted: 10 Mar 2020
entrez: 2 Apr 2020
pubmed: 2 Apr 2020
medline: 22 Dec 2020
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Upstream open reading frames (uORFs) in the 5'-untranslated regions (5'-UTRs) of certain eukaryotic mRNAs encode evolutionarily conserved functional peptides, such as cis-acting regulatory peptides that control translation of downstream main ORFs (mORFs). For genome-wide searches for uORFs with conserved peptide sequences (CPuORFs), comparative genomic studies have been conducted, in which uORF sequences were compared between selected species. To increase chances of identifying CPuORFs, we previously developed an approach in which uORF sequences were compared using BLAST between Arabidopsis and any other plant species with available transcript sequence databases. If this approach is applied to multiple plant species belonging to phylogenetically distant clades, it is expected to further comprehensively identify CPuORFs conserved in various plant lineages, including those conserved among relatively small taxonomic groups.
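As a rough illustration of what a uORF is at the sequence level, the following minimal sketch scans a 5'-UTR for ATG-initiated reading frames that end at an in-frame stop codon. It is not the ESUCA implementation: the function name `find_uorfs` and the decision to report only uORFs that terminate inside the 5'-UTR (ignoring those that overlap the mORF) are assumptions made for this example.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(utr5):
    """Return (start, end, sequence) for each ATG-initiated ORF that
    terminates inside the given 5'-UTR (hypothetical helper)."""
    # Accept RNA or DNA input by normalising to upper-case DNA.
    seq = utr5.upper().replace("U", "T")
    uorfs = []
    # The lookahead pattern finds every ATG, including overlapping ones.
    for match in re.finditer(r"(?=ATG)", seq):
        start = match.start()
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3, seq[start:pos + 3]))
                break
    return uorfs

if __name__ == "__main__":
    for start, end, orf in find_uorfs("GGATGAAATAGCC"):
        print(start, end, orf)
```

Coordinates are 0-based and end-exclusive. An ATG with no in-frame stop before the end of the supplied sequence is skipped under the simplifying assumption above.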
RESULTS
To efficiently compare uORF sequences among many species and efficiently identify CPuORFs conserved in various taxonomic lineages, we developed a novel pipeline, ESUCA. We applied ESUCA to the genomes of five angiosperm species, which belong to phylogenetically distant clades, and selected CPuORFs conserved among at least three different orders. Through these analyses, we identified 89 novel CPuORF families. As expected, ESUCA analysis of each of the five angiosperm genomes identified many CPuORFs that were not identified from ESUCA analyses of the other four species. However, unexpectedly, these CPuORFs include those conserved across wide taxonomic ranges, indicating that the approach used here is useful not only for comprehensive identification of narrowly conserved CPuORFs but also for that of widely conserved CPuORFs. Examination of the effects of 11 selected CPuORFs on mORF translation revealed that CPuORFs conserved only in relatively narrow taxonomic ranges can have sequence-dependent regulatory effects, suggesting that most of the identified CPuORFs are conserved because of functional constraints of their encoded peptides.
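The selection criterion mentioned above (conservation among at least three different orders) can be sketched as a simple filter. This is an illustrative assumption, not ESUCA's code: the input format (pairs of uORF identifier and the taxonomic order of a species with a homologous uORF), the name `select_conserved`, and whether the query species' own order counts toward the threshold are all hypothetical choices made here.

```python
from collections import defaultdict

def select_conserved(hits, min_orders=3):
    """Keep uORF IDs whose homologs span at least `min_orders` distinct
    taxonomic orders. `hits` is an iterable of (uorf_id, order) pairs
    (hypothetical input format)."""
    orders_by_uorf = defaultdict(set)
    for uorf_id, order in hits:
        # A set ignores repeated hits from the same order.
        orders_by_uorf[uorf_id].add(order)
    return sorted(
        uorf_id
        for uorf_id, orders in orders_by_uorf.items()
        if len(orders) >= min_orders
    )

if __name__ == "__main__":
    example_hits = [
        ("u1", "Brassicales"), ("u1", "Poales"), ("u1", "Solanales"),
        ("u2", "Brassicales"), ("u2", "Brassicales"), ("u2", "Poales"),
    ]
    print(select_conserved(example_hits))
```

In the example data, `u1` has homologs in three orders and is kept, while `u2` has hits in only two distinct orders and is dropped.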
CONCLUSIONS
This study demonstrates that ESUCA is capable of efficiently identifying CPuORFs likely to be conserved because of the functional importance of their encoded peptides. Furthermore, our data show that the approach in which uORF sequences from multiple species are compared with those of many other species, using ESUCA, is highly effective in comprehensively identifying CPuORFs conserved in various taxonomic ranges.
Identifiers
pubmed: 32228449
doi: 10.1186/s12864-020-6662-5
pii: 10.1186/s12864-020-6662-5
pmc: PMC7106846
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
260
Grants
Agency: Japan Society for the Promotion of Science
ID: JP16H05063
Agency: Japan Society for the Promotion of Science
ID: JP16K07387
Agency: Japan Society for the Promotion of Science
ID: JP18H03330
Agency: Japan Society for the Promotion of Science
ID: JP18K06297
Agency: Japan Society for the Promotion of Science
ID: JP19K22892
Agency: Ministry of Education, Culture, Sports, Science and Technology
ID: JP26113519
Agency: Ministry of Education, Culture, Sports, Science and Technology
ID: JP17H05658
Agency: Ministry of Education, Culture, Sports, Science and Technology
ID: JP26114703
Agency: Ministry of Education, Culture, Sports, Science and Technology
ID: JP17H05659
Agency: Ministry of Education, Culture, Sports, Science and Technology
ID: JP16H01246