Three-nucleotide periodicity of nucleotide diversity in a population enables the identification of open reading frames.
SNPs
open reading frame
polyploidy genome
population
sORF
Journal
Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837
Publication information
Publication date: 2022-07-18
History:
received: 2022-01-24
revised: 2022-04-25
accepted: 2022-05-06
pubmed: 2022-06-15
medline: 2022-07-22
entrez: 2022-06-14
Status:
ppublish
Abstract
Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands in steps of three nucleotides, and datasets carrying this information can be used to predict ORFs. Ribosome-protected footprints (RPFs) show a significant 3-nt periodicity on mRNAs and are powerful for predicting translating ORFs, including small ORFs (sORFs), but their application is limited because they are too short to be mapped accurately in complex genomes. In this study, we found a significant 3-nt periodicity in datasets of population-level genomic variants in coding sequences, in which nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and developed the Python package 'OrfPP', which recovers ~83% of the annotated ORFs in the tested genomes on average, independent of population size and genome complexity. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. Applying OrfPP to the tetraploid cotton and hexaploid wheat genomes identified 76.17% and 87.43% of the annotated ORFs, respectively, as well as 4704 sORFs (including 1182 upstream and 2110 downstream ORFs) in cotton and 5025 sORFs (including 232 upstream and 234 downstream ORFs) in wheat. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the study of sORFs to more complex genomes.
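The 3-nt periodicity described in the abstract can be illustrated with a minimal Python sketch. This is not OrfPP's code or API: the function names `frame_means` and `periodicity_score`, the input format (one diversity value per nucleotide of a candidate region), and the scoring rule are all assumptions made for illustration. The sketch averages diversity at each of the three position offsets and treats the offset with the highest mean as the presumed third codon position.

```python
from typing import List, Sequence, Tuple


def frame_means(diversity: Sequence[float]) -> List[float]:
    """Mean diversity at positions whose index modulo 3 is 0, 1 and 2."""
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, value in enumerate(diversity):
        sums[i % 3] += value
        counts[i % 3] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def periodicity_score(diversity: Sequence[float]) -> Tuple[int, int, float]:
    """Return (peak_offset, codon_start_offset, score) for a region.

    peak_offset is the offset (0, 1 or 2) with the highest mean diversity.
    Assumption: that offset is the third codon position, so the codon
    starts two positions earlier, i.e. at (peak_offset + 1) % 3.
    score is the peak mean divided by the sum of the three means;
    about 1/3 means no periodicity, values near 1 mean strong periodicity.
    """
    means = frame_means(diversity)
    total = sum(means)
    if total == 0:
        # No variation at all: nothing to score.
        return 0, 1, 0.0
    peak = max(range(3), key=lambda k: means[k])
    return peak, (peak + 1) % 3, means[peak] / total


if __name__ == "__main__":
    # Hypothetical per-site diversity for a 30-nt region in frame 0.
    region = [0.01, 0.01, 0.05] * 10
    peak, start, score = periodicity_score(region)
    print(peak, start, round(score, 3))
```

A real predictor would also need start and stop codon constraints and a statistical test against non-coding background; this sketch covers only the periodicity signal.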
Identifiers
pubmed: 35698834
pii: 6607611
doi: 10.1093/bib/bbac210
pmc: PMC9294425
Chemical substances
RNA, Messenger
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2022. Published by Oxford University Press.