Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes.


Journal

DNA research : an international journal for rapid publication of reports on genes and genomes
ISSN: 1756-1663
Abbreviated title: DNA Res
Country: England
NLM ID: 9423827

Publication information

Publication date:
01 Apr 2019
History:
received: 27 Apr 2018
accepted: 07 Dec 2018
pubmed: 07 Feb 2019
medline: 06 Aug 2019
entrez: 07 Feb 2019
Status: ppublish

Abstract

A new mathematical method for detecting potential reading frameshifts in protein-coding sequences (cds) was developed. The algorithm is adjusted to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm. This approach does not require any preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshifts. This is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. The developed algorithm was also tested on 17 bacterial genomes. We compared our results with previously obtained data on the search for potential reading frameshifts in these genomes. This study discusses the possibility that the reading frameshift is a relatively frequent mutation and that this mutation could participate in the creation of new genes and proteins.
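To make the idea of triplet periodicity concrete, the following is a minimal sketch and not the authors' algorithm (which, per the abstract, relies on dynamic programming and a genetic algorithm). It assumes a simplified setting: base counts at the three codon positions form a 3x4 matrix, mutual information measures how strongly the sequence is periodic, and a frameshift is guessed by asking which cyclic shift of the downstream region best matches the upstream profile. The function names `triplet_counts`, `mutual_information` and `estimate_shift`, the fixed split point, and the pseudocount of 1 are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def triplet_counts(seq):
    """Count each base at codon positions 0, 1, 2 (a 3x4 matrix of counts)."""
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq):
        col = BASES.find(base.upper())
        if col >= 0:  # skip ambiguous symbols such as N
            counts[i % 3][col] += 1
    return counts


def mutual_information(counts):
    """Mutual information (in nats) between codon position and base."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(counts[r][c] for r in range(3)) for c in range(4)]
    mi = 0.0
    for r in range(3):
        for c in range(4):
            n = counts[r][c]
            if n:
                mi += (n / total) * math.log(n * total / (row_sums[r] * col_sums[c]))
    return mi


def estimate_shift(seq, split):
    """Guess the phase shift (0, 1 or 2) of seq[split:] relative to seq[:split].

    The upstream part defines a position-specific base profile; the downstream
    part is scored under each cyclic relabelling of codon positions, and the
    shift with the highest log-likelihood is returned. 0 means no frameshift.
    """
    profile = triplet_counts(seq[:split])

    def log_likelihood(shift):
        score = 0.0
        for i in range(split, len(seq)):
            col = BASES.find(seq[i].upper())
            if col < 0:
                continue
            row = profile[(i + shift) % 3]
            # Pseudocount of 1 per base avoids log(0) for unseen bases.
            score += math.log((row[col] + 1) / (sum(row) + 4))
        return score

    return max(range(3), key=log_likelihood)


# Toy input: a perfectly periodic sequence with one base deleted at position 90.
shifted = "ATG" * 30 + "TG" + "ATG" * 30
print(estimate_shift(shifted, 90))
```

In this toy setting the split point is given in advance; locating the frameshift position in a real sequence is the harder part of the problem and is outside the scope of this sketch.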

Identifiers

pubmed: 30726896
pii: 5306610
doi: 10.1093/dnares/dsy046
pmc: PMC6476729

Publication types

Journal Article

Languages

eng

Pagination

157-170

Copyright information

© The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Authors

Y M Suvorova (YM)

Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia.

M A Korotkova (MA)

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia.

K G Skryabin (KG)

Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia.

E V Korotkov (EV)

Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia.
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia.
