Whole-Genome Annotation with BRAKER.

AUGUSTUS; BRAKER; Gene prediction; GeneMark-ES/ET; Genome annotation pipeline; Protein mapping to genome; Protein-coding genes; RNA-Seq reads

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2019
History:
entrez: 26 4 2019
pubmed: 26 4 2019
medline: 10 8 2019
Status: ppublish

Abstract

BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if extrinsic evidence is available, it uses that evidence for model refinement. From the protein-coding genes predicted by GeneMark-ES/ET, we select a set for training AUGUSTUS, one of the most accurate gene finding tools, which, in contrast to GeneMark-ES/ET, integrates extrinsic evidence directly into the gene prediction step. The first published version, BRAKER1, integrated genomic footprints of unassembled RNA-Seq reads into both the training and the prediction steps. The pipeline has since been extended to integrate data on mapped cross-species proteins and to use heterogeneous extrinsic evidence, that is, both RNA-Seq and protein alignments. In this book chapter, we briefly summarize the pipeline methodology and describe how to apply BRAKER in environments characterized by various combinations of external evidence.

Identifiers

pubmed: 31020555
doi: 10.1007/978-1-4939-9173-0_5
pmc: PMC6635606
mid: NIHMS1031243

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

65-95

Grants

Agency: NIGMS NIH HHS
ID: R01 GM128145
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG000783
Country: United States

Authors

Katharina J Hoff (KJ)

University of Greifswald, Institute of Mathematics and Computer Science, Greifswald, Germany. katharina.hoff@uni-greifswald.de.

Alexandre Lomsadze (A)

Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA, USA.

Mark Borodovsky (M)

Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA, USA. borodovsky@gatech.edu.
School of Computational Science and Engineering, Atlanta, GA, 30332, USA. borodovsky@gatech.edu.
Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia. borodovsky@gatech.edu.

Mario Stanke (M)

University of Greifswald, Institute of Mathematics and Computer Science, Greifswald, Germany.
