BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.


Journal

NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213

Publication information

Publication date:
Mar 2021
History:
received: 10 08 2020
revised: 26 11 2020
accepted: 20 12 2020
entrez: 12 2 2021
pubmed: 13 2 2021
medline: 13 2 2021
Status: epublish

Abstract

The task of eukaryotic genome annotation remains challenging. Only a few genomes can serve as standards of annotation, and these were achieved through a tremendous investment of human curation effort. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions, it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should help harmonize the annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies, as well as in algorithmic development, to reach the goal of highly accurate annotation of eukaryotic genomes.

Identifiers

pubmed: 33575650
doi: 10.1093/nargab/lqaa108
pii: lqaa108
pmc: PMC7787252

Publication types

Journal Article

Languages

eng

Pagination

lqaa108

Grants

Agency: NIGMS NIH HHS
ID: R01 GM128145
Country: United States

Copyright information

© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

Authors

Tomáš Brůna (T)

School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Katharina J Hoff (KJ)

Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany.

Alexandre Lomsadze (A)

Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Mario Stanke (M)

Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany.

Mark Borodovsky (M)

Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

MeSH classifications