BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.
Journal
NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213
Publication information
Publication date:
Mar 2021
History:
received: 2020-08-10
revised: 2020-11-26
accepted: 2020-12-20
entrez: 2021-02-12
pubmed: 2021-02-13
medline: 2021-02-13
Status:
epublish
Abstract
The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions, it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.
Identifiers
pubmed: 33575650
doi: 10.1093/nargab/lqaa108
pii: lqaa108
pmc: PMC7787252
Publication types
Journal Article
Languages
eng
Pagination
lqaa108
Grants
Agency: NIGMS NIH HHS
ID: R01 GM128145
Country: United States
Copyright information
© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.