TSEBRA: transcript selector for BRAKER.

Genome; Genomics; RNA-Seq; Sequence Analysis, RNA; Software

Evidence integration; Gene prediction; Genome annotation; Protein homology; Protein-coding genes; RNA-seq

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
25 Nov 2021

History:

received: 04 Jun 2021

accepted: 15 Nov 2021

entrez: 26 Nov 2021

pubmed: 27 Nov 2021

medline: 30 Nov 2021

Status: epublish

Abstract

BACKGROUND: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on the provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and the statistical models. For training and prediction, BRAKER1 and BRAKER2 incorporate complementary extrinsic evidence: BRAKER1 uses only RNA-seq data, while BRAKER2 uses only a database of cross-species proteins. So far, the BRAKER suite has not been able to reliably exceed the accuracy of BRAKER1 or BRAKER2 alone when incorporating both types of evidence simultaneously. Currently, for a novel genome project where both RNA-seq and protein data are available, the best option is to run both pipelines independently and pick the output that is likely better. As a result, one of the two types of extrinsic evidence remains unexploited.

RESULTS: We present TSEBRA, a software tool that selects gene predictions (transcripts) from the sets generated by BRAKER1 and BRAKER2. TSEBRA uses a set of rules to compare scores of overlapping transcripts based on their support by RNA-seq and homologous protein evidence. We show in computational experiments on genomes of 11 species that TSEBRA achieves higher accuracy than either BRAKER1 or BRAKER2 running alone and that TSEBRA compares favorably with the combiner tool EVidenceModeler.

CONCLUSIONS: TSEBRA is an easy-to-use and fast software tool. It can be used in concert with the BRAKER pipeline to generate a gene prediction set supported by both RNA-seq and homologous protein evidence.
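The idea of comparing evidence-support scores of overlapping transcripts can be illustrated with a minimal sketch. It is not TSEBRA's actual rule set: the single score used here (fraction of introns backed by evidence), the `margin` threshold, and all names such as `Transcript`, `support_score`, and `select_transcripts` are assumptions made for illustration, and strand is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    tx_id: str
    source: str          # hypothetical label, e.g. "BRAKER1" or "BRAKER2"
    start: int
    end: int
    introns: frozenset   # set of (start, end) intron coordinates


def support_score(tx, supported_introns):
    """Illustrative score: fraction of the transcript's introns backed by evidence."""
    if not tx.introns:
        return 0.0
    return len(tx.introns & supported_introns) / len(tx.introns)


def overlaps(a, b):
    """True if the two transcripts share at least one genomic position."""
    return a.start <= b.end and b.start <= a.end


def select_transcripts(transcripts, supported_introns, margin=0.1):
    """Keep a transcript unless an overlapping one outscores it by more than `margin`."""
    scores = {t.tx_id: support_score(t, supported_introns) for t in transcripts}
    kept = []
    for t in transcripts:
        beaten = any(
            overlaps(t, other) and scores[other.tx_id] - scores[t.tx_id] > margin
            for other in transcripts
            if other is not t
        )
        if not beaten:
            kept.append(t)
    return kept


# Toy input: two overlapping predictions and one isolated prediction.
b1 = Transcript("b1.t1", "BRAKER1", 100, 900, frozenset({(200, 300), (500, 600)}))
b2 = Transcript("b2.t1", "BRAKER2", 150, 950, frozenset({(200, 300), (520, 600)}))
b3 = Transcript("b2.t2", "BRAKER2", 2000, 2500, frozenset({(2100, 2200)}))
evidence = {(200, 300), (500, 600)}  # introns supported by RNA-seq or protein hints

selected = select_transcripts([b1, b2, b3], evidence)
print([t.tx_id for t in selected])
```

In this toy input, `b2.t1` overlaps `b1.t1` and has only half of its introns supported, so it is dropped. `b2.t2` has no overlapping competitor and is kept even though its score is zero, which reflects the pairwise nature of the comparison in this sketch.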

Identifiers

DOI: 10.1186/s12859-021-04482-0 PMID: 34823473 PMC: PMC8620231

Publication types

Journal Article

Languages

eng

Pagination

566

Grants

Agency: NIGMS NIH HHS

ID: R01 GM128145

Country: United States

Agency: NIH HHS

ID: GM128145

Country: United States

Authors

Lars Gabriel (L)

Katharina J Hoff (KJ)

Tomáš Brůna (T)

Mark Borodovsky (M)

Mario Stanke (M)
