gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks.
Keywords
Alignment
Bioinformatics
Gene prediction
Genome annotation
Protein annotation
Journal
Genomics, proteomics & bioinformatics
ISSN: 2210-3244
Abbreviated title: Genomics Proteomics Bioinformatics
Country: China
NLM ID: 101197608
Publication information
Publication date:
06 2019
History:
received: 2018-10-24
revised: 2019-03-21
accepted: 2019-04-29
pubmed: 2019-08-23
medline: 2019-12-21
entrez: 2019-08-23
Status:
ppublish
Abstract
Published genomes frequently contain erroneous gene models that reflect problems in identifying open reading frames, start sites, splice sites, and related structural features. These inconsistencies can often be traced back to integration across the text file formats designed to describe long-read alignments and predicted gene structures. In addition, most gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration of functional attributes, such as the presence or absence of protein domains, that can be used for gene model validation. To provide oversight of the increasing number of published genome annotations, we present a software package, Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is implemented in Perl with support from BioPerl libraries and is freely available at https://gitlab.com/PlantGenomicsLab/gFACs.
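To make the structural errors named in the abstract concrete (broken reading frames, missing start or stop sites, non-canonical splice sites), the following is a minimal Python sketch of the kind of per-model check a gene model filter could apply. It is not gFACs's implementation (gFACs is written in Perl with BioPerl), and the `GeneModel` type, the function names, and the simplifying assumptions (forward strand only, exons given as 1-based inclusive coordinates, only GT-AG introns treated as canonical) are hypothetical.

```python
# Hypothetical illustration only: NOT gFACs code (gFACs is Perl/BioPerl).
# Assumes forward-strand models and 1-based inclusive exon coordinates.
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class GeneModel:
    """Hypothetical minimal gene model: coding exons as (start, end) pairs."""
    gene_id: str
    exons: list[tuple[int, int]]


def coding_sequence(model: GeneModel, genome: str) -> str:
    # Concatenate exon sequences in coordinate order (forward strand assumed).
    return "".join(genome[s - 1:e] for s, e in sorted(model.exons)).upper()


def structural_problems(model: GeneModel, genome: str) -> list[str]:
    """Return the structural issues detected in a single gene model."""
    problems = []
    cds = coding_sequence(model, genome)
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("missing start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("in-frame stop codon")
    # Each intron between consecutive exons should start with GT and end with AG.
    ordered = sorted(model.exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        intron = genome[prev_end:next_start - 1].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(
                f"non-canonical splice site in intron {prev_end + 1}-{next_start - 1}"
            )
    return problems


def filter_models(models: list[GeneModel], genome: str) -> list[GeneModel]:
    """Keep only gene models with no detected structural problems."""
    return [m for m in models if not structural_problems(m, genome)]


# Usage: exon 1 = ATGAAA, intron = GTAAGTTTTAG, exon 2 = CCCTAA.
genome = "ATGAAA" + "GTAAGTTTTAG" + "CCCTAA"
good = GeneModel("g1", [(1, 6), (18, 23)])
bad = GeneModel("g2", [(1, 6), (20, 23)])  # second exon starts 2 bp too late
print([m.gene_id for m in filter_models([good, bad], genome)])  # -> ['g1']
```

Returning a list of reasons rather than a single boolean mirrors the abstract's point that a filter should report details on why models were removed, not just which ones survived.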
Identifiers
pubmed: 31437583
pii: S1672-0229(19)30120-2
doi: 10.1016/j.gpb.2019.04.002
pmc: PMC6818179
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
305-310
Copyright information
Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.