Detecting Gene Ontology misannotations using taxon-specific rate ratio comparisons.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
15 08 2020
History:
received: 03 09 2019
revised: 24 03 2020
accepted: 26 05 2020
pubmed: 30 5 2020
medline: 20 2 2021
entrez: 30 5 2020
Status:
ppublish
Abstract
Many protein function databases are built on automated or semi-automated curation and can contain various annotation errors. Correcting such misannotations is critical to improving the accuracy and reliability of the databases. We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups; annotations with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, of which only 25% were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations are 'Inferred from Biological aspect of Ancestor (IBA)', which contradicts previous observations that attributed misannotations mainly to 'Inferred from Sequence or structural Similarity (ISS)', probably reflecting a shift in error sources due to new developments in function annotation databases. The results demonstrated a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies. Availability: https://zhanglab.ccmb.med.umich.edu/RAR. Supplementary data are available at Bioinformatics online.
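The abstract does not give the RAR formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the annotation rate of a GO term in a taxonomic group is the fraction of that group's proteins carrying the term, and that the ratio for a group is its rate divided by the highest rate observed for the same term in any group. The function names, the 0.01 cutoff, the group labels, and the placeholder term "TERM_A" are all hypothetical.

```python
from collections import defaultdict


def annotation_rates(annotations, group_sizes):
    """Fraction of proteins in each group annotated with each term.

    annotations: iterable of (group, protein_id, term) tuples.
    group_sizes: dict mapping group -> total number of proteins (assumed known).
    """
    proteins = defaultdict(set)
    for group, protein, term in annotations:
        proteins[(group, term)].add(protein)
    return {key: len(ids) / group_sizes[key[0]] for key, ids in proteins.items()}


def rate_ratios(rates):
    """Assumed ratio: a group's rate divided by the best rate for that term."""
    best = defaultdict(float)
    for (group, term), rate in rates.items():
        best[term] = max(best[term], rate)
    return {key: rate / best[key[1]] for key, rate in rates.items()}


def flag_low_ratio(ratios, threshold=0.01):
    """Return (group, term) pairs whose ratio falls below a hypothetical cutoff."""
    return sorted(key for key, ratio in ratios.items() if ratio < threshold)


# Toy data: the term is common in one group and nearly absent in the other.
toy = [("mammals", f"M{i}", "TERM_A") for i in range(200)]
toy.append(("bacteria", "B0", "TERM_A"))
sizes = {"mammals": 1000, "bacteria": 1000}

suspects = flag_low_ratio(rate_ratios(annotation_rates(toy, sizes)))
print(suspects)
```

In this toy case the single bacterial annotation is flagged as a candidate for manual review because the term is 200 times rarer in that group than in the other. A flagged pair is only a candidate, which matches the abstract's wording that low ratios "usually" correspond to incorrect annotations.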
Identifiers
pubmed: 32470107
pii: 5848643
doi: 10.1093/bioinformatics/btaa548
pmc: PMC7751014
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
4383-4388
Grants
Agency: NIAID NIH HHS
ID: R01 AI134678
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM083107
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM136422
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.