Exploring automatic inconsistency detection for literature-based gene ontology annotation.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
24 06 2022
History:
accepted: 08 04 2022
entrez: 27 6 2022
pubmed: 28 6 2022
medline: 30 6 2022
Status:
ppublish
Abstract
Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature cited as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and cannot keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.
Identifiers
pubmed: 35758780
pii: 6617491
doi: 10.1093/bioinformatics/btac230
pmc: PMC9235499
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
i273-i281
Copyright information
© The Author(s) 2022. Published by Oxford University Press.