A machine-compiled database of genome-wide association studies.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
26 07 2019
History:
received: 04 10 2017
accepted: 29 05 2019
entrez: 28 7 2019
pubmed: 28 7 2019
medline: 31 12 2019
Status:
epublish
Abstract
Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
Identifiers
pubmed: 31350405
doi: 10.1038/s41467-019-11026-x
pii: 10.1038/s41467-019-11026-x
pmc: PMC6659642
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
3341