A machine-compiled database of genome-wide association studies.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
26 07 2019
History:
received: 04 10 2017
accepted: 29 05 2019
entrez: 28 7 2019
pubmed: 28 7 2019
medline: 31 12 2019
Status: epublish

Abstract

Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.

Identifiers

pubmed: 31350405
doi: 10.1038/s41467-019-11026-x
pii: 10.1038/s41467-019-11026-x
pmc: PMC6659642

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

3341

Authors

Volodymyr Kuleshov (V)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA. kuleshov@cs.stanford.edu.
Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA. kuleshov@cs.stanford.edu.

Jialin Ding (J)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Christopher Vo (C)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Braden Hancock (B)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Alexander Ratner (A)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Yang Li (Y)

Department of Medicine, University of Chicago, Chicago, IL, 60637, USA.

Christopher Ré (C)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Serafim Batzoglou (S)

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.

Michael Snyder (M)

Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA.
