Dug: a semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 2022-06-13
History:
received: 2021-07-07
revised: 2022-03-04
accepted: 2022-04-15
pubmed: 2022-04-21
medline: 2022-11-15
entrez: 2022-04-20
Status:
ppublish
Abstract
As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. Supplementary data are available at Bioinformatics online.
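The abstract compares the two search tools on recall and on the precision of their top results. As a rough illustration only, the sketch below computes both metrics under the standard information-retrieval definitions, which may differ from the exact evaluation protocol the authors used. The function names and the variable identifiers are hypothetical and are not taken from Dug or Elasticsearch.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear anywhere in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(retrieved)) / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant)
    return sum(1 for item in top_k if item in relevant) / len(top_k)


# Hypothetical study-variable identifiers, invented for this example.
relevant_vars = {"var_a", "var_b", "var_c", "var_d"}
retrieved_vars = ["var_a", "var_x", "var_c", "var_y", "var_b"]

print(recall(retrieved_vars, relevant_vars))             # 3 of 4 relevant items found
print(precision_at_k(retrieved_vars, relevant_vars, 3))  # 2 of the top 3 are relevant
```

A search tool that expands a query with synonyms can retrieve more of the relevant set (higher recall) while its top-ranked results stay equally relevant (similar precision at k), which is the pattern the abstract reports.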
Identifiers
pubmed: 35441678
pii: 6571145
doi: 10.1093/bioinformatics/btac284
pmc: PMC9991886
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
3252-3258
Grants
Agency: National Heart, Lung, and Blood Institute (NHLBI)
ID: 1-OT3-HL142479-01
Agency: National Center for Advancing Translational Sciences (NCATS)
ID: 1-OT3-TR002020-01
Agency: Helping to End Addiction Long-Term (HEAL) Office
ID: 1-OT2-OD031940-01
Copyright information
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
References
Sci Data. 2017 Jun 06;4:170059
pubmed: 28585923
Nat Methods. 2020 Mar;17(3):261-272
pubmed: 32015543
Bioinformatics. 2019 May 15;35(10):1799-1801
pubmed: 30329013
J Am Med Inform Assoc. 2015 Jan;22(1):65-75
pubmed: 25361575
Nature. 2021 Feb;590(7845):198-201
pubmed: 33568833
JAMA. 2018 Jul 10;320(2):129-130
pubmed: 29896636
Patterns (N Y). 2021 Jan 8;2(1):100155
pubmed: 33196056
Nucleic Acids Res. 2017 Jan 4;45(D1):D712-D722
pubmed: 27899636
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70
pubmed: 14681409
Sci Data. 2016 Mar 15;3:160018
pubmed: 26978244
J Chem Inf Model. 2019 Dec 23;59(12):4968-4973
pubmed: 31769676
Clin Transl Sci. 2019 Mar;12(2):91-94
pubmed: 30412340
Database (Oxford). 2019 Jan 1;2019:
pubmed: 31820804
Database (Oxford). 2012 Mar 20;2012:bas016
pubmed: 22434847
Am J Epidemiol. 2021 Oct 1;190(10):1977-1992
pubmed: 33861317
BMC Med Inform Decis Mak. 2006 Apr 04;6:19
pubmed: 16595012
Clin J Am Soc Nephrol. 2015 Apr 7;10(4):710-5
pubmed: 25376765
J Am Med Inform Assoc. 2018 Mar 1;25(3):300-308
pubmed: 29346583
Nat Genet. 2017 May 26;49(6):816-819
pubmed: 28546571
J Biomed Semantics. 2016 May 10;7:25
pubmed: 27175225
N Engl J Med. 2019 Aug 15;381(7):668-676
pubmed: 31412182
JMIR Med Inform. 2020 Nov 23;8(11):e17964
pubmed: 33226347