Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings.


Journal

Proceedings of the conference. Association for Computational Linguistics. Meeting
ISSN: 0736-587X
Abbreviated title: Proc Conf Assoc Comput Linguist Meet
Country: United States
NLM ID: 101639983

Publication information

Publication date:
Jul 2020
History:
entrez: 22 Mar 2021
pubmed: 23 Mar 2021
medline: 23 Mar 2021
Status: ppublish

Abstract

Much of biomedical and healthcare data is encoded in discrete, symbolic form such as text and medical codes. There is a wealth of expert-curated biomedical domain knowledge stored in knowledge bases and ontologies, but the lack of reliable methods for learning knowledge representation has limited their usefulness in machine learning applications. While text-based representation learning has significantly improved in recent years through advances in natural language processing, attempts to learn biomedical concept embeddings so far have been lacking. A recent family of models called knowledge graph embeddings has shown promising results on general-domain knowledge graphs, and we explore their capabilities in the biomedical domain. We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph, provide a benchmark with comparison to existing methods and an in-depth discussion of best practices, and make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation. The embeddings, code, and materials will be made available to the community.

Identifiers

pubmed: 33746351
doi: 10.18653/v1/2020.bionlp-1.18
pmc: PMC7971091
mid: NIHMS1676481

Publication types

Journal Article

Languages

eng

Pagination

167-176

Grants

Agency: NLM NIH HHS
ID: T15 LM007056
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR001863
Country: United States

Authors

David Chang (D)

Yale Center for Medical Informatics, Yale University.

Ivana Balažević (I)

School of Informatics, University of Edinburgh, UK.

Carl Allen (C)

School of Informatics, University of Edinburgh, UK.

Daniel Chawla (D)

Yale Center for Medical Informatics, Yale University.

Cynthia Brandt (C)

Yale Center for Medical Informatics, Yale University.

Richard Andrew Taylor (RA)

Yale Center for Medical Informatics, Yale University.

MeSH classifications