Semantic Integration and Enrichment of Heterogeneous Biological Databases.

Data integration; Keyword search; Knowledge representation; Ontology-based data access; Query processing; RDF stores; Relational databases

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2019
History:
entrez: 7 7 2019
pubmed: 7 7 2019
medline: 9 1 2020
Status: ppublish

Abstract

Biological databases are growing at an exponential rate and are currently among the major producers of Big Data, almost on par with commercial generators such as YouTube or Twitter. Traditionally, biological databases evolved as independent silos, each purpose-built by a different research group to answer specific research questions. More recently, significant efforts have been made toward integrating these heterogeneous sources into unified data access systems or interoperable systems using the FAIR principles of data sharing. Semantic Web technologies have been key enablers in this process, opening the path to new insights into the unified data that were not visible at the level of each independent database. In this chapter, we first provide an introduction to two of the most used database models for biological data: relational databases and RDF stores. Next, we discuss ontology-based data integration, which serves to unify and enrich heterogeneous data sources. We present an extensive timeline of milestones in data integration based on Semantic Web technologies in the field of life sciences. Finally, we discuss some of the remaining challenges in making ontology-based data access (OBDA) systems easily accessible to a larger audience. In particular, we introduce natural language search interfaces, which alleviate the need for database users to be familiar with technical query languages. We illustrate the main theoretical concepts of data integration through concrete examples, using two well-known biological databases: a gene expression database, Bgee, and an orthology database, OMA.

Identifiers

pubmed: 31278681
doi: 10.1007/978-1-4939-9074-0_22

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

655-690

Authors

Ana Claudia Sima (AC)

ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland. simn@zhaw.ch.
University of Lausanne, Lausanne, Switzerland. simn@zhaw.ch.

Kurt Stockinger (K)

ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland.

Tarcisio Mendes de Farias (TM)

University of Lausanne, Lausanne, Switzerland.
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Manuel Gil (M)

ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland.
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

