META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration.


Journal

IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Titre abrégé: IEEE/ACM Trans Comput Biol Bioinform
Pays: United States
ID NLM: 101196755

Informations de publication

Date de publication:
Historique:
pubmed: 6 8 2020
medline: 19 2 2022
entrez: 6 8 2020
Statut: ppublish

Résumé

The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well-recognized in the bioinformatics community because, in the common practice, repositories are accessed one-by-one, learning their specific metadata definitions as result of long and tedious efforts, and such practice is error-prone. In this paper, we describe META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process. We present a variety of innovative techniques for data extraction, cleaning, normalization and enrichment. We propose a general, open and extensible pipeline that can easily incorporate any number of new data sources, and propose the resulting repository-already integrating several important sources-which is exposed by means of practical user interfaces to respond biological researchers' needs.

Identifiants

pubmed: 32750853
doi: 10.1109/TCBB.2020.2998954
doi:

Types de publication

Journal Article Research Support, Non-U.S. Gov't

Langues

eng

Sous-ensembles de citation

IM

Pagination

543-557

Auteurs

Articles similaires

Coal Metagenome Phylogeny Bacteria Genome, Bacterial
Humans Colorectal Neoplasms Biomarkers, Tumor Prognosis Gene Expression Regulation, Neoplastic
Genome, Bacterial Virulence Phylogeny Genomics Plant Diseases

Classifications MeSH