Probabilistic Approaches to Overcome Content Heterogeneity in Data Integration: A Study Case in Systematic Lupus Erythematosus.
Probabilistic data integration
biomedical data harmonisation
content heterogeneity
missing data
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
16 Jun 2020
History:
entrez: 24 Jun 2020
pubmed: 24 Jun 2020
medline: 26 Aug 2020
Status:
ppublish
Abstract
Integrating data from different sources into a homogeneous dataset increases the opportunities to study human health. However, disparate data collections are often heterogeneous, which complicates their integration. In this paper, we focus on the issue of content heterogeneity in data integration. Traditional approaches for resolving content heterogeneity map all source datasets to a common data model that includes only shared data items, and thus omit all items that vary between datasets. Based on an example of three datasets in Systemic Lupus Erythematosus, we describe and experimentally evaluate a probabilistic data integration approach which propagates the uncertainty resulting from content heterogeneity into statistical inference, avoiding the need to map to a common data model.
Identifiers
pubmed: 32570412
pii: SHTI200188
doi: 10.3233/SHTI200188
Publication types
Journal Article
Languages
eng