Mixture density networks for the indirect estimation of reference intervals.

Keywords: Distributional regression; Latent class regression; Mixture density networks; Reference intervals

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
29 Jul 2022
History:
received: 14 Jan 2022
accepted: 15 Jul 2022
entrez: 29 Jul 2022
pubmed: 30 Jul 2022
medline: 3 Aug 2022
Status: epublish

Abstract

Reference intervals represent the expected range of physiological test results in a healthy population and are essential to support medical decision making. Particularly for pediatric reference intervals, where recruitment regulations make prospective studies challenging to conduct, indirect estimation strategies are becoming increasingly important. Established indirect methods can robustly identify the distribution of "healthy" samples in laboratory databases that also contain unlabeled pathological cases, but they are severely limited when it comes to adjusting for essential patient characteristics such as age. Here, we propose the use of mixture density networks (MDNs) to overcome this limitation by modeling all parameters of the mixture distribution in a single step. Reference intervals estimated in various simulated settings demonstrate that different MDN implementations can accurately recover latent distributions from unlabeled data. A comparison with alternative estimation approaches further highlights the importance of modeling the mixture component weights as a function of the input; otherwise, the estimates of all other parameters, and hence the resulting reference intervals, are biased. We also provide a strategy for generating partially customized starting weights that improves the identification of the latent components. Finally, application to real-world hemoglobin samples yields results in line with current gold-standard approaches, but also indicates that adequate regularization strategies to prevent overfitting require further investigation.
Mixture density networks are a promising approach for extracting the distribution of healthy samples from unlabeled laboratory databases while simultaneously and explicitly estimating all distribution parameters and component weights as non-linear functions of the covariate(s), thereby allowing age-dependent reference intervals to be estimated in a single step. Further studies on model regularization and asymmetric component distributions are warranted to consolidate these findings and expand the scope of applications.
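
The single-step idea summarized above can be illustrated with a minimal NumPy sketch; it is not the authors' implementation. It assumes a two-component Gaussian mixture in which component 0 stands for the "healthy" population and component 1 for pathological samples, a single covariate such as age scaled to [0, 1], and a small one-hidden-layer network whose parameters (`params`) are hypothetical and untrained. The network outputs the component weights, means and standard deviations as functions of the covariate. Fitting would minimize the mixture negative log-likelihood (`mdn_nll`) with a gradient-based optimizer, which is omitted here. Following the usual convention, the reference interval is taken as the central 95% of the assumed healthy component, i.e. mu_0(x) ± z·sigma_0(x). All function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

K = 2   # assumption: component 0 = "healthy", component 1 = "pathological"
H = 16  # number of hidden units (hypothetical choice)

# Hypothetical, untrained parameters of a one-hidden-layer network.
params = {
    "W1": rng.normal(scale=0.5, size=(1, H)),
    "b1": np.zeros(H),
    "W2": rng.normal(scale=0.5, size=(H, 3 * K)),
    "b2": np.zeros(3 * K),
}


def mdn_forward(x, p):
    """Map a covariate x of shape (n,) to mixture weights, means and SDs, each (n, K)."""
    h = np.tanh(x[:, None] @ p["W1"] + p["b1"])
    logits, mu, log_sigma = np.split(h @ p["W2"] + p["b2"], 3, axis=1)
    # Softmax over components: weights are non-negative and sum to 1 for every x.
    logits = logits - logits.max(axis=1, keepdims=True)
    pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Exponential link keeps the standard deviations strictly positive.
    sigma = np.exp(log_sigma)
    return pi, mu, sigma


def mdn_nll(x, y, p):
    """Mean negative log-likelihood of y under the covariate-dependent Gaussian mixture."""
    pi, mu, sigma = mdn_forward(x, p)
    z = (y[:, None] - mu) / sigma
    log_joint = np.log(pi) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2
    # Log-sum-exp over components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return -log_lik.mean()


def reference_interval(x, p, healthy=0, coverage=0.95):
    """Central `coverage` interval of the (assumed) healthy component at each x."""
    _, mu, sigma = mdn_forward(x, p)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mu[:, healthy] - z * sigma[:, healthy], mu[:, healthy] + z * sigma[:, healthy]


# Usage on placeholder inputs (scaled ages in [0, 1], simulated test values).
age = np.linspace(0.0, 1.0, 5)
values = rng.normal(size=age.shape)
print("NLL:", mdn_nll(age, values, params))
lower, upper = reference_interval(age, params)
print(np.column_stack([age, lower, upper]))
```

Because the weights pi_k(x) depend on the covariate, the share of pathological samples may change with age; the abstract reports that treating these weights as constant biases the remaining estimates. The likelihood alone does not fix which component is "healthy" (the components can swap labels), which is one reason why starting values for the network weights matter.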

Identifiers

pubmed: 35906555
doi: 10.1186/s12859-022-04846-0
pii: 10.1186/s12859-022-04846-0
pmc: PMC9336034

Chemical substances

Hemoglobins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

307

Copyright information

© 2022. The Author(s).

Authors

Tobias Hepp

Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 6, 91054, Erlangen, Germany. tbs.hepp@fau.de.
Chair of Spatial Data Science and Statistical Learning, Georg-August-Universität Göttingen, Platz der Göttinger Sieben 3, 37073, Göttingen, Germany. tbs.hepp@fau.de.

Jakob Zierk

Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany.

Manfred Rauh

Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany.

Markus Metzler

Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany.

Sarem Seitz

Department of Information Systems and Applied Computer Science, Otto-Friedrich-Universität Bamberg, Kapuzinerstraße 16, 96047, Bamberg, Germany.
