Boosting distributional copula regression.

Keywords: Archimedean copula; GAMLSS; component-wise gradient boosting; early stopping; tail dependence

Journal

Biometrics
ISSN: 1541-0420
Abbreviated title: Biometrics
Country: United States
NLM ID: 0370625

Publication information

Publication date:
09 2023
History:
received: 21 03 2022
accepted: 15 09 2022
medline: 13 9 2023
pubmed: 28 9 2022
entrez: 27 9 2022
Status: ppublish

Abstract

Capturing complex dependence structures between outcome variables (e.g., study endpoints) is highly relevant in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model the joint distribution of multiple outcome variables by disentangling the marginal response distributions from their dependence structure. In a regression setup, each parameter of the copula model, that is, the marginal distribution parameters and the copula dependence parameters, can be related to covariates via structured additive predictors. We propose a framework to fit distributional copula regression via model-based boosting, a modern estimation technique that incorporates useful features such as an intrinsic variable selection mechanism, parameter shrinkage, and the capability to fit regression models in high-dimensional data settings, that is, situations with more covariates than observations. Thus, model-based boosting not only complements existing Bayesian and maximum-likelihood-based estimation frameworks for this model class but also offers intrinsic mechanisms that can be helpful in many applied problems. The performance of our boosting algorithm for copula regression models with continuous margins is evaluated in simulation studies that cover low- and high-dimensional data settings and situations with and without dependence between the responses. Moreover, distributional copula boosting is used to jointly analyze and predict the length and the weight of newborns conditional on sonographic measurements of the fetus before delivery together with other clinical variables.
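
To make the idea of boosting a copula dependence parameter concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a bivariate Gaussian copula whose margins have already been transformed to standard normal scores z1 and z2, a dependence parameter rho = tanh(eta), simple linear base-learners (one per covariate plus an intercept), and a fixed number of iterations in place of a tuned stopping iteration. The function names neg_loglik, neg_gradient, and boost_dependence, the step length nu, and the simulated data are all illustrative assumptions. In each iteration, every base-learner is fitted to the negative gradient of the loss, and only the best-fitting one is updated, which is what yields variable selection and shrinkage.

```python
import numpy as np

def neg_loglik(eta, z1, z2):
    """Negative Gaussian-copula log-likelihood with rho = tanh(eta)."""
    rho = np.tanh(eta)
    a, b = z1**2 + z2**2, z1 * z2
    return np.sum(0.5 * np.log1p(-rho**2)
                  + (rho**2 * a - 2 * rho * b) / (2 * (1 - rho**2)))

def neg_gradient(eta, z1, z2):
    """Negative derivative of the per-observation loss with respect to eta."""
    rho = np.tanh(eta)
    a, b = z1**2 + z2**2, z1 * z2
    return rho + ((1 + rho**2) * b - rho * a) / (1 - rho**2)

def boost_dependence(X, z1, z2, mstop=200, nu=0.1):
    """Component-wise gradient boosting for the copula predictor eta.

    Returns coefficients: index 0 is the intercept, 1..p the covariates.
    mstop is fixed here (assumption); in practice it would be tuned.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # one column per base-learner
    coef = np.zeros(p + 1)
    eta = np.zeros(n)                          # start at independence (rho = 0)
    col_ss = np.sum(design**2, axis=0)
    for _ in range(mstop):
        u = neg_gradient(eta, z1, z2)
        beta = design.T @ u / col_ss           # least-squares fit per base-learner
        rss = np.sum((u[:, None] - design * beta)**2, axis=0)
        j = int(np.argmin(rss))                # best-fitting base-learner only
        coef[j] += nu * beta[j]
        eta += nu * beta[j] * design[:, j]
    return coef

# Simulated example (assumption): only the first of five covariates
# drives the dependence between the two responses.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.uniform(-1, 1, size=(n, p))
rho_true = np.tanh(1.0 * X[:, 0])
z1 = rng.standard_normal(n)
z2 = rho_true * z1 + np.sqrt(1 - rho_true**2) * rng.standard_normal(n)

coef = boost_dependence(X, z1, z2)
eta_hat = coef[0] + X @ coef[1:]
loss_start = neg_loglik(np.zeros(n), z1, z2)
loss_end = neg_loglik(eta_hat, z1, z2)
```

In this sketch, covariates that are never selected keep a coefficient of exactly zero, and stopping after fewer iterations shrinks the selected coefficients toward zero. A full distributional copula model would additionally update the parameters of both marginal distributions in the same loop.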

Identifiers

pubmed: 36165288
doi: 10.1111/biom.13765

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2298-2310

Copyright information

© 2021 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.

Authors

Nicolai Hans (N)

Chair of Statistics and Data Science, Humboldt-Universität zu Berlin, Berlin, Germany.

Nadja Klein (N)

Chair of Statistics and Data Science, Humboldt-Universität zu Berlin, Berlin, Germany.

Florian Faschingbauer (F)

Department of Obstetrics and Gynecology, University Hospital of Erlangen, Erlangen, Germany.

Michael Schneider (M)

Department of Obstetrics and Gynecology, University Hospital of Erlangen, Erlangen, Germany.

Andreas Mayr (A)

Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, Germany.
