Clinical prediction models to predict the risk of multiple binary outcomes: a comparison of approaches.

binary outcomes clinical prediction model multiple outcomes multivariate modeling regression risk prediction

Journal

Statistics in medicine
ISSN: 1097-0258
Abbreviated title: Stat Med
Country: England
NLM ID: 8215016

Publication information

Publication date:
30 01 2021
History:
received: 21 01 2020
revised: 25 08 2020
accepted: 07 10 2020
pubmed: 28 10 2020
medline: 22 06 2021
entrez: 27 10 2020
Status: ppublish

Abstract

Clinical prediction models (CPMs) can predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, there are many medical applications where two or more outcomes are of interest, and CPMs should reflect this more widely so that they can accurately estimate the joint risk of multiple outcomes simultaneously. A potentially naïve approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop CPMs for multiple binary outcomes. We consider four methods, ranging in complexity and conditional independence assumptions: namely, probabilistic classifier chains, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on conditional independence: separate univariate CPMs and stacked regression. Using a simulation study and a real-world example, we illustrate that CPMs for joint risk prediction of multiple outcomes should only be derived using methods that model the residual correlation between outcomes. In such a situation, our results suggest that probabilistic classifier chains, multinomial logistic regression, or the Bayesian probit model are all appropriate choices. We call into question the development of CPMs for each outcome in isolation when multiple correlated or structurally related outcomes are of interest and recommend more multivariate approaches to risk prediction.
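To make the conditional-independence point concrete, the following is a minimal sketch in Python, not taken from the paper. It assumes a hypothetical joint distribution of two binary outcomes (Y1, Y2) for a single fixed covariate pattern; the probabilities and the function names (`marginal`, `naive_product`, `chain_rule`) are illustrative assumptions. It compares the naive product of the two marginal risks with the chain-rule factorisation P(Y1=1) × P(Y2=1 | Y1=1), which is the factorisation that a classifier chain relies on.

```python
from math import isclose

# Hypothetical joint distribution of (Y1, Y2) for one fixed covariate pattern.
# The values are illustrative only; the outcomes are positively correlated.
joint = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.60}


def marginal(joint, index):
    """Return P(Y_index = 1) by summing the joint over the other outcome."""
    return sum(p for y, p in joint.items() if y[index] == 1)


def naive_product(joint):
    """Joint risk P(Y1=1, Y2=1) under assumed conditional independence."""
    return marginal(joint, 0) * marginal(joint, 1)


def chain_rule(joint):
    """Joint risk via the factorisation P(Y1=1) * P(Y2=1 | Y1=1)."""
    p_y1 = marginal(joint, 0)
    p_y2_given_y1 = joint[(1, 1)] / p_y1
    return p_y1 * p_y2_given_y1


true_joint = joint[(1, 1)]
naive = naive_product(joint)
chained = chain_rule(joint)

print(f"true joint risk:      {true_joint:.2f}")
print(f"naive product:        {naive:.2f}")
print(f"chain-rule estimate:  {chained:.2f}")
```

In this toy setting the naive product understates the joint risk because it ignores the positive dependence between the outcomes, while the chain-rule factorisation recovers it. In practice each factor would be estimated by a fitted model rather than read from a known table, so the sketch illustrates only the probability argument, not model estimation.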

Identifiers

pubmed: 33107066
doi: 10.1002/sim.8787

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

498-517

Grants

Agency: Medical Research Council
ID: MR/T025085/1
Country: United Kingdom

Comments and corrections

Type : CommentIn
Type : CommentIn

Copyright information

© 2020 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Authors

Glen P Martin (GP)

Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK.

Matthew Sperrin (M)

Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK.

Kym I E Snell (KIE)

Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Staffordshire, UK.

Iain Buchan (I)

Institute of Population Health Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, UK.

Richard D Riley (RD)

Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Staffordshire, UK.
