Semiparametric analysis of clustered interval-censored survival data using soft Bayesian additive regression trees (SBART).
Bayesian additive regression trees
machine learning
nonproportional hazards
semiparametric
survival analysis
Journal
Biometrics
ISSN: 1541-0420
Abbreviated title: Biometrics
Country: United States
NLM ID: 0370625
Publication information
Publication date:
09 2022
History:
revised: 10 03 2021
received: 04 05 2020
accepted: 01 04 2021
pubmed: 18 4 2021
medline: 5 10 2022
entrez: 17 4 2021
Status:
ppublish
Abstract
Popular parametric and semiparametric hazards regression models for clustered survival data are inadequate when the unknown effects of covariates and clustering are complex. This calls for a flexible modeling framework that yields efficient survival prediction. Moreover, in survival studies of time to occurrence of asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval-censored survival data under a Bayesian ensemble learning paradigm called soft Bayesian additive regression trees, or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable to left-censored, right-censored, and interval-censored survival data, our methodology is implemented using a data augmentation scheme that allows existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study in which dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high-dimensional data with complex underlying associations.
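To make the product-form hazard and the three censoring types concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Weibull baseline hazard and a user-supplied positive multiplier `g(t, x)` standing in for the nonparametric SBART component; the names `weibull_baseline`, `cumulative_hazard`, `survival`, and `interval_likelihood`, as well as the parameter values, are hypothetical. It uses the standard fact that an event known only to lie in (left, right] contributes S(left) - S(right) to the likelihood, so left censoring corresponds to left = 0 and right censoring to right = infinity.

```python
import math


def weibull_baseline(t, shape=1.5, scale=2.0):
    """Hypothetical parametric baseline hazard h0(t); a Weibull form is assumed."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cumulative_hazard(t, g, x, n_steps=2000):
    """Midpoint-rule approximation of the integral of h0(s) * g(s, x) over [0, t]."""
    if t <= 0:
        return 0.0
    if math.isinf(t):
        # Assumption: the hazard integrates to infinity, so S(inf) = 0.
        return math.inf
    ds = t / n_steps
    total = 0.0
    for k in range(n_steps):
        s = (k + 0.5) * ds
        total += weibull_baseline(s) * g(s, x)
    return total * ds


def survival(t, g, x):
    """Survival function S(t | x) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, g, x))


def interval_likelihood(left, right, g, x):
    """Likelihood contribution P(left < T <= right | x) = S(left) - S(right)."""
    return survival(left, g, x) - survival(right, g, x)


if __name__ == "__main__":
    # Stand-in for the tree-ensemble component: a fixed log-linear effect.
    g = lambda t, x: math.exp(0.5 * x[0])
    x = [1.0]
    print("left-censored at 2:    ", interval_likelihood(0.0, 2.0, g, x))
    print("interval (1, 2]:       ", interval_likelihood(1.0, 2.0, g, x))
    print("right-censored at 2:   ", interval_likelihood(2.0, math.inf, g, x))
```

In the article's setting the multiplier would be estimated by the tree ensemble and would also carry the clustering effect; the fixed function used here only illustrates how the likelihood contributions are assembled.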
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
880-893
Grants
Agency: CSR NIH HHS
ID: R03CA205018-01
Country: United States
Copyright information
© 2021 The International Biometric Society.
References
Adams, R.P., Murray, I. and MacKay, D.J.C. (2009) Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. Proceedings of the 26th International Conference on Machine Learning (ICML).
Albert, J.H. and Chib, S. (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669-679.
Barbash, G.I. and Glied, S.A. (2010) New technology and health care costs: the case of robot-assisted surgery. New England Journal of Medicine, 363(8), 701-704.
Bonato, V., Baladandayuthapani, V., Broom, M.B., Sulman, E.P., Aldape, K.D. and Do, K.A. (2011) Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics, 27(3), 359-367.
Calhoun, P., Su, X., Nunn, M. and Fan, J. (2018) Constructing multivariate survival trees: the MST package for R. Journal of Statistical Software, 83, 1-21.
Chipman, H.A., George, E.I. and McCulloch, R.E. (2010) BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4, 266-298.
Conkin, J., Bedahl, S.R. and van Liew, H.D. (1992) A computerized data bank of decompression sickness incidence in altitude chambers. Aviation, Space and Environmental Medicine, 63, 819-824.
Conkin, J. and Powell, M. (2001) Lower body adynamia as a factor to reduce the risk of hypobaric decompression sickness. Aviation, Space and Environmental Medicine, 72, 202-214.
De Iorio, M., Johnson, W.O., Muller, P. and Rosner, G.L. (2009) Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762-771.
Deshpande, S.K., Bai, R., Balocchi, C. and Starling, J.E. (2020) VC-BART: Bayesian trees for varying coefficients. arXiv:2003.06416.
Du, J. and Linero, A.R. (2019) Interaction detection with Bayesian decision tree ensembles. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89, pp. 108-117.
Dreyer, G., Addiss, D., Williamson, J. and Noroes, J. (2006) Efficacy of co-administered diethylcarbamazine and albendazole against adult Wuchereria bancrofti. Transactions of the Royal Society of Tropical Medicine and Hygiene, 100, 1118-1125.
Fernandez, T., Rivera, N. and Teh, Y.W. (2016) Gaussian processes for survival analysis. Proceedings of the 30th International Conference on Neural Information Processing Systems, 16, pp. 5021-5029.
Friedman, J. (1991) Multivariate adaptive regression splines. Annals of Statistics, 19, 1-141.
Goethals, K., Ampe, B., Berkvens, D., Laevens, H., Janssen, P. and Duchateau, L. (2009) Modeling interval-censored, clustered cow udder quarter infection times through the shared gamma frailty model. Journal of Agricultural, Biological and Environmental Statistics, 14, 1-14.
Hahn, P.R., Murray, J.S. and Carvalho, C.M. (2020) Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects. Bayesian Analysis. To appear.
Hanson, T. and Yang, M. (2007) Bayesian semiparametric proportional odds models. Biometrics, 63, 88-95.
Henschel, V., Engel, J., Hölzel, D. and Mansmann, U. (2009) A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects. BMC Medical Research Methodology, 9, 9.
Hill, J., Linero, A.R. and Murray, J.S. (2020) Bayesian additive regression trees: a review and look forward. Annual Review of Statistics and its Application, 7(1), 251-278.
Hothorn, T., Lausen, B. and Benner, A. (2004) Bagging survival trees. Statistics in Medicine, 23(1), 77-91.
Hougaard, P. (1995) Frailty models for survival data. Lifetime Data Analysis, 1(3), 255-273.
Ibrahim, R., L'Ecuyer, P., Regnard, N. and Shen, H. (2012) On the modeling and forecasting of call center arrivals. Proceedings of 2012 Winter Simulation Conference, Berlin, 256-267.
Ishwaran, H., Kogalur, U.B., Blackstone, E.H. and Lauer, M.S. (2008) Random survival forests. The Annals of Applied Statistics, 2(3), 841-860.
Kalbfleisch, J.D. and Prentice, R.L. (2002) The Statistical Analysis of Failure Time Data, 2nd edition. Hoboken: John Wiley & Sons.
Kooperberg, C. and Clarkson, D.B. (1997) Hazard regression with interval censored data. Biometrics, 53, 1485-1494.
Li, Y., Linero, A.R. and Murray, J.S. Adaptive Conditional Distribution Estimation with Bayesian Decision Tree Ensembles. arXiv e-prints.
Li, H. and Luan, Y. (2005) Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics, 21(10), 2403-2409.
Linero, A.R. (2017) A review of tree-based Bayesian methods. Communications for Statistical Applications and Methods, 24(6), 543-559.
Linero, A.R. (2018) Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association, 113(522), 626-636.
Linero, A.R., Basak, P., Li, Y. and Sinha, D. (2021) Bayesian Survival Tree Ensembles with Submodel Shrinkage. arXiv e-prints.
Linero, A.R., Sinha, D. and Lipsitz, S.R. (2020) Semiparametric mixed-scale models using shared Bayesian forests. Biometrics, 76(1), 131-144.
Linero, A.R. and Yang, Y. (2018) Bayesian regression tree ensembles that adapt to smoothness and sparsity. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 80(5), 1087-1110.
Mallick, B.K., Denison, D.G.T. and Smith, A.F.M. (1999) Bayesian survival analysis using a MARS model. Biometrics, 55, 1071-1077.
Murray, J.S. (2017) Log-linear Bayesian additive regression trees for categorical and count responses. arXiv preprint arXiv:1701.01503.
Neal, R.M. (2003) Slice sampling. The Annals of Statistics, 31, 705-767.
Oakes, D.R. (1982) A model for association in bivariate survival data. Journal of the Royal Statistical Society, Series B, 44, 414-428.
Pratola, M., Chipman, H., George, E. and McCulloch, R. (2017) Heteroscedastic BART using multiplicative regression trees. arXiv preprint arXiv:1709.07542.
Sinha, D., Chen, M.-H. and Ghosh, S. (1999) Bayesian analysis and model selection for interval-censored survival data. Biometrics, 55, 585-590.
Sparapani, R., Logan, B.R., McCulloch, R.E. and Laud, P.W. (2016) Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Statistics in Medicine.
Su, X. and Tsai, C.-L. (2005) Tree-augmented Cox proportional hazards models. Biostatistics, 6, 486-499.
Su, X., Zhou, T., Yan, X. and Fan, J. (2008) Interaction trees with censored survival data. International Journal of Biostatistics, 4, 1-26.
Sun, J. (2006) The Statistical Analysis of Interval-Censored Failure Time Data. Berlin: Springer.
Umlauf, N., Adler, D., Kneib, T., Lang, S. and Zeileis, A. (2015) Structured additive regression models: an R interface to BayesX. Journal of Statistical Software, 63, 1-46.
Zhou, H., Hanson, T. and Zhang, J. (2017) Generalized accelerated failure time spatial frailty model for arbitrarily censored data. Lifetime Data Analysis, 23, 495-515.
Zhou, H., Hanson, T. and Zhang, J. (2020) spBayesSurv: fitting Bayesian spatial survival models using R. Journal of Statistical Software, 92, 1-33.