All that Glitters Is not Gold: Type-I Error Controlled Variable Selection from Clinical Trial Data.
Journal
Clinical pharmacology and therapeutics
ISSN: 1532-6535
Abbreviated title: Clin Pharmacol Ther
Country: United States
NLM ID: 0372741
Publication information
Publication date: Apr 2024
History:
received: 14 Sep 2023
accepted: 2 Feb 2024
pubmed: 29 Feb 2024
medline: 29 Feb 2024
entrez: 29 Feb 2024
Status: ppublish
Abstract
Clinical trials are primarily conducted to estimate causal effects, but the data collected can also be invaluable for additional research, such as identifying prognostic measures of disease or biomarkers that predict treatment efficacy. However, these exploratory settings are prone to false discoveries (type-I errors) due to the multiple comparisons they entail. Unfortunately, many methods fail to address this issue, in part because the algorithms used are generally designed to optimize predictions and often only provide the measures used for variable selection, such as machine learning model importance scores, as a byproduct. To address the resulting unclear uncertainty in the selection sets, the knockoff framework offers a model-agnostic, robust approach to variable selection with guaranteed type-I error control. Here, we review the knockoff framework in the setting of clinical data, highlighting the main considerations using simulation studies. We also extend the framework by introducing a novel knockoff generation method that addresses two main limitations of previously suggested methods relevant for clinical development settings. With this new method, we empirically obtain tighter bounds on type-I error control and gain an order of magnitude in computational efficiency in mixed data settings. We demonstrate comparable selections to those of the competing method for identifying prognostic biomarkers for C-reactive protein levels in patients with psoriatic arthritis in four clinical trials. Our work increases access to the knockoff framework for variable selection from clinical trial data. In this way, this paper helps to address the current replicability crisis, which can result in unnecessary research efforts, increased patient burden, and avoidable costs.
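To make the selection step of the knockoff framework concrete, the following is a minimal sketch of the standard knockoff+ threshold of Barber & Candès (2015) for false discovery rate control. It is not the knockoff generation method introduced in this paper. It assumes that an importance statistic W_j has already been computed for each variable (for example, the importance of the original variable minus that of its knockoff copy), so that large positive values suggest a real signal. The function names and the example values are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns float("inf") if no candidate qualifies (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # variables that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Return the indices of the variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical importance statistics for ten candidate variables.
    w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.3, 2.0, -0.2, 1.5]
    print(knockoff_select(w_example, q=0.2))
```

The negative statistics act as an estimate of how many of the positive ones are false discoveries, which is what allows error control without p-values from the underlying model. With few variables and a strict target level, the procedure may select nothing at all.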
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Pagination
774-785
Copyright information
© 2024 Novartis. Clinical Pharmacology & Therapeutics © 2024 American Society for Clinical Pharmacology and Therapeutics.