All that Glitters Is not Gold: Type-I Error Controlled Variable Selection from Clinical Trial Data.

Journal

Clinical pharmacology and therapeutics

ISSN: 1532-6535

Abbreviated title: Clin Pharmacol Ther

Country: United States

NLM ID: 0372741

Publication information

Publication date:
Apr 2024

History:

received: 14 09 2023

accepted: 02 02 2024

pubmed: 29 2 2024

medline: 29 2 2024

entrez: 29 2 2024

Status: ppublish

Abstract

Clinical trials are primarily conducted to estimate causal effects, but the data collected can also be invaluable for additional research, such as identifying prognostic measures of disease or biomarkers that predict treatment efficacy. However, these exploratory settings are prone to false discoveries (type-I errors) due to the multiple comparisons they entail. Unfortunately, many methods fail to address this issue, in part because the algorithms used are generally designed to optimize predictions and often provide the measures used for variable selection, such as machine learning model importance scores, only as a byproduct. To address the resulting unquantified uncertainty in the selection sets, the knockoff framework offers a model-agnostic, robust approach to variable selection with guaranteed type-I error control. Here, we review the knockoff framework in the setting of clinical data, highlighting key considerations through simulation studies. We also extend the framework by introducing a novel knockoff generation method that addresses two main limitations of previously suggested methods relevant for clinical development settings. With this new method, we empirically obtain tighter bounds on type-I error control and gain an order of magnitude in computational efficiency in mixed data settings. We demonstrate comparable selections to those of the competing method for identifying prognostic biomarkers for C-reactive protein levels in patients with psoriatic arthritis in four clinical trials. Our work increases access to the knockoff framework for variable selection from clinical trial data. In doing so, this paper helps address the current replicability crisis, which can result in unnecessary research efforts, increased patient burden, and avoidable costs.
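The abstract does not spell out how a knockoff-based selection is made. As a rough illustration only, the sketch below applies the data-dependent "knockoff+" threshold from Barber & Candès (2015, listed in the references) to a vector of feature statistics W, where a large positive W_j suggests that variable j is more important than its knockoff copy. The function names, the example values of W, and the target level q are assumptions made for this illustration; this is not the knockoff generation method introduced in the paper, and it controls the false discovery rate rather than other type-I error measures.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (hypothetical helper name)."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at t.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q):
    """Indices of variables whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= tau)


# Illustrative statistics only (not data from the paper).
w_example = [5.0, 4.0, 3.5, 3.0, -0.5, 2.0, 1.5, -1.0, 0.2, 2.5]
tau_example = knockoff_plus_threshold(w_example, q=0.2)
selected_example = knockoff_select(w_example, q=0.2)
print(tau_example, selected_example.tolist())
```

In practice the statistics W would come from a model fitted jointly to the original variables and their knockoffs, for example differences in lasso coefficient magnitudes or in random forest importance scores; that step is omitted here.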

Identifiers

DOI: 10.1002/cpt.3211 PMID: 38419357

Publication types

Journal Article Review

Languages

eng

Citation subsets

Pagination

774-785

Copyright information

References

Van Lancker, K., Bretz, F. & Dukes, O. The use of covariate adjustment in randomized controlled trials: An overview, arXiv preprint arXiv:2306.05823 (2023).

Lipkovich, I., Dmitrienko, A. & D'Agostino Sr, R.B. Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Stat. Med. 36, 136-196 (2017).

Berry, D. Multiplicities in cancer research: ubiquitous and necessary evils. J. Natl. Cancer Inst. 104, 1125-1133 (2012).

Kent, D.M., Steyerberg, E. & van Klaveren, D. Personalized evidence based medicine: predictive approaches to heterogeneous treatment effects. BMJ, 363, k4245 (2018).

Burke, J.F., Sussman, J.B., Kent, D.M. & Hayward, R.A. Three simple rules to ensure reasonably credible subgroup analyses. BMJ. 351, h5651 (2015).

Ioannidis, J.P.A. Why most published research findings are false. PLoS Med. 2, e124 (2005).

Ioannidis, J.P.A. Contradicted and initially stronger effects in highly cited clinical research. JAMA 294, 218 (2005).

Liu, Q. et al. Landscape analysis of the application of artificial intelligence and machine learning in regulatory submissions for drug development from 2016 to 2021. Clin. Pharmacol. Ther. 113, 771-774 (2023).

Barber, R.F. & Candès, E.J. Controlling the false discovery rate via knockoffs. Ann. Stat. 43, 2055-2085 (2015).

Candès, E., Fan, Y., Janson, L. & Lv, J. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. Ser. B Stat Methodol. 80, 551-577 (2018).

Jiang, T., Li, Y. & Motsinger-Reif, A.A. Knockoff boosted tree for model-free variable selection. Bioinformatics 37, 976-983 (2020).

Kırboğa, K.K., Abbasi, S. & Küçüksille, E.U. Explainability and white box in drug discovery. Chem. Biol. Drug Des. 102, 217-233 (2023).

Kormaksson, M., Kelly, L.J., Zhu, X., Haemmerle, S., Pricop, L. & Ohlssen, D. Sequential knockoffs for continuous and categorical predictors: with application to a large psoriatic arthritis clinical trial pool. Stat. Med. 40, 3313-3328 (2021).

Sechidis, K., Kormaksson, M. & Ohlssen, D. Using knockoffs for controlled predictive biomarker identification. Stat. Med. 40, 5453-5473 (2021).

Sesia, M., Sabatti, C. & Candès, E.J. Gene hunting with hidden Markov model knockoffs. Biometrika 106, 1-18 (2018).

Candès, E. & Sesia, M. Variable selection with knockoffs, <https://web.stanford.edu/group/candes/knockoffs> Accessed December 6, 2023.

Spector, A. & Janson, L. Powerful knockoffs via minimizing reconstructability. Ann. Stat. 50, 252-276 (2022).

Romano, Y., Sesia, M. & Candès, E. Deep knockoffs. J. Am. Stat. Assoc., 115, 1861-1872 (2020). https://doi.org/10.1080/01621459.2019.1660174

Bates, S., Candès, E., Janson, L. & Wang, W. Metropolized knockoff sampling. J. Am. Stat. Assoc., 116, 1413-1427 (2021). https://doi.org/10.1080/01621459.2020.1729163

Jordon, J., Yoon, J. & Van Der Schaar, M. Knockoff GAN: generating knockoffs for feature selection using generative adversarial networks. 7th International Conference on Learning Representations, ICLR 2019 (2019) pp. 1-25.

Kormaksson, M., Sechidis, K. & Zimmermann, M. Knockofftools, GitHub repository <https://github.com/Novartis/knockofftools> Accessed December 20, 2023.

Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432-441 (2007).

Lee, J.D. & Hastie, T.J. Learning the structure of mixed graphical models. J. Comput. Graph. Stat. 24, 230-253 (2015).

Haslbeck, J.M.B. & Waldorp, L.J. mgm: Estimating time-varying mixed graphical models in high-dimensional data. J. Stat. Softw. 93, 1-46 (2020). https://doi.org/10.18637/jss.v093.i08

Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B. Methodol. 58, 267-288 (1996).

Breiman, L. Random forests. Mach. Learn. 45, 5-32 (2001).

Lundberg, S.M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17 4768-4777 (Curran Associates Inc, Red Hook, NY, USA, 2017).

Ren, Z., Wei, Y. & Candès, E. Derandomizing knockoffs. J. Am. Stat. Assoc. 118, 948-958 (2021).

Janson, L. & Su, W. Familywise error rate control via knockoffs. Electronic J. Stat. 10, 960-975 (2016).

Ren, Z. & Barber, R.F. Derandomised knockoffs: leveraging e-values for false discovery rate control. J. R. Stat. Soc. Ser. B Stat Methodol. 00, 1-33 (2023).

Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1-22 (2010).

Ishwaran, H. & Kogalur, U.B. Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) (2023) R package version 3.2.0.

Zimmermann, M., Sechidis, K. & Kormaksson, M. GitHub repository, will be made publicly available upon acceptance, <https://github.com/Novartis/knockoffs-cpt2024paper-simulations>

Scutari, M. Learning Bayesian networks with the bnlearn R package. J. Stat. Softw. 35, 1-22 (2010).

McInnes, I.B. et al. Secukinumab, a human anti-interleukin-17A monoclonal antibody, in patients with psoriatic arthritis (FUTURE 2): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet 386, 1137-1146 (2015).

Nash, P. et al. Efficacy and safety of secukinumab administration by autoinjector in patients with psoriatic arthritis: results from a randomized, placebo-controlled trial (FUTURE 3). Arthr. Res. Ther. 20, 1-11 (2018).

Kivitz, A.J. et al. Efficacy and safety of subcutaneous secukinumab 150 mg with or without loading regimen in psoriatic arthritis: results from the FUTURE 4 study. Rheumatol. Ther. 6, 393-407 (2019).

Mease, P. et al. Secukinumab improves active psoriatic arthritis symptoms and inhibits radiographic progression: primary results from the randomised, double-blind, phase III FUTURE 5 study. Ann. Rheum. Dis., 890-897 (2018).

Zhang, Z., Seibold, H., Vettore, M.V., Song, W.-J. & François, V. Subgroup identification in clinical trials: an overview of available methods and their implementations with R. Ann. Transl. Med. 6, 122 (2018).

Barber, R.F., Candès, E.J. & Samworth, R.J. Robust inference with knockoffs. Ann. Stat. 48, 1409-1431 (2020).

Ritchlin, C.T., Colbert, R.A. & Gladman, D.D. Psoriatic arthritis. N. Engl. J. Med. 376, 957-970 (2017).

Ogdie, A. et al. Usage of C-reactive protein testing in the diagnosis and monitoring of psoriatic arthritis (PsA): results from a real-world survey in the USA and Europe. Rheumatol. Ther. 9, 285-293 (2022).

Bühlmann, P., Rütimann, P., van de Geer, S. & Zhang, C.H. Correlated variables in regression: clustering and sparse estimation. J. Stat. Plan. Inference 143, 1835-1858 (2013).

Vásquez, A.R., Urbina, J.U.M., Farías, G.G. & Escarela, G. Controlling the false discovery rate by a latent Gaussian copula knockoff procedure. Comput. Stat. (2023). https://doi.org/10.1007/s00180-023-01346-4

Huang, D. & Janson, L. Relaxing the assumptions of knockoffs by conditioning. Ann. Stat. 48, 3021-3042 (2020).

Koyuncu, D. & Yener, B. Missing value knockoffs, arXiv preprint arXiv:2202.13054 (2022).

Wang, R., Dai, R. & Zheng, C. Controlling FDR in selecting group-level simultaneous signals from multiple data sources with application to the National COVID Collaborative Cohort data, arXiv preprint arXiv:2303.01599 (2023).

Authors

Manuela R Zimmermann (MR)

Mark Baillie (M)

Matthias Kormaksson (M)

David Ohlssen (D)

Konstantinos Sechidis (K)

MeSH classifications