Using knockoffs for controlled predictive biomarker identification.

false discovery rate; heterogeneous treatment effect; knockoff filter; predictive biomarker identification

Journal

Statistics in medicine
ISSN: 1097-0258
Abbreviated title: Stat Med
Country: England
NLM ID: 8215016

Publication information

Publication date:
10 11 2021
History:
revised: 18 03 2021
received: 21 12 2020
accepted: 22 06 2021
pubmed: 31 7 2021
medline: 30 10 2021
entrez: 30 7 2021
Status: ppublish

Abstract

One of the key challenges of personalized medicine is to identify which patients will respond positively to a given treatment. The area of subgroup identification focuses on this challenge, that is, on identifying groups of patients with desirable characteristics, such as an enhanced treatment effect. A crucial first step toward subgroup identification is to identify the baseline variables (e.g., biomarkers) that influence the treatment effect, which are known as predictive variables. Many subgroup discovery algorithms return importance scores that capture the variables' predictive strength. However, a major limitation of these scores is that they do not answer the core question: "Which variables are actually predictive?" We answer this question using the knockoff framework, a general framework for controlling the false discovery rate in variable selection. Knockoffs have previously been used for prognostic variable selection; our work is the first to use them for predictive variable selection. We introduce two novel knockoff filters: one parametric, building on variable importance scores derived from a penalized linear regression model, and one non-parametric, building on causal forest variable importance scores. We conduct extensive simulations to validate the performance of the proposed methodology, and we also apply the proposed methods to data from a randomized clinical trial.
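As a rough illustration of how a knockoff filter turns importance scores into a selection with false discovery rate control, the sketch below implements the generic knockoff+ thresholding step from the knockoff literature. It is not the authors' parametric or causal-forest filter: it assumes that a statistic W_j has already been computed for each variable (for example, the importance of the original variable minus that of its knockoff copy), and the function names and example values are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(w, q):
    """Smallest t such that (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    w : knockoff statistics, one per variable; large positive values suggest
        the original variable matters more than its knockoff copy.
    q : target false discovery rate.
    Returns np.inf when no threshold meets the target.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        # Negative statistics estimate the number of false positives above t.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return t
    return np.inf

def knockoff_select(w, q):
    """Indices of the variables whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= tau)

# Hypothetical statistics for ten candidate biomarkers.
w_example = [5, 4, 3, 2.5, -1, 0.5, 2, 0, 6, 7]
selected = knockoff_select(w_example, q=0.2)
```

In this sketch the only inputs are the statistics and the target rate, so the same selection step could in principle sit behind either a penalized-regression or a forest-based importance score; how those scores are built for predictive (rather than prognostic) variables is the contribution of the paper and is not reproduced here.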

Identifiers

pubmed: 34328655
doi: 10.1002/sim.9134

Chemical substances

Biomarkers 0

Publication types

Journal Article; Randomized Controlled Trial

Languages

eng

Citation subsets

IM

Pagination

5453-5473

Copyright information

© 2021 John Wiley & Sons Ltd.

Authors

Konstantinos Sechidis (K)

Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland.

Matthias Kormaksson (M)

Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA.

David Ohlssen (D)

Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA.
