Accelerated and Interpretable Oblique Random Survival Forests.

Computational efficiency; Supervised learning; Variable importance

Journal

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
ISSN: 1061-8600
Abbreviated title: J Comput Graph Stat
Country: United States
NLM ID: 101470926

Publication information

Publication date:
2024
History:
medline: 26 Aug 2024
pubmed: 26 Aug 2024
entrez: 26 Aug 2024
Status: ppublish

Abstract

The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition, few methods have been developed for estimation of variable importance (VI) with oblique RSFs. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate VI with the oblique RSF. Our computational approach uses Newton-Raphson scoring in each non-leaf node. We estimate VI by negating each coefficient used for a given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In benchmarking experiments, we find our implementation of the oblique RSF is hundreds of times faster, with equivalent prediction accuracy, compared to existing software for oblique RSFs. We find in simulation studies that "negation VI" discriminates between relevant and irrelevant numeric predictors more accurately than permutation VI, Shapley VI, and a technique to measure VI using analysis of variance. All oblique RSF methods in the current study are available in the aorsf R package, and additional supplemental materials are available online.
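
The negation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the aorsf implementation: it assumes a single linear combination of predictors standing in for the whole forest, simulated data standing in for out-of-bag observations, and Harrell's concordance index as the accuracy measure. The names `c_index`, `risk_scores`, and `negation_vi`, the coefficients, and the simulation settings are all hypothetical choices made for illustration.

```python
import math
import random


def c_index(times, events, risk):
    """Harrell's concordance: share of comparable pairs ranked correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def risk_scores(X, coefs):
    """Linear combination of predictors, as used at an oblique split."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in X]


def negation_vi(X, times, events, coefs):
    """Drop in concordance after flipping the sign of each coefficient."""
    baseline = c_index(times, events, risk_scores(X, coefs))
    importance = []
    for k in range(len(coefs)):
        flipped = list(coefs)
        flipped[k] = -flipped[k]  # negate the coefficient of predictor k only
        negated = c_index(times, events, risk_scores(X, flipped))
        importance.append(baseline - negated)
    return importance


# Simulated right-censored data (assumed setup): predictor 0 is strongly
# relevant, predictor 1 is weakly relevant, predictor 2 is irrelevant.
rng = random.Random(0)
true_coefs = [1.0, 0.5, 0.0]
X, times, events = [], [], []
for _ in range(200):
    row = [rng.gauss(0.0, 1.0) for _ in true_coefs]
    lp = sum(c * x for c, x in zip(true_coefs, row))
    event_time = rng.expovariate(math.exp(lp))  # higher lp, shorter survival
    censor_time = rng.expovariate(0.5)
    X.append(row)
    times.append(min(event_time, censor_time))
    events.append(event_time <= censor_time)

# For simplicity the "fitted" coefficients are taken to equal the true ones.
vi = negation_vi(X, times, events, true_coefs)
print(vi)
```

In this sketch a predictor whose coefficient is zero receives an importance of exactly zero, because negating zero leaves every risk score unchanged, while predictors with larger coefficients tend to lose more concordance when negated. In an actual oblique RSF the coefficients differ from node to node and accuracy is measured on each tree's out-of-bag observations.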

Identifiers

pubmed: 39184344
doi: 10.1080/10618600.2023.2231048
pmc: PMC11343578

Publication types

Journal Article

Languages

eng

Pagination

192-207

Conflict of interest statement

Disclosure Statement: No potential conflict of interest was reported by the authors.

Authors

Byron C Jaeger (BC)

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

Sawyer Welden (S)

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

Kristin Lenoir (K)

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

Jaime L Speiser (JL)

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

Matthew W Segar (MW)

Department of Cardiology, Texas Heart Institute, Houston, TX.

Ambarish Pandey (A)

Division of Cardiology, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX.

Nicholas M Pajewski (NM)

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

MeSH classifications