Surrogate minimal depth as an importance measure for variables in random forests.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
01 10 2019
History:
received: 24 04 2018
revised: 07 01 2019
accepted: 26 02 2019
pubmed: 3 3 2019
medline: 11 6 2020
entrez: 3 3 2019
Status:
ppublish
Abstract
Random forests, a machine learning approach, have been applied successfully to omics data such as gene expression data, both for classification or regression and for selecting the variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the results of currently applied variable selection techniques difficult to interpret. Here we propose a new variable selection approach called surrogate minimal depth (SMD), which incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that taking variable relationships into account more fully improves variable selection. Compared with MD and existing state-of-the-art methods, SMD has higher empirical power to identify causal variables, while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach for gaining insight into the complex interplay of predictor variables and outcome in high-dimensional data settings. Availability and implementation: https://github.com/StephanSeifert/SurrogateMinimalDepth. Supplementary data are available at Bioinformatics online.
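As a rough illustration of the minimal depth (MD) concept that SMD builds on, the following Python sketch computes, for each variable, the depth of the split nearest the root that uses it, averaged over the trees of a forest. It is not the authors' implementation and leaves out surrogate variables entirely. The use of scikit-learn, the synthetic data, the function names `tree_minimal_depth` and `forest_minimal_depth`, and the convention of assigning a tree's maximal depth to variables it never splits on are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_minimal_depth(estimator, n_features):
    """Depth of the split closest to the root for each feature (root = 0)."""
    t = estimator.tree_
    # Assumption: a feature never used for splitting gets the tree's maximal depth.
    min_depth = np.full(n_features, float(t.max_depth))
    stack = [(0, 0)]  # (node id, depth of that node)
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # leaf node: both child ids are -1
            continue
        feature = t.feature[node]
        min_depth[feature] = min(min_depth[feature], depth)
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return min_depth


def forest_minimal_depth(forest, n_features):
    """Average the per-tree minimal depths over all trees in the forest."""
    per_tree = [tree_minimal_depth(est, n_features) for est in forest.estimators_]
    return np.mean(per_tree, axis=0)


# Synthetic data (hypothetical): with shuffle=False the informative
# variables are the first three columns, the rest are noise.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

md = forest_minimal_depth(forest, X.shape[1])
for index in np.argsort(md):
    print(f"variable {index}: mean minimal depth {md[index]:.2f}")
```

Lower values mean that a variable tends to be used for splits near the root. In the MD approach such variables are the candidates for selection, and the abstract describes SMD as extending this idea by also taking surrogate variables into account.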
Identifiers
pubmed: 30824905
pii: 5368013
doi: 10.1093/bioinformatics/btz149
pmc: PMC6761946
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3663-3671
Copyright information
© The Author(s) 2019. Published by Oxford University Press.