Tree-Weighting for Multi-Study Ensemble Learners.


Journal

Pacific Symposium on Biocomputing
ISSN: 2335-6936
Abbreviated title: Pac Symp Biocomput
Country: United States
NLM ID: 9711271

Publication information

Publication date:
2020
History:
entrez: 5 December 2019
pubmed: 5 December 2019
medline: 19 March 2021
Status: ppublish

Abstract

Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we compare two strategies: weighting each forest to form the ensemble, and extracting the individual trees trained by each Random Forest and weighting them directly. We find that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor. Furthermore, we explore how ensembling weights correspond to tree structure, to shed light on the features that determine whether weighting trees directly is advantageous. Finally, we apply our approach to genomic datasets and show that weighting trees improves upon the basic multi-study learning paradigm. Code and supplementary material are available at https://github.com/m-ramchandran/tree-weighting.
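
The contrast between weighting forests and weighting trees can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the "trees" are hand-written threshold stumps rather than trees extracted from a trained Random Forest, the two toy studies are made up, and the inverse-error weighting rule is a simple stand-in for whatever weighting scheme the authors use. The only idea carried over from the abstract is that each ensemble member is scored on studies it was not trained on, and that the members can be either whole forests or individual trees.

```python
from statistics import mean

def stump(threshold):
    """Hypothetical one-split 'tree': predicts 1.0 when x exceeds the threshold."""
    return lambda x: 1.0 if x > threshold else 0.0

def forest_predictor(trees):
    """A forest predicts the unweighted average of its trees."""
    return lambda x: mean(tree(x) for tree in trees)

def mse(predict, X, y):
    return mean((predict(x) - t) ** 2 for x, t in zip(X, y))

def cross_study_weights(members, home_study, studies, eps=1e-8):
    """Weight each member by inverse mean MSE on the studies it was NOT trained on.

    Illustrative rule only; the weights are normalized to sum to 1.
    """
    scores = []
    for member, home in zip(members, home_study):
        errs = [mse(member, X, y) for k, (X, y) in enumerate(studies) if k != home]
        scores.append(1.0 / (mean(errs) + eps))
    total = sum(scores)
    return [s / total for s in scores]

def weighted_ensemble(members, weights):
    return lambda x: sum(w * m(x) for w, m in zip(weights, members))

# Two made-up studies: scalar feature x, binary outcome y.
studies = [
    ([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]),
    ([0.2, 0.3, 0.7, 0.8], [0, 0, 1, 1]),
]

# Placeholder trees standing in for those grown on each study.
trees_by_study = {
    0: [stump(0.5), stump(0.85)],
    1: [stump(0.5), stump(0.15)],
}

# Strategy 1: one weight per forest.
forests = [forest_predictor(trees_by_study[k]) for k in (0, 1)]
forest_w = cross_study_weights(forests, [0, 1], studies)
forest_ensemble = weighted_ensemble(forests, forest_w)

# Strategy 2: pool all trees and give each its own weight.
all_trees = trees_by_study[0] + trees_by_study[1]
tree_home = [0, 0, 1, 1]
tree_w = cross_study_weights(all_trees, tree_home, studies)
tree_ensemble = weighted_ensemble(all_trees, tree_w)
```

In this toy setup, forest-level weighting can only trade one whole forest off against the other, so a poorly generalizing tree keeps its equal share inside its forest. Tree-level weighting can shrink that tree's weight individually, which is the extra layer of flexibility the abstract describes.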

Identifiers

pubmed: 31797618
pii: 9789811215636_0040
pmc: PMC6980320
mid: NIHMS1061174

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

451-462

Grants

Agency: NCI NIH HHS
ID: P30 CA006516
Country: United States
Agency: NCI NIH HHS
ID: T32 CA009337
Country: United States

Authors

Maya Ramchandran (M)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA, maya_ramchandran@g.harvard.edu.
