Block Forests: random forests for blocks of clinical and omics covariate data.

Genomics; Humans; Machine Learning; Survival Analysis

Cancer; Machine learning; Multi-omics data; Prediction; Random forest; Statistics; Survival analysis

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
27 Jun 2019

History:

received: 15 Jan 2019

accepted: 7 Jun 2019

entrez: 29 Jun 2019

pubmed: 30 Jun 2019

medline: 16 Aug 2019

Status: epublish

Abstract

BACKGROUND

In recent years, multi-omics data have become increasingly available, that is, data featuring measurements of several types of omics data for each patient. Using multi-omics data as covariate data in outcome prediction is both promising and challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to capture complex dependency patterns between the outcome and the covariates. Against this background, we developed five candidate random forest variants tailored to multi-omics covariate data. These variants modify the split point selection of random forest to incorporate the block structure of multi-omics data and can be applied to any outcome type for which a random forest variant exists, such as categorical, continuous, and survival outcomes. Using 20 publicly available multi-omics data sets with survival outcomes, we compared the prediction performance of the block forest variants with that of alternatives. We also considered the common special case in which clinical covariates and measurements of a single omics data type are available.

RESULTS

We identified one variant, termed "block forest", that outperformed all other approaches in the comparison study. In particular, it performed significantly better than standard random survival forest (adjusted p-value: 0.027). The two best-performing variants have in common that the block choice is randomized in the split point selection procedure. When clinical covariates and a single omics data type were available, the improvements of the variants over random survival forest were larger than in the multi-omics case. The degree of improvement over random survival forest varied strongly across data sets. Moreover, making the consideration of all clinical covariates mandatory improved performance. This result should, however, be interpreted with caution, because the level of predictive information contained in clinical covariates depends on the specific application.

CONCLUSIONS

The new prediction method block forest for multi-omics data can significantly improve the prediction performance of random forest and outperformed the alternatives in the comparison. Block forest is particularly effective in the special case of using clinical covariates in combination with measurements of a single omics data type.
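The abstract states that the variants change split point selection to respect the block structure, and that the two best variants randomize the block choice. The Python sketch below illustrates one way such a step could look; it is not the authors' algorithm. The function name `sample_candidate_covariates`, the per-block inclusion probability `block_prob`, and the square-root rule for the number of covariates drawn per block are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional


def sample_candidate_covariates(
    blocks: Dict[str, List[str]],
    block_prob: float = 0.5,  # assumed value, not taken from the abstract
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    """Draw candidate covariates for a single split, block by block."""
    rng = rng or random.Random()

    # Randomized block choice: keep each block with probability block_prob.
    chosen = [name for name in blocks if rng.random() < block_prob]
    if not chosen:
        # Guarantee at least one block so the split has candidates.
        chosen = [rng.choice(list(blocks))]

    candidates: Dict[str, List[str]] = {}
    for name in chosen:
        covariates = blocks[name]
        # Assumed rule: draw about sqrt(block size) covariates per block.
        k = max(1, math.isqrt(len(covariates)))
        candidates[name] = rng.sample(covariates, k)
    return candidates


if __name__ == "__main__":
    # Hypothetical blocks: a few clinical covariates and one omics block.
    example_blocks = {
        "clinical": ["age", "sex", "stage", "grade"],
        "rna": [f"gene_{i}" for i in range(100)],
    }
    print(sample_candidate_covariates(example_blocks, rng=random.Random(1)))
```

Sampling within each block separately keeps a small clinical block from being crowded out by a much larger omics block, which is the kind of imbalance a block-aware split procedure is meant to address.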

Identifiers

DOI: 10.1186/s12859-019-2942-y

PMID: 31248362

PMC: PMC6598279

PII: 10.1186/s12859-019-2942-y

Publication types

Journal Article

Languages

eng

Pagination

358

Grants

Agency: Deutsche Forschungsgemeinschaft

ID: BO3139/6-1

Agency: Deutsche Forschungsgemeinschaft

ID: BO3139/4-3

Agency: Deutsche Forschungsgemeinschaft

ID: HO6422/1-2

Authors

Roman Hornung (R)

Marvin N Wright (MN)
