Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study.


Journal

Radiation oncology (London, England)
ISSN: 1748-717X
Abbreviated title: Radiat Oncol
Country: England
NLM ID: 101265111

Publication information

Publication date:
14 May 2020
History:
received: 19 Nov 2019
accepted: 22 Apr 2020
entrez: 16 May 2020
pubmed: 16 May 2020
medline: 23 Mar 2021
Status: epublish

Abstract

BACKGROUND
Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radiotherapeutic success. The model development process requires two independent data sets, one for discovery and one for validation. Each of them may contain samples collected in a single center or a collection of samples from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Optimal use of limited data resources for discovery and validation with respect to the expected success of a study requires dispassionate, objective decision-making. In this work, we addressed the impact of the choice of single-center and multi-center data as discovery and validation data sets, and assessed how this impact depends on three data characteristics: signal strength, number of informative features and sample size.
METHODS
We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatic analysis workflow of batch correction, feature selection and parameter estimation was emulated. To determine model quality, four measures were used: false discovery rate, prediction error, chance of successful validation (significant correlation of predicted and true validation data outcome) and model calibration.
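The workflow described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation, and every choice in it is an assumption made for illustration: batch effects are modeled as additive per-center shifts, batch correction is plain within-center mean-centering, feature selection is a top-k absolute-correlation filter, and parameter estimation is ordinary least squares. All function and parameter names (`simulate`, `center_within_batches`, `select_top_k`, `run_once`) are hypothetical. Only three of the four quality measures are approximated (false discovery rate, prediction error, and the correlation of predicted and true validation outcome, without a significance test).

```python
import numpy as np


def simulate(rng, n_centers, n_per_center, n_features=50, n_informative=5,
             signal=1.0, batch_sd=1.0):
    """Simulate features with an additive per-center shift and a linear outcome."""
    beta = np.zeros(n_features)
    beta[:n_informative] = signal  # assumption: only the first features carry signal
    X_parts, y_parts, center_parts = [], [], []
    for c in range(n_centers):
        biology = rng.normal(size=(n_per_center, n_features))
        shift = rng.normal(scale=batch_sd, size=n_features)  # site-specific bias
        X_parts.append(biology + shift)
        y_parts.append(biology @ beta + rng.normal(size=n_per_center))
        center_parts.append(np.full(n_per_center, c))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(center_parts)


def center_within_batches(X, center):
    """Crude stand-in for batch correction: subtract each center's feature means."""
    X = X.copy()
    for c in np.unique(center):
        rows = center == c
        X[rows] -= X[rows].mean(axis=0)
    return X


def select_top_k(X, y, k):
    """Keep the k features with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:k])


def run_once(seed, discovery_centers, validation_centers, n_total=200,
             n_informative=5, signal=1.0, k=5):
    """One simulation run: discover on one data set, validate on another."""
    rng = np.random.default_rng(seed)
    Xd, yd, cd = simulate(rng, discovery_centers, n_total // discovery_centers,
                          n_informative=n_informative, signal=signal)
    Xv, yv, cv = simulate(rng, validation_centers, n_total // validation_centers,
                          n_informative=n_informative, signal=signal)
    Xd = center_within_batches(Xd, cd)
    Xv = center_within_batches(Xv, cv)

    selected = select_top_k(Xd, yd, k)
    design = np.column_stack([np.ones(len(yd)), Xd[:, selected]])
    coef, *_ = np.linalg.lstsq(design, yd, rcond=None)  # least-squares fit
    pred = np.column_stack([np.ones(len(yv)), Xv[:, selected]]) @ coef

    return {
        # share of selected features that carry no true signal
        "fdr": float(np.mean(selected >= n_informative)),
        # mean squared prediction error on the validation set
        "mse": float(np.mean((yv - pred) ** 2)),
        # correlation of predicted and true validation outcome
        "corr": float(np.corrcoef(pred, yv)[0, 1]),
    }


if __name__ == "__main__":
    # Compare a single-center and a four-center discovery set.
    for d in (1, 4):
        print(d, run_once(seed=0, discovery_centers=d, validation_centers=4))
```

Repeating `run_once` over many seeds and over a grid of `signal`, `n_informative` and `n_total` values would give a rough analogue of the comparison described above. Because within-center mean-centering removes an additive shift exactly, this toy setup should not be expected to reproduce the reported differences between single-center and multi-center data.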
RESULTS
In agreement with the literature on the generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when the prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation.
CONCLUSIONS
With regard to decision making, this simulation study underlines the importance of study aims being defined precisely a priori. Minimization of the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data solely affects the quality of the estimator of the prediction error, which was more precise on multi-center validation data.

Identifiers

pubmed: 32410693
doi: 10.1186/s13014-020-01543-1
pii: 10.1186/s13014-020-01543-1
pmc: PMC7227093

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

109

Authors

Daniel Samaga (D)

Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany. daniel.samaga@helmholtz-muenchen.de.

Roman Hornung (R)

Department of Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, 81377, Germany.

Herbert Braselmann (H)

Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany.

Julia Hess (J)

Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany.
Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany.
Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany.

Horst Zitzelsberger (H)

Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany.
Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany.
Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany.

Claus Belka (C)

Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany.
Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany.

Anne-Laure Boulesteix (AL)

Department of Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, 81377, Germany.

Kristian Unger (K)

Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany.
Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany.
Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany.
