Individual dynamic prediction of clinical endpoint from large dimensional longitudinal biomarker history: a landmark approach.
Keywords
Individual prediction
Landmark
Longitudinal data
Machine learning methods
Survival data
Journal
BMC Medical Research Methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date: 2022 Jul 11
History:
received: 2021 Feb 09
accepted: 2022 Jun 15
entrez: 2022 Jul 11
pubmed: 2022 Jul 12
medline: 2022 Jul 14
Status: epublish
Abstract
BACKGROUND
The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event, and potentially for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they do not easily extend to the case where the patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of markers.
METHODS
We combined a landmark approach, extended to the history of endogenous markers, with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to the landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods adapted to survival data, namely regularized regressions and random survival forests, to predict the event from the landmark time. We also show how the predictive tools can be combined into a superlearner. Performance is evaluated by cross-validation using estimators of the Brier score and of the area under the receiver operating characteristic (ROC) curve adapted to censored data.
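The landmark steps and the censoring-adapted evaluation can be illustrated with a minimal Python sketch. This is not the authors' R implementation, and several simplifications are assumptions: each marker trajectory is summarised by a per-subject least-squares line fitted on measurements up to the landmark time (a stand-in for the mixed-model predictions used in the paper), subjects without any measurement before the landmark are dropped, the Brier score at horizon w is estimated with inverse probability of censoring weights (IPCW) from a Kaplan-Meier estimate of the censoring distribution, and two methods are combined with a single convex weight minimising that score. The function names `landmark_dataset`, `censoring_km`, `ipcw_brier` and `superlearner_weight` are hypothetical.

```python
import numpy as np


def landmark_dataset(event_times, event_flags, marker_obs, landmark):
    """Landmark dataset for one marker (simplified sketch).

    event_times: dict id -> observed time (event or censoring)
    event_flags: dict id -> 1 if the event was observed, 0 if censored
    marker_obs:  dict id -> list of (measurement time, marker value)
    """
    rows = []
    for sid, t in event_times.items():
        if t <= landmark:
            continue  # event or censoring before the landmark: not at risk
        # Use only the marker history observed up to the landmark time.
        obs = [(u, y) for u, y in marker_obs.get(sid, []) if u <= landmark]
        if not obs:
            continue  # assumption: subjects without marker history are dropped
        u = np.array([o[0] for o in obs], dtype=float)
        y = np.array([o[1] for o in obs], dtype=float)
        if np.unique(u).size >= 2:
            # Per-subject least-squares line (stand-in for mixed-model predictions).
            slope, intercept = np.polyfit(u, y, 1)
        else:
            slope, intercept = 0.0, float(y.mean())
        rows.append({
            "id": sid,
            "time": t - landmark,  # time origin reset to the landmark
            "event": event_flags[sid],
            "intercept": intercept,
            "slope": slope,
            "value_at_landmark": intercept + slope * landmark,
        })
    return rows


def censoring_km(times, events):
    """Kaplan-Meier estimate G of the censoring survival function."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    cens_times = np.unique(times[events == 0])
    surv, s = [], 1.0
    for u in cens_times:
        at_risk = np.sum(times >= u)
        censored = np.sum((times == u) & (events == 0))
        s *= 1.0 - censored / at_risk
        surv.append(s)

    def G(t, left=False):
        # left=True gives G(t-): only censoring times strictly before t count.
        k = np.searchsorted(cens_times, t, side="left" if left else "right")
        return 1.0 if k == 0 else surv[k - 1]

    return G


def ipcw_brier(times, events, pred_risk, horizon):
    """IPCW Brier score at `horizon` (times measured from the landmark).

    pred_risk[i] is the predicted probability of the event by `horizon`.
    """
    G = censoring_km(times, events)
    total = 0.0
    for t, d, p in zip(times, events, pred_risk):
        if t <= horizon and d == 1:
            total += (1.0 - p) ** 2 / G(t, left=True)  # event before horizon
        elif t > horizon:
            total += p ** 2 / G(horizon)  # still event-free at horizon
        # censored before horizon: contributes weight 0
    return total / len(times)


def superlearner_weight(times, events, pred_a, pred_b, horizon, grid=101):
    """Convex weight w on method A minimising the IPCW Brier score of
    w * pred_a + (1 - w) * pred_b (predictions should be out-of-fold)."""
    pred_a, pred_b = np.asarray(pred_a, float), np.asarray(pred_b, float)
    ws = np.linspace(0.0, 1.0, grid)
    scores = [ipcw_brier(times, events, w * pred_a + (1 - w) * pred_b, horizon)
              for w in ws]
    return ws[int(np.argmin(scores))]
```

In a real analysis the summaries from `landmark_dataset` (one set per marker) would feed the survival learners, and the weight would be selected on out-of-fold predictions so that the superlearner is itself evaluated by cross-validation; the AUC adapted to censored data can be estimated with similar censoring weights.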
RESULTS
We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially in the case of numerous and/or nonlinear relationships between the predictors and the event. We then applied the methodology in two prediction contexts: a clinical context with the prediction of death in primary biliary cholangitis, and a public health context with age-specific prediction of death in the general elderly population.
CONCLUSIONS
Our methodology, implemented in R, enables the prediction of an event using the entire longitudinal patient history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and methods for a single right-censored time-to-event, the technique can be used with any other appropriate modeling technique for the markers and can easily be extended to the competing risks setting.
Identifiers
pubmed: 35818025
doi: 10.1186/s12874-022-01660-3
pii: 10.1186/s12874-022-01660-3
pmc: PMC9275051
Chemical substances
Biomarkers
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
188
Copyright information
© 2022. The Author(s).