Impact of sample size on the stability of risk scores from clinical prediction models: a case study in cardiovascular disease.

Keywords: Precision; Risk prediction; Sample size; Stability; Statistical methods

Journal

Diagnostic and prognostic research
ISSN: 2397-7523
Abbreviated title: Diagn Progn Res
Country: England
NLM ID: 101718985

Publication information

Publication date:
2020
History:
received: 25 02 2020
accepted: 12 08 2020
entrez: 18 9 2020
pubmed: 19 9 2020
medline: 19 9 2020
Status: epublish

Abstract

BACKGROUND
Stability of risk estimates from prediction models may be highly dependent on the sample size of the dataset available for model derivation. In this paper, we evaluate the stability of cardiovascular disease risk scores for individual patients when using different sample sizes for model derivation; such sample sizes include those similar to models recommended in the national guidelines, and those based on recently published sample size formula for prediction models.
METHODS
We mimicked the process of sampling
RESULTS
For a sample size of 100,000, the median 5-95th percentile range of risks for patients across the 1000 models was 0.77%, 1.60%, 2.42% and 3.22% for patients with population-derived risks of 4-5%, 9-10%, 14-15% and 19-20% respectively; for
CONCLUSIONS
Widely used cardiovascular disease risk prediction models suffer from high levels of instability induced by sampling variation. Many models will also suffer from overfitting (a closely linked concept), but at acceptable levels of overfitting, there may still be high levels of instability in individual risk. Stability of risk estimates should be a criterion when determining the minimum sample size to develop models.
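
The stability measure reported in the results (the 5-95th percentile range of an individual's predicted risk across repeatedly derived models) can be illustrated with a minimal sketch. The methods text in this record is truncated, so this is not the authors' procedure: it assumes simulated data, a simple logistic model fitted by Newton-Raphson, and hypothetical names and coefficients (`fit_logistic`, `percentile_range_of_risk`, `true_beta`) chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible; it is a numerical safeguard only.
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def percentile_range_of_risk(n_dev, n_models=100, n_patients=500, seed=0):
    """Return, per patient, the 5-95th percentile range of predicted risk
    across n_models models, each derived on a fresh sample of size n_dev."""
    rng = np.random.default_rng(seed)
    true_beta = np.array([-2.5, 0.8, 0.5])  # hypothetical data-generating coefficients

    # Fixed set of patients whose risk is tracked across all models.
    X_eval = np.column_stack([np.ones(n_patients), rng.normal(size=(n_patients, 2))])

    risks = np.empty((n_models, n_patients))
    for m in range(n_models):
        # Draw a new development sample from the same (simulated) population.
        X = np.column_stack([np.ones(n_dev), rng.normal(size=(n_dev, 2))])
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
        beta_hat = fit_logistic(X, y)
        risks[m] = 1.0 / (1.0 + np.exp(-(X_eval @ beta_hat)))

    lower, upper = np.percentile(risks, [5, 95], axis=0)
    return upper - lower
```

Calling `np.median(percentile_range_of_risk(500))` and `np.median(percentile_range_of_risk(5000))` gives the median range at two development sample sizes; under these simulation assumptions the range should shrink as the sample size grows.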

Identifiers

pubmed: 32944655
doi: 10.1186/s41512-020-00082-3
pii: 82
pmc: PMC7487849

Publication types

Journal Article

Languages

eng

Pagination

14

Copyright information

© The Author(s) 2020.

Conflict of interest statement

Competing interests: All authors state they have nothing to disclose.

Authors

Alexander Pate (A)

Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK.

Richard Emsley (R)

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, London, SE5 8AF UK.

Matthew Sperrin (M)

Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK.

Glen P Martin (GP)

Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK.

Tjeerd van Staa (T)

Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK.

MeSH classifications