Multicenter external validation of prediction models for clinical outcomes after spinal fusion for lumbar degenerative disease.
External validation
Lumbar fusion
Outcome prediction
Patient-reported outcome
Predictive analytics
Journal
European Spine Journal: official publication of the European Spine Society, the European Spinal Deformity Society, and the European Section of the Cervical Spine Research Society
ISSN: 1432-0932
Abbreviated title: Eur Spine J
Country: Germany
NLM ID: 9301980
Publication information
Publication date:
11 Jul 2024
History:
received: 22 Apr 2024
accepted: 30 Jun 2024
revised: 18 Jun 2024
medline: 11 Jul 2024
pubmed: 11 Jul 2024
entrez: 10 Jul 2024
Status:
aheadofprint
Abstract
Abstract sections
BACKGROUND
Clinical prediction models (CPMs), such as the SCOAP-CERTAIN tool, can enhance decision-making for lumbar spinal fusion surgery by providing quantitative estimates of outcomes, helping surgeons assess the potential benefits and risks for each individual patient. External validation is crucial for assessing a CPM's generalizability beyond the dataset on which it was developed: it tests performance in diverse populations and supports the reliability and real-world applicability of the results. We therefore externally validated the tool's ability to predict improvement in the Oswestry Disability Index (ODI), back pain (BP), and leg pain (LP).
METHODS
Prospective and retrospective data were obtained from a multicenter registry. The outcome measure was the minimum clinically important change 12 months after lumbar fusion for degenerative disease, defined as a reduction of ≥ 15 points in ODI and of ≥ 2 points on the numeric rating scales (NRS) for BP and LP. We externally validated the tool by calculating discrimination and calibration metrics, including calibration intercept and slope, Brier score, expected/observed ratio, the Hosmer-Lemeshow (HL) test, area under the curve (AUC), sensitivity, and specificity.
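The outcome definition and several of the metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy patient values, and the 0.5 classification cutoff are all assumptions made for illustration, since the abstract does not report the cutoff or the software routines used.

```python
from typing import Sequence, Tuple


def achieved_mcic(baseline: float, followup: float, threshold: float) -> int:
    """Return 1 if the score fell by at least `threshold` points, else 0."""
    return int(baseline - followup >= threshold)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Chance that a random responder gets a higher prediction than a
    random non-responder (ties count as one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def expected_observed_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Sum of predicted probabilities divided by the number of observed events."""
    return sum(p) / sum(y)


def sensitivity_specificity(
    y: Sequence[int], p: Sequence[float], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Classify as responder when p >= cutoff (the cutoff is an assumption)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical toy data: six patients, ODI before surgery and at 12 months,
# plus made-up predicted probabilities of reaching the ODI threshold.
odi_baseline = [50, 40, 60, 30, 44, 52]
odi_12_months = [20, 35, 30, 28, 29, 50]
predicted = [0.9, 0.4, 0.7, 0.2, 0.5, 0.6]

# ODI threshold of 15 points, as stated in the methods.
observed = [
    achieved_mcic(b, f, threshold=15) for b, f in zip(odi_baseline, odi_12_months)
]

print("observed:", observed)
print("AUC:", round(auc(observed, predicted), 3))
print("Brier:", round(brier_score(observed, predicted), 3))
print("E/O:", round(expected_observed_ratio(observed, predicted), 3))
print("sens, spec:", sensitivity_specificity(observed, predicted))
```

For the NRS outcomes the same helper would be called with `threshold=2`. An expected/observed ratio above 1 means the model predicts more responders than were observed, and a ratio below 1 means it predicts fewer.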
RESULTS
We included 1115 patients with an average age of 60.8 ± 12.5 years. For 12-month ODI, the area under the curve (AUC) was 0.70, and the calibration intercept and slope were 1.01 and 0.84, respectively. For NRS BP, the AUC was 0.72, with a calibration intercept of 0.97 and a slope of 0.87. For NRS LP, the AUC was 0.70, with a calibration intercept of 0.04 and a slope of 0.72. Sensitivity ranged from 0.63 to 0.96 and specificity from 0.15 to 0.68. HL testing indicated lack of fit for all three models.
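As a reading aid for the calibration figures, the sketch below shows one common way to obtain a calibration intercept and slope: regress the observed outcome on the logit of the predicted probability. The abstract does not state how the authors computed these values, so this is an assumption. The slope is taken from a logistic model with both terms free, and the intercept from a model with the slope fixed at 1. All names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    x: Sequence[float],
    y: Sequence[int],
    fit_slope: bool = True,
    max_iter: int = 100,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Fit logit P(y=1) = a + b*x by Newton-Raphson.

    With fit_slope=False, b stays fixed at 1 and only a is estimated,
    which treats x as an offset.
    """
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        mu = [sigmoid(a + b * xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        g_a = sum(yi - mi for yi, mi in zip(y, mu))  # score for a
        h_aa = sum(w)
        if fit_slope:
            g_b = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))  # score for b
            h_ab = sum(wi * xi for wi, xi in zip(w, x))
            h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
            det = h_aa * h_bb - h_ab * h_ab
            # Newton step: inverse of the 2x2 information matrix times the score.
            da = (h_bb * g_a - h_ab * g_b) / det
            db = (h_aa * g_b - h_ab * g_a) / det
        else:
            da, db = g_a / h_aa, 0.0
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def calibration_intercept_slope(
    y: Sequence[int], p: Sequence[float]
) -> Tuple[float, float]:
    """Return (intercept with slope fixed at 1, slope from the free fit)."""
    x = [logit(pi) for pi in p]
    _, slope = fit_logistic(x, y, fit_slope=True)
    intercept, _ = fit_logistic(x, y, fit_slope=False)
    return intercept, slope


# Hypothetical toy data: predictions of 0.1, 0.5 and 0.9 while the observed
# response rates in those groups are 0.25, 0.5 and 0.75.
predicted = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
observed = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

intercept, slope = calibration_intercept_slope(observed, predicted)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
```

In this toy case the slope comes out at about 0.5 and the intercept at about 0: the predictions are right on average but too extreme. Under this definition, perfectly calibrated predictions give an intercept of 0 and a slope of 1, and a slope below 1 indicates predictions that are too extreme at both ends of the scale.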
CONCLUSIONS
Using data from a multinational registry, we externally validated the SCOAP-CERTAIN prediction tool. The model demonstrated fair discrimination and calibration of predicted probabilities, so caution is warranted when applying it in clinical practice. We suggest that future CPMs focus on predicting longer-term prognosis for this patient population and emphasize robust calibration and thorough reporting.
Identifiers
pubmed: 38987513
doi: 10.1007/s00586-024-08395-3
pii: 10.1007/s00586-024-08395-3
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2024. The Author(s).