Personalized pain management: The relationship between clinical relevance and reliability of measurements.
Journal
European journal of pain (London, England)
ISSN: 1532-2149
Abbreviated title: Eur J Pain
Country: England
NLM ID: 9801774
Publication information
Publication date:
10 2023
History:
revised: 2023-03-05
received: 2023-01-11
accepted: 2023-03-08
medline: 2023-09-05
pubmed: 2023-03-24
entrez: 2023-03-23
Status:
ppublish
Abstract
Reliability is a topic in health science in which a critical appraisal of the magnitudes of the measurements is often set aside in favour of a formulaic analysis. Furthermore, the relationship between clinical relevance and reliability of measurements is often overlooked. In this context, the aim of the present article is to provide an overview of the design and analysis of reliability studies, the interpretation of the reliability of measurements and its relationship to clinical significance in the context of pain research and management. The article is divided into two sections: the first section contains a step-by-step guide with simple and straightforward recommendations for the design and analysis of a reliability study, with a relevant example involving a commonly used assessment measure in pain research. The second section provides deeper insight into the interpretation of the results of a reliability study and the association between the reliability of measurements and their experimental and clinical relevance. SIGNIFICANCE: Reliability studies quantify the measurement error in experimental or clinical setups and should be interpreted as a continuous outcome. The assessment of measurement error is useful to design and interpret future experimental studies and clinical interventions. Reliability and clinical relevance are inextricably linked, as measurement error should be considered in the interpretation of minimal detectable change and minimal clinically important differences.
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Pagination
1056-1064
Copyright information
© 2023 European Pain Federation - EFIC ®.