Studying missingness in spinal cord injury data: challenges and impact of data imputation.
Imputation
Missing data
Simulation study
Spinal cord injury
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date:
06 Jan 2024
History:
received: 04 Jul 2023
accepted: 08 Dec 2023
medline: 07 Jan 2024
pubmed: 07 Jan 2024
entrez: 06 Jan 2024
Status:
epublish
Abstract
BACKGROUND
In recent decades, medical research fields studying rare conditions such as spinal cord injury (SCI) have made extensive efforts to collect large-scale data. However, most analysis methods rely on complete data. This is particularly troublesome for clinical data, which are prone to missingness. Researchers often mitigate the problem by removing patients with missing data from the analyses. Less commonly, they apply imputation methods to infer likely values.
OBJECTIVE
Our objective was to study how the handling of missing data influences the results reported, taking SCI registries as an example. We aimed to raise awareness of the effects of missing data and to provide guidelines for future research projects, in SCI research and beyond.
METHODS
Using the Sygen clinical trial data (n = 797), we analyzed the impact of the type of variable in which data are missing, the pattern according to which data are missing, and the imputation strategy (e.g., mean imputation, last observation carried forward, multiple imputation).
RESULTS
Our simulations show that mean imputation may lead to results that deviate strongly from the underlying expected results. For repeated measures missing at late stages (≥ 6 months after injury in this simulation study), carrying the last observation forward seems the preferable imputation option. This simulation study shows that a one-size-fits-all imputation strategy falls short in SCI data sets.
CONCLUSIONS
Data-tailored imputation strategies are required (e.g., characterisation of the missingness pattern, last observation carried forward for repeated measures evolving to a plateau over time). Therefore, systematically reporting the extent and kind of missing data, and the decisions made in handling it, will be essential to improve the interpretation, transparency, and reproducibility of the research presented.
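The contrast the abstract draws between mean imputation and last observation carried forward (LOCF) for late-stage repeated measures can be illustrated with a minimal sketch. It does not use the Sygen data and is not the authors' simulation code: the recovery curves, the assessment schedule `WEEKS`, the 30% missingness rate, and all function names are assumptions chosen for illustration. Scores rise quickly and then plateau, the final visit is removed completely at random for some patients, and both strategies are scored against the known true values.

```python
import math
import random
import statistics

WEEKS = (1, 4, 8, 16, 26, 52)  # hypothetical assessment schedule (weeks after injury)


def simulate_scores(n_patients=200, seed=0):
    """Toy recovery curves: fast early gains that flatten into a plateau."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_patients):
        plateau = rng.uniform(20, 90)  # patient-specific final score (assumed range)
        rate = rng.uniform(0.1, 0.3)   # patient-specific recovery speed (assumed range)
        rows.append([plateau * (1 - math.exp(-rate * w)) + rng.gauss(0, 1)
                     for w in WEEKS])
    return rows


def mask_last_visit(rows, fraction=0.3, seed=1):
    """Remove the final visit for a random subset of patients (MCAR)."""
    rng = random.Random(seed)
    missing = set(rng.sample(range(len(rows)), int(fraction * len(rows))))
    masked = [row[:-1] + [None] if i in missing else list(row)
              for i, row in enumerate(rows)]
    return masked, sorted(missing)


def impute_mean(masked):
    """Fill a missing final visit with the mean of the observed final visits."""
    fill = statistics.mean(row[-1] for row in masked if row[-1] is not None)
    return [row[:-1] + [fill] if row[-1] is None else row for row in masked]


def impute_locf(masked):
    """Fill a missing final visit with the same patient's previous visit."""
    return [row[:-1] + [row[-2]] if row[-1] is None else row for row in masked]


def mean_abs_error(truth, imputed, missing):
    """Average absolute gap between true and imputed final-visit scores."""
    return statistics.mean(abs(truth[i][-1] - imputed[i][-1]) for i in missing)


truth = simulate_scores()
masked, missing = mask_last_visit(truth)
mae_mean = mean_abs_error(truth, impute_mean(masked), missing)
mae_locf = mean_abs_error(truth, impute_locf(masked), missing)
print(f"mean imputation MAE: {mae_mean:.2f}")
print(f"LOCF MAE:            {mae_locf:.2f}")
```

In this toy setting LOCF tracks each patient's own plateau, whereas mean imputation replaces it with a population average, so the LOCF error is much smaller. The advantage depends on the plateau assumption: for measures still changing at the time of dropout, LOCF would carry forward an outdated value.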
Identifiers
pubmed: 38184529
doi: 10.1186/s12874-023-02125-x
pii: 10.1186/s12874-023-02125-x
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
5
Copyright information
© 2024. The Author(s).