Data fusion for predicting long-term program impacts.

Keywords: Oregon Health Insurance Experiment; data fusion; health insurance; multiple imputation; surrogate outcomes

Journal

Statistics in medicine
ISSN: 1097-0258
Abbreviated title: Stat Med
Country: England
NLM ID: 8215016

Publication information

Publication date:
18 Jun 2024
History:
revised: 01 05 2024
received: 27 11 2023
accepted: 06 06 2024
medline: 19 6 2024
pubmed: 19 6 2024
entrez: 18 6 2024
Status: aheadofprint

Abstract

Policymakers often require information on programs' long-term impacts that is not available when decisions are made. For example, while rigorous evidence from the Oregon Health Insurance Experiment (OHIE) shows that having health insurance influences short-term health and financial measures, the impact on long-term outcomes, such as mortality, will not be known for many years following the program's implementation. We demonstrate how data fusion methods may be used to address the problem of missing final outcomes and predict long-run impacts of interventions before the requisite data are available. We implement this method by concatenating data on an intervention (such as the OHIE) with auxiliary long-term data and then imputing missing long-term outcomes using short-term surrogate outcomes while approximating uncertainty with replication methods. We use simulations to examine the performance of the methodology and apply the method in a case study. Specifically, we fuse data on the OHIE with data from the National Longitudinal Mortality Study and estimate that being eligible to apply for subsidized health insurance will lead to a statistically significant improvement in long-term mortality.
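
The workflow summarized in the abstract (fuse experimental and auxiliary data, impute the missing long-term outcome from a short-term surrogate, approximate uncertainty by replication) can be illustrated with a toy sketch. This is not the authors' implementation: the simulated data, the single linear imputation model, the bootstrap as the replication method, and all names (`simulate`, `fused_effect`, `bootstrap_effect`) are assumptions made for illustration only.

```python
import numpy as np

def simulate(rng, n_exp=2000, n_aux=2000):
    """Toy data (assumed): treatment Z shifts surrogate S; outcome Y depends on S only."""
    z = rng.integers(0, 2, size=n_exp)            # randomized treatment indicator
    s_exp = 0.5 * z + rng.normal(size=n_exp)      # surrogate in the experiment
    s_aux = rng.normal(size=n_aux)                # surrogate in the auxiliary data
    y_aux = 1.0 + 2.0 * s_aux + rng.normal(size=n_aux)  # Y observed only here
    return z, s_exp, s_aux, y_aux

def fused_effect(z, s_exp, s_aux, y_aux, rng):
    """Impute Y for experimental units from a model fit on auxiliary data,
    then take the treated-minus-control difference in imputed means."""
    X = np.column_stack([np.ones_like(s_aux), s_aux])
    beta, *_ = np.linalg.lstsq(X, y_aux, rcond=None)
    resid_sd = np.std(y_aux - X @ beta, ddof=2)
    # Stochastic imputation: prediction plus a residual draw.
    y_imp = beta[0] + beta[1] * s_exp + rng.normal(scale=resid_sd, size=s_exp.size)
    return y_imp[z == 1].mean() - y_imp[z == 0].mean()

def bootstrap_effect(z, s_exp, s_aux, y_aux, rng, n_boot=200):
    """Resample both data sources, re-impute each time, and summarize the replicates."""
    est = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, z.size, size=z.size)          # resample experiment
        j = rng.integers(0, s_aux.size, size=s_aux.size)  # resample auxiliary data
        est[b] = fused_effect(z[i], s_exp[i], s_aux[j], y_aux[j], rng)
    return est.mean(), est.std(ddof=1)

rng = np.random.default_rng(0)
z, s_exp, s_aux, y_aux = simulate(rng)
point, se = bootstrap_effect(z, s_exp, s_aux, y_aux, rng)
print(f"estimated long-term effect: {point:.2f} (bootstrap SE {se:.2f})")
```

In this simulation the true long-term effect is 0.5 × 2.0 = 1.0 by construction, because the treatment acts on the outcome only through the surrogate. The sketch leans on that surrogacy assumption and on the auxiliary outcome model transferring to the experimental sample; if either fails, the imputed effect can be biased.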

Identifiers

pubmed: 38890124
doi: 10.1002/sim.10147

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: NIA NIH HHS
ID: R21AG058123
Country: United States

Copyright information

© 2024 John Wiley & Sons Ltd.

Authors

Sebastian Bauhoff (S)

School of Public Health, Harvard University, Cambridge, Massachusetts.

Lane Burgette (L)

RAND, Pittsburgh, Pennsylvania.

MeSH classifications