Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
Clinical dataset
Imputation methods
Mechanism of missingness
Missing ratio
Missing values
Pattern of missingness
Simulation study
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date:
28 Aug 2024
History:
received: 06 Apr 2024
accepted: 19 Aug 2024
medline: 31 Aug 2024
pubmed: 31 Aug 2024
entrez: 28 Aug 2024
Status:
epublish
Abstract
BACKGROUND AND OBJECTIVES
Understanding the research dataset is crucial for obtaining reliable and valid results. Health analysts need a thorough understanding of the data they analyze so that they can propose practical solutions for handling missing data in a clinical data source. Handling missing values accurately is critical for producing precise estimates and making informed decisions, especially in clinical research. As data have grown more diverse and complex, researchers have developed a wide range of imputation techniques. To help analysts choose among them, we conducted a systematic review that organizes imputation techniques by the characteristics of tabular datasets, namely the mechanism, pattern, and ratio of missingness, in order to identify the most appropriate imputation methods in the healthcare field.
MATERIALS AND METHODS
We searched four databases (PubMed, Web of Science, Scopus, and IEEE Xplore) for articles published up to September 20, 2023, that discussed imputation methods for addressing missing values in structured clinical datasets. Our analysis of the selected articles focused on four key aspects: the mechanism of missingness, the pattern of missingness, the ratio of missingness, and the imputation strategy used. By synthesizing findings across these aspects, we constructed an evidence map to recommend suitable imputation methods for handling missing values in a tabular dataset.
RESULTS
Of 2955 articles, 58 were included in the analysis. The evidence map, built from the structure of the missing values and the types of imputation methods reported in these studies, showed that 45% of the studies employed conventional statistical methods, 31% used machine learning and deep learning methods, and 24% applied hybrid imputation techniques.
CONCLUSION
Considering the structure and characteristics of missing values in a clinical dataset is essential for choosing the most appropriate imputation technique, especially among conventional statistical methods. Estimating missing values so that they accurately reflect reality increases the likelihood of obtaining high-quality, reusable data and contributes to precise medical decision-making. This review provides a guideline for choosing the most appropriate imputation methods at the data preprocessing stage, before analysis of structured clinical datasets.
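The abstract characterizes a tabular dataset by its ratio and pattern of missingness before an imputation method is chosen. The following is a minimal sketch of those two descriptive steps, followed by single mean imputation as one example of a conventional statistical method. It is not taken from the review: the toy records, the column names (`age`, `sbp`, `hba1c`), and the function names are all hypothetical, and mean imputation is shown only as a simple baseline, not as a recommended method.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records (not from the review); None marks a missing value.
rows = [
    {"age": 54, "sbp": 130, "hba1c": 6.1},
    {"age": 61, "sbp": None, "hba1c": 7.4},
    {"age": None, "sbp": 142, "hba1c": None},
    {"age": 47, "sbp": 118, "hba1c": None},
]
columns = ["age", "sbp", "hba1c"]


def missing_ratio(rows, columns):
    """Return the fraction of missing entries in each column."""
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def missing_patterns(rows, columns):
    """Count how often each combination of missing columns occurs across rows."""
    return Counter(tuple(c for c in columns if r[c] is None) for r in rows)


def mean_impute(rows, columns):
    """Single mean imputation: fill each gap with the column's observed mean."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {c: (means[c] if r[c] is None else r[c]) for c in columns}
        for r in rows
    ]


ratios = missing_ratio(rows, columns)
patterns = missing_patterns(rows, columns)
completed = mean_impute(rows, columns)
```

The ratio and pattern summaries describe how much is missing and where. They do not reveal the mechanism of missingness, which usually has to be argued from how the data were collected.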
Identifiers
pubmed: 39198744
doi: 10.1186/s12874-024-02310-6
pii: 10.1186/s12874-024-02310-6
Publication types
Journal Article
Systematic Review
Languages
eng
Citation subsets
IM
Pagination
188
Copyright information
© 2024. The Author(s).