Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
17 08 2023
History:
received: 26 01 2023
accepted: 07 08 2023
medline: 21 08 2023
pubmed: 18 08 2023
entrez: 17 08 2023
Status:
epublish
Abstract
Before further processing, completed questionnaires must be screened for careless respondents. People respond to surveys in different ways, and some take the easy path and fill out the survey carelessly. Because the proportion of careless respondents determines a survey's quality, identifying them is critical for the quality of the results obtained. This study explores the characteristics of careless respondents in survey data and evaluates the predictive power and interpretability of different types of data and indices of careless responding. The research question focuses on understanding the behavior of careless respondents and determining how effectively various data sources predict their responses. The study used data from a three-month web-based survey on participants' personality traits (honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness to experience), taken from Schroeders et al. The gradient boosting machine-based prediction model uses the answers, the time spent answering, demographic information on the respondents, and some indices of careless responding from all three types of data. Prediction models were evaluated with tenfold cross-validation repeated a hundred times and compared on balanced accuracy. Model explanations were provided with Shapley values. Compared with existing work, data fusion from multiple types of information had no noticeable effect on the performance of the gradient boosting machine model. Variables such as "I would never take a bribe, even if it was a lot", average longstring, and total intra-individual response variability were found to be useful in distinguishing careless respondents. However, variables like "I would be tempted to use counterfeit money if I could get away with it" and the intra-individual response variability of the first section of the survey showed limited effectiveness. Additionally, this study indicated that, although the psychometric synonym score has an immediate effect and is designed with the goal of identifying careless respondents, when combined with other variables it is not necessarily the optimal choice for fitting a gradient boosting machine model.
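As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of two response-pattern indices (longstring and intra-individual response variability) and of balanced accuracy. It is not the authors' implementation. The function names, the reading of "average longstring" as the mean length of runs of identical consecutive answers, the use of the sample standard deviation for the variability index, and the example response vectors are all assumptions made for illustration.

```python
from itertools import groupby
from statistics import stdev


def run_lengths(responses):
    """Lengths of runs of identical consecutive answers (hypothetical helper)."""
    return [len(list(group)) for _, group in groupby(responses)]


def longstring(responses):
    """Longest run of identical consecutive answers."""
    return max(run_lengths(responses))


def average_longstring(responses):
    """Mean run length; assumed here to correspond to 'average longstring'."""
    runs = run_lengths(responses)
    return sum(runs) / len(runs)


def irv(responses):
    """Intra-individual response variability, taken here as the sample
    standard deviation of one respondent's answers (an assumption)."""
    return stdev(responses)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = careless).
    Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    return (tp / positives + tn / negatives) / 2


# Made-up answers on a 1-5 scale, for illustration only.
straightlining = [3, 3, 3, 3, 3, 3, 2, 3]
varied = [1, 4, 2, 5, 3, 2, 4, 1]
```

On these made-up vectors the straightlining pattern has a longer longstring and a lower variability than the varied pattern, which is the kind of contrast such indices are meant to capture. Balanced accuracy is a common choice when careless respondents are a minority class, because it weights both classes equally.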
Identifiers
pubmed: 37591974
doi: 10.1038/s41598-023-40209-2
pii: 10.1038/s41598-023-40209-2
pmc: PMC10435557
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
13417
Copyright information
© 2023. Springer Nature Limited.
References
Annu Rev Psychol. 2023 Jan 18;74:577-596
pubmed: 35973734
Br J Math Stat Psychol. 2022 Nov;75(3):668-698
pubmed: 35730351
Behav Res Methods. 2023 Oct;55(7):3370-3415
pubmed: 36131197
Psychometrika. 2022 Jun;87(2):593-619
pubmed: 34855118
Res Nurs Health. 2019 Dec;42(6):494-499
pubmed: 31612519
Educ Psychol Meas. 2022 Feb;82(1):29-56
pubmed: 34992306
Behav Res Methods. 2020 Dec;52(6):2489-2505
pubmed: 32462604
Radiology. 1982 Apr;143(1):29-36
pubmed: 7063747
Psychol Methods. 2012 Sep;17(3):437-55
pubmed: 22506584