Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data.

Humans; Brassicaceae; Extraversion, Psychological; Psychometrics; Research Design

Journal

Scientific reports

ISSN: 2045-2322

Abbreviated title: Sci Rep

Country: England

NLM ID: 101563288

Publication information

Publication date:
17 08 2023

History:

received: 26 01 2023

accepted: 07 08 2023

medline: 21 8 2023

pubmed: 18 8 2023

entrez: 17 8 2023

Status: epublish

Abstract

Prior to further processing, completed questionnaires must be screened for the presence of careless respondents. Different people will respond to surveys in different ways. Some take the easy path and fill out the survey carelessly. The proportion of careless respondents determines the survey's quality. As a result, identifying careless respondents is critical for the quality of obtained results. This study aims to explore the characteristics of careless respondents in survey data and evaluate the predictive power and interpretability of different types of data and indices of careless responding. The research question focuses on understanding the behavior of careless respondents and determining the effectiveness of various data sources in predicting their responses. Data from a three-month web-based survey on participants' personality traits such as honesty-humility, emotionality, extraversion, agreeableness, conscientiousness and openness to experience was used in this study. Data for this study was taken from Schroeders et al. The gradient boosting machine-based prediction model uses data from the answers, time spent for answering, demographic information on the respondents as well as some indices of careless responding from all three types of data. Prediction models were evaluated with tenfold cross-validation repeated a hundred times. Prediction models were compared based on balanced accuracy. Models' explanations were provided with Shapley values. Compared with existing work, data fusion from multiple types of information had no noticeable effect on the performance of the gradient boosting machine model. Variables such as "I would never take a bribe, even if it was a lot", average longstring, and total intra-individual response variability were found to be useful in distinguishing careless respondents. However, variables like "I would be tempted to use counterfeit money if I could get away with it" and intra-individual response variability of the first section of a survey showed limited effectiveness. Additionally, this study indicated that, whereas the psychometric synonym score has an immediate effect and is designed with the goal of identifying careless respondents when combined with other variables, it is not necessarily the optimal choice for fitting a gradient boosting machine model.
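The abstract names average longstring, intra-individual response variability (IRV), and balanced accuracy without defining them. The following is a minimal Python sketch of these quantities under common textbook definitions, not the authors' implementation. It assumes that longstring is based on runs of identical consecutive answers, that IRV is the sample standard deviation of one respondent's answers, and that balanced accuracy is the mean of sensitivity and specificity. The function names and the example response vectors are hypothetical.

```python
from itertools import groupby
from statistics import mean, stdev


def run_lengths(responses):
    """Lengths of runs of identical consecutive answers."""
    return [len(list(group)) for _, group in groupby(responses)]


def max_longstring(responses):
    """Longest run of identical consecutive answers."""
    return max(run_lengths(responses))


def average_longstring(responses):
    """Mean run length (assumed definition of 'average longstring')."""
    return mean(run_lengths(responses))


def irv(responses):
    """Intra-individual response variability, assumed here to be
    the sample standard deviation of one respondent's answers."""
    return stdev(responses)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels
    (1 = careless, 0 = attentive). Both classes must be present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return (tp / positives + tn / negatives) / 2


# Hypothetical answers on a 5-point scale, for illustration only.
attentive = [1, 4, 2, 5, 3, 2, 4, 1]
straightliner = [3, 3, 3, 3, 3, 3, 4, 3]

print(max_longstring(straightliner), average_longstring(straightliner))
print(irv(attentive), irv(straightliner))
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In this illustration the straightlining respondent has long runs and low variability, which is the pattern these indices are meant to expose. Balanced accuracy is a natural choice when careless respondents are a minority class, since plain accuracy would reward a model that labels everyone as attentive.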

Identifiers

DOI: 10.1038/s41598-023-40209-2

PMID: 37591974

PMC: PMC10435557

PII: 10.1038/s41598-023-40209-2

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

13417

References

Annu Rev Psychol. 2023 Jan 18;74:577-596

pubmed: 35973734

Br J Math Stat Psychol. 2022 Nov;75(3):668-698

pubmed: 35730351

Behav Res Methods. 2023 Oct;55(7):3370-3415

pubmed: 36131197

Psychometrika. 2022 Jun;87(2):593-619

pubmed: 34855118

Res Nurs Health. 2019 Dec;42(6):494-499

pubmed: 31612519

Educ Psychol Meas. 2022 Feb;82(1):29-56

pubmed: 34992306

Behav Res Methods. 2020 Dec;52(6):2489-2505

pubmed: 32462604

Radiology. 1982 Apr;143(1):29-36

pubmed: 7063747

Psychol Methods. 2012 Sep;17(3):437-55

pubmed: 22506584

Authors

Leon Kopitar

Gregor Stiglic
