Accounting for careless and insufficient effort responding in large-scale survey data: development, evaluation, and application of a screen-time-based weighting procedure.

Keywords: Careless responding; Data screening; Finite mixture modeling; Item response theory; Maximum pseudo-likelihood estimation; Screen times

Journal

Behavior research methods

ISSN: 1554-3528

Abbreviated title: Behav Res Methods

Country: United States

NLM ID: 101244316

Publication information

Publication date:
03 Mar 2023

History:

Accepted: 9 Dec 2022

Entrez: 3 Mar 2023

PubMed: 4 Mar 2023

MEDLINE: 4 Mar 2023

Status: Ahead of print

Abstract

Careless and insufficient effort responding (C/IER) poses a major threat to the quality of large-scale survey data. Traditional indicator-based procedures for its detection are limited in that they are sensitive only to specific types of C/IER behavior, such as straightlining or rapid responding, rely on arbitrary threshold settings, and do not allow the uncertainty of C/IER classification to be taken into account. Overcoming these limitations, we develop a two-step screen-time-based weighting procedure for computer-administered surveys. The procedure accounts for the uncertainty in C/IER identification, is agnostic towards the specific types of C/IE response patterns, and can feasibly be integrated with common analysis workflows for large-scale survey data. In Step 1, we draw on mixture modeling to identify subcomponents of log screen time distributions presumably stemming from C/IER. In Step 2, the analysis model of choice is applied to item response data, with respondents' posterior class probabilities being employed to downweight response patterns according to their probability of stemming from C/IER. We illustrate the approach on a sample of more than 400,000 respondents who were administered 48 scales of the PISA 2018 background questionnaire. We gather supporting validity evidence by investigating relationships between C/IER proportions and screen characteristics that entail higher cognitive burden, such as screen position and text length, by relating identified C/IER proportions to other indicators of C/IER, and by investigating rank-order consistency in C/IER behavior across screens. Finally, in a re-analysis of the PISA 2018 background questionnaire data, we investigate the impact of the C/IER adjustments on country-level comparisons.
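The two-step logic in the abstract can be illustrated with a minimal Python sketch. The simplifying assumptions here are ours, not the authors': (a) the log screen times of a single screen follow a two-component Gaussian mixture fitted by EM (the paper's mixture specification and class enumeration may differ); (b) the component with the lower mean log screen time is taken to represent C/IER; (c) each respondent's weight for that screen is one minus the posterior C/IER probability; and (d) a weighted mean of scale scores stands in for the weighted analysis model (e.g., an IRT model estimated by maximum pseudo-likelihood). All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sd: float) -> float:
    """Density of N(mu, sd^2) evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_class_mixture(
    log_times: Sequence[float], n_iter: int = 500, tol: float = 1e-10
) -> Tuple[List[float], List[float], List[float]]:
    """Step 1 (sketch): EM for a two-component Gaussian mixture on log screen times.

    Returns (mixing weights, means, SDs) ordered so that component 0 has the
    lower mean, i.e. the faster component assumed here to reflect C/IER.
    """
    xs = sorted(log_times)
    half = len(xs) // 2
    # Initialise with a median split of the sorted log screen times.
    means = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sds = [statistics.pstdev(xs)] * 2
    mix = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior component probabilities and current log-likelihood.
        resp, ll = [], 0.0
        for x in log_times:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(mix, means, sds)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means, and SDs from the posteriors.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            mix[k] = nk / len(log_times)
            means[k] = sum(r * x for r, x in zip(rk, log_times)) / nk
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(rk, log_times)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate components
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = sorted(range(2), key=lambda k: means[k])
    return [mix[k] for k in order], [means[k] for k in order], [sds[k] for k in order]


def cier_posterior(log_times, mix, means, sds) -> List[float]:
    """Posterior probability that each log screen time stems from the fast component."""
    post = []
    for x in log_times:
        fast = mix[0] * normal_pdf(x, means[0], sds[0])
        slow = mix[1] * normal_pdf(x, means[1], sds[1])
        post.append(fast / (fast + slow))
    return post


def weighted_mean(scores: Sequence[float], post_cier: Sequence[float]) -> float:
    """Step 2 (stand-in): down-weight each response by 1 - P(C/IER)."""
    w = [1.0 - p for p in post_cier]
    return sum(wi * s for wi, s in zip(w, scores)) / sum(w)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical screen: 50 fast (careless) and 200 attentive respondents.
    log_t = [rng.gauss(0.5, 0.3) for _ in range(50)] + [rng.gauss(3.0, 0.4) for _ in range(200)]
    scores = [rng.uniform(1, 4) for _ in range(50)] + [rng.gauss(3.2, 0.3) for _ in range(200)]
    mix, means, sds = fit_two_class_mixture(log_t)
    post = cier_posterior(log_t, mix, means, sds)
    print("estimated C/IER proportion:", round(mix[0], 3))
    print("unweighted mean:", round(statistics.fmean(scores), 3))
    print("weighted mean:  ", round(weighted_mean(scores, post), 3))
```

In this sketch, respondents whose screen times fall clearly in the fast component receive weights near zero, while those with intermediate times are only partially down-weighted. This is how classification uncertainty carries into Step 2 instead of being cut off at an arbitrary threshold.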

Identifiers

DOI: 10.3758/s13428-022-02053-6

PMID: 36867339

PII: 10.3758/s13428-022-02053-6

Publication types

Journal Article

Languages

English

Citation subsets

Copyright information


Authors

Esther Ulitzsch

Hyo Jeong Shin

Oliver Lüdtke

MeSH classifications