Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing.

Keywords: computerized adaptive test; continuous calibration; equating; item response theory; simulation

Journal

Frontiers in psychology
ISSN: 1664-1078
Abbreviated title: Front Psychol
Country: Switzerland
NLM ID: 101550902

Publication information

Publication date:
2019
History:
received: 2018-11-23
accepted: 2019-05-15
entrez: 2019-06-28
pubmed: 2019-06-28
medline: 2019-06-28
Status: epublish

Abstract

The increasing digitalization in the field of psychological and educational testing opens up new opportunities to innovate assessments in many respects (e.g., new item formats, flexible test assembly, efficient data handling). In particular, computerized adaptive testing provides the opportunity to make tests more individualized and more efficient. The newly developed continuous calibration strategy (CCS) from Fink et al. (2018) makes it possible to construct computerized adaptive tests in application areas where separate calibration studies are not feasible. Due to the goal of reporting on a common metric across test cycles, the equating is crucial for the CCS. The quality of the equating depends on the common items selected and the scale transformation method applied. Given the novelty of the CCS, the aim of the study was to evaluate different equating setups in the CCS and to derive practical recommendations. The impact of different equating setups on the precision of item parameter estimates and on the quality of the equating was examined in a Monte Carlo simulation, based on a fully crossed design with the factors common item difficulty distribution (bimodal, normal, uniform), scale transformation method (mean/mean, mean/sigma, Haebara, Stocking-Lord), and sample size per test cycle (50, 100, 300). The quality of the equating was operationalized by three criteria (proportion of feasible equatings, proportion of drifted items, and error of transformation constants). The precision of the item parameter estimates increased with increasing sample size per test cycle, but no substantial difference was found with respect to the common item difficulty distribution and the scale transformation method. With regard to the feasibility of the equatings, no differences were found for the different scale transformation methods. However, when using the moment methods (mean/mean, mean/sigma), quite extreme levels of error for the transformation constants
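As a rough illustration of the two moment methods named in the abstract (mean/mean and mean/sigma), the sketch below computes the transformation constants A and B from common-item parameters under a two-parameter logistic model. It follows the standard textbook definitions, not the study's own code; the function names and the linear form theta_base = A * theta_new + B are assumptions made for this example.

```python
from statistics import mean, pstdev


def mean_sigma(b_base, b_new):
    """Mean/sigma constants from common-item difficulties.

    b_base: difficulties on the base (target) scale.
    b_new:  difficulties of the same items on the new scale.
    """
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_mean(a_base, a_new, b_base, b_new):
    """Mean/mean constants from common-item discriminations and difficulties."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def transform(a_new, b_new, A, B):
    """Place new-scale item parameters on the base scale."""
    a_on_base = [a / A for a in a_new]
    b_on_base = [A * b + B for b in b_new]
    return a_on_base, b_on_base
```

With error-free common-item parameters both methods recover the same constants. With small samples per test cycle the estimated means and standard deviations become noisy, which is one plausible route to the large errors in the transformation constants that the abstract reports for the moment methods.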

Identifiers

pubmed: 31244717
doi: 10.3389/fpsyg.2019.01277
pmc: PMC6563622

Publication types

Journal Article

Languages

eng

Pagination

1277

Authors

Sebastian Born (S)

Department of Research Methods in Education, Institute of Educational Science, Friedrich Schiller University Jena, Jena, Germany.

Aron Fink (A)

Educational Psychology: Measurement, Evaluation and Counseling, Institute of Psychology, Goethe University Frankfurt, Frankfurt, Germany.

Christian Spoden (C)

German Institute for Adult Education, Leibniz Centre for Lifelong Learning, Bonn, Germany.

Andreas Frey (A)

Educational Psychology: Measurement, Evaluation and Counseling, Institute of Psychology, Goethe University Frankfurt, Frankfurt, Germany.
Faculty of Educational Sciences, Centre for Educational Measurement, University of Oslo, Oslo, Norway.

MeSH classifications