Exploring the use of Rasch modelling in "common content" items for multi-site and multi-year assessment.
Keywords: Assessment; Medical licensing examination; Psychometrics; Rasch measurement; Validity
Journal
Advances in Health Sciences Education: Theory and Practice
ISSN: 1573-1677
Abbreviated title: Adv Health Sci Educ Theory Pract
Country: Netherlands
NLM ID: 9612021
Publication information
Publication date: 08 Jul 2024
History:
Received: 25 Mar 2024
Accepted: 30 Jun 2024
MEDLINE: 09 Jul 2024
PubMed: 09 Jul 2024
Entrez: 08 Jul 2024
Status: ahead of print
Abstract
Rasch modelling is a powerful tool for evaluating item performance, measuring drift in difficulty over time, and comparing students who sat assessments at different times or at different sites. Here, we use data from thirty UK medical schools to describe the benefits of Rasch modelling in quality assurance and the barriers to using it. Sixty "common content" multiple-choice items were offered to all UK medical schools in 2016-17, and a further sixty in 2017-18, with five available in both years. Thirty medical schools participated, for sixty total datasets across two sessions and 14,342 individual sittings. Schools selected items to embed in written assessments near the end of their programmes. We applied Rasch modelling to evaluate unidimensionality, model fit statistics and item quality, horizontal equating to compare performance across schools, and vertical equating to compare item performance across time. Of the sixty datasets, three were non-unidimensional and eight violated goodness-of-fit measures. Item-level statistics identified potential improvements in item construction and provided quality assurance. Horizontal equating demonstrated large differences in scores across schools, while vertical equating showed item characteristics were stable across sessions. Rasch modelling provides significant advantages in model- and item-level reporting compared to classical approaches. However, the complexity of the analysis and the relative scarcity of educators familiar with Rasch modelling must be addressed locally for a programme to benefit. Furthermore, due to the comparative novelty of Rasch modelling, there is greater ambiguity about how to proceed when a Rasch model identifies misfitting or problematic data.
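For orientation, the dichotomous Rasch model underlying the analyses described above can be stated in its standard textbook form; this equation is included as background, not reproduced from the article itself. The probability that person $v$ with ability $\theta_v$ answers item $i$ of difficulty $b_i$ correctly is

\[
P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}
\]

Because abilities and difficulties lie on a single logit scale, shared items can serve as anchors: holding the anchor items' difficulty estimates fixed places examinees from different schools (horizontal equating) or different years (vertical equating) on one common metric, which is what makes the cross-school and cross-session comparisons reported in the abstract possible.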
Identifiers
PubMed: 38977526
DOI: 10.1007/s10459-024-10354-y
PII: 10.1007/s10459-024-10354-y
Publication type
Journal Article
Language
eng
Citation subset
IM
Copyright information
© 2024. The Author(s).