Exploring the use of Rasch modelling in "common content" items for multi-site and multi-year assessment.
Keywords: Assessment; Medical licensing examination; Psychometrics; Rasch measurement; Validity
Journal
Advances in Health Sciences Education: Theory and Practice
ISSN: 1573-1677
Abbreviated title: Adv Health Sci Educ Theory Pract
Country: Netherlands
NLM ID: 9612021
Publication information
Publication date: 08 Jul 2024
History:
Received: 25 Mar 2024
Accepted: 30 Jun 2024
MEDLINE: 09 Jul 2024
PubMed: 09 Jul 2024
Entrez: 08 Jul 2024
Status: ahead of print
Abstract
Rasch modelling is a powerful tool for evaluating item performance, measuring drift in difficulty over time, and comparing students who sat assessments at different times or at different sites. Here, we use data from thirty UK medical schools to describe the benefits of Rasch modelling in quality assurance and the barriers to using it. Sixty "common content" multiple-choice items were offered to all UK medical schools in 2016-17, and a further sixty in 2017-18, with five available in both years. Thirty medical schools participated, for sixty total datasets across two sessions and 14,342 individual sittings. Schools selected items to embed in written assessments near the end of their programmes. We applied Rasch modelling to evaluate unidimensionality, model fit statistics and item quality, horizontal equating to compare performance across schools, and vertical equating to compare item performance across time. Of the sixty datasets, three were non-unidimensional and eight violated goodness-of-fit measures. Item-level statistics identified potential improvements in item construction and provided quality assurance. Horizontal equating demonstrated large differences in scores across schools, while vertical equating showed item characteristics were stable across sessions. Rasch modelling provides significant advantages in model- and item-level reporting compared to classical approaches. However, the complexity of the analysis and the relative scarcity of educators familiar with Rasch modelling must be addressed locally for a programme to benefit. Furthermore, due to the comparative novelty of Rasch modelling, there is greater ambiguity about how to proceed when a Rasch model identifies misfitting or problematic data.
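For orientation, the dichotomous Rasch model underlying the analyses described above can be stated in its standard textbook form; this equation is included as background, not reproduced from the article itself. The probability that person $v$ with ability $\theta_v$ answers item $i$ of difficulty $b_i$ correctly is

\[
P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}
\]

Because abilities and difficulties lie on a single logit scale, shared items can serve as anchors: holding the anchor items' difficulty estimates fixed places examinees from different schools (horizontal equating) or different years (vertical equating) on one common metric, which is what makes the cross-school and cross-session comparisons reported in the abstract possible.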
Identifiers
PubMed: 38977526
DOI: 10.1007/s10459-024-10354-y
PII: 10.1007/s10459-024-10354-y
Publication type
Journal Article
Language
eng
Citation subset
IM
Copyright information
© 2024. The Author(s).