Comparing the prediction performance of item response theory and machine learning methods on item responses for educational assessments.

MeSH terms: Humans; Students; Computer Simulation; Educational Status; Educational Measurement; Machine Learning

Keywords: Background information; Educational assessment; Explanatory item response model; Item response theory; Machine learning; Prediction performance

Journal

Behavior Research Methods

ISSN: 1554-3528

Abbreviated title: Behav Res Methods

Country: United States

NLM ID: 101244316

Publication information

Publication date:
Jun 2023

History:

accepted: 16 Jun 2022

medline: 12 Jun 2023

pubmed: 13 Jul 2022

entrez: 12 Jul 2022

Status: ppublish

Abstract

To obtain more accurate and robust feedback from students' assessment outcomes, communicate it to students, and optimize teaching and learning strategies, educational researchers and practitioners must critically reflect on whether existing data analytic methods can retrieve the information contained in the database. This study compared the prediction performance of an item response theory method, specifically an explanatory item response model (EIRM), with that of six supervised machine learning (ML) methods for predicting students' item responses in educational assessments, taking student- and item-related background information into account. Each of the seven prediction methods was evaluated by cross-validation under three prediction scenarios: (a) unrealized responses of new students to existing items, (b) unrealized responses of existing students to new items, and (c) missing responses of existing students to existing items. The results of a simulation study and two real-life assessment data examples showed that using student- and item-related background information in addition to the item response data substantially increases prediction accuracy for new students or items. We also found that the EIRM is competitive with the best-performing ML methods in predicting student performance outcomes for the educational assessment datasets.
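The three prediction scenarios differ only in which cells of the student-by-item response matrix are held out for testing. The following is a minimal sketch of one such split, not the authors' procedure: the function name `scenario_test_cells`, the 20% hold-out fraction, and the matrix size are illustrative assumptions, and a full cross-validation would repeat the split over several folds.

```python
import random


def scenario_test_cells(n_students, n_items, test_frac=0.2, seed=0):
    """Return held-out (student, item) cells for each scenario.

    Hypothetical helper for illustration only; the hold-out fraction
    and the single split are assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    n_test_students = max(1, round(test_frac * n_students))
    n_test_items = max(1, round(test_frac * n_items))

    # All cells of the student-by-item response matrix.
    cells = [(p, i) for p in range(n_students) for i in range(n_items)]

    # (a) New students: every response of the sampled students is held out.
    held_out_students = set(rng.sample(range(n_students), n_test_students))

    # (b) New items: every response to the sampled items is held out.
    held_out_items = set(rng.sample(range(n_items), n_test_items))

    # (c) Missing responses: scattered individual cells are held out.
    n_test_cells = max(1, round(test_frac * len(cells)))
    held_out_cells = set(rng.sample(cells, n_test_cells))

    return {
        "new_students": {(p, i) for (p, i) in cells if p in held_out_students},
        "new_items": {(p, i) for (p, i) in cells if i in held_out_items},
        "missing_responses": held_out_cells,
    }


splits = scenario_test_cells(n_students=50, n_items=10)
for name, test_cells in splits.items():
    print(name, len(test_cells))
```

In scenarios (a) and (b) the held-out students or items contribute no responses to training, so a model can only reach them through background covariates. In scenario (c) every student and item still appears in the training data.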

Identifiers

DOI: 10.3758/s13428-022-01910-8

PMID: 35819719

PMCID: PMC9275388

PII: 10.3758/s13428-022-01910-8

Publication types

Journal Article

Languages

eng

Pagination

2109-2124

Authors

Jung Yeon Park (JY)

Klest Dedja (K)

Konstantinos Pliakos (K)

Jinho Kim (J)

Sean Joo (S)

Frederik Cornillie (F)

Celine Vens (C)

Wim Van den Noortgate (W)
