Comparing the prediction performance of item response theory and machine learning methods on item responses for educational assessments.
Background information
Educational assessment
Explanatory item response model
Item response theory
Machine learning
Prediction performance
Journal
Behavior research methods
ISSN: 1554-3528
Abbreviated title: Behav Res Methods
Country: United States
NLM ID: 101244316
Publication information
Publication date:
Jun 2023
History:
accepted: 2022-06-16
medline: 2023-06-12
pubmed: 2022-07-13
entrez: 2022-07-12
Status:
ppublish
Abstract
To obtain more accurate and robust feedback from students' assessment outcomes, communicate it to students, and optimize teaching and learning strategies, educational researchers and practitioners must critically reflect on whether existing data-analytic methods can retrieve the information available in the database. This study compared the prediction performance of an item response theory method, specifically an explanatory item response model (EIRM), with that of six supervised machine learning (ML) methods for predicting students' item responses in educational assessments, taking student- and item-related background information into account. Each of the seven prediction methods was evaluated with cross-validation under three prediction scenarios: (a) unrealized responses of new students to existing items, (b) unrealized responses of existing students to new items, and (c) missing responses of existing students to existing items. The results of a simulation study and two real-life assessment datasets showed that using student- and item-related background information in addition to the item response data substantially increases prediction accuracy for new students or items. We also found that the EIRM is as competitive as the best-performing ML methods in predicting student performance outcomes for the educational assessment datasets.
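The three prediction scenarios differ mainly in what is held out during cross-validation: whole students, whole items, or individual student-item responses. The following is a minimal Python sketch of such fold construction, assuming long-format data of (student, item, score) tuples and plain k-fold splitting. The function name `make_folds` and the scenario labels are hypothetical illustrations, not taken from the study, and the authors' exact fold construction may differ.

```python
import random


def make_folds(responses, scenario, k=5, seed=0):
    """Split long-format responses into k (train_idx, test_idx) folds.

    responses: list of (student_id, item_id, score) tuples.
    scenario (hypothetical labels for the three scenarios):
      "new_students": hold out whole students, scenario (a)
      "new_items":    hold out whole items, scenario (b)
      "missing":      hold out single student-item responses, scenario (c)
    """
    rng = random.Random(seed)
    n = len(responses)
    if scenario == "missing":
        # Each response row is its own unit, so cells are split at random.
        unit_of = list(range(n))
    elif scenario == "new_students":
        unit_of = [student for student, _, _ in responses]
    elif scenario == "new_items":
        unit_of = [item for _, item, _ in responses]
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")

    # Sort the distinct units so the seed alone fixes the shuffle order,
    # then deal the shuffled units round-robin into k folds.
    units = sorted(set(unit_of))
    rng.shuffle(units)
    fold_of_unit = {unit: i % k for i, unit in enumerate(units)}

    folds = []
    for f in range(k):
        test = [r for r in range(n) if fold_of_unit[unit_of[r]] == f]
        train = [r for r in range(n) if fold_of_unit[unit_of[r]] != f]
        folds.append((train, test))
    return folds


# Usage: 10 students x 6 items with toy 0/1 scores.
data = [(s, i, (s + i) % 2) for s in range(10) for i in range(6)]
for scenario in ("new_students", "new_items", "missing"):
    folds = make_folds(data, scenario, k=5)
    print(scenario, [len(test) for _, test in folds])
```

Grouping by student or by item ensures that every held-out response in scenarios (a) and (b) belongs to a student or item absent from the training folds, which is why background information is needed to predict it. In scenario (c), this simple random split does not guarantee that every student and item keeps at least one response in the training set, so a stricter implementation might check for that.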
Identifiers
pubmed: 35819719
doi: 10.3758/s13428-022-01910-8
pii: 10.3758/s13428-022-01910-8
pmc: PMC9275388
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2109-2124
Copyright information
© 2022. The Psychonomic Society, Inc.