Shadows of wisdom: Classifying meta-cognitive and morally grounded narrative content via large language models.

Keywords: Computational social science; Content analysis; NLP; Social science methods; Wisdom

Journal

Behavior research methods

ISSN: 1554-3528

Abbreviated title: Behav Res Methods

Country: United States

NLM ID: 101244316

Publication information

Publication date:
29 May 2024

History:

accepted: 14 May 2024

medline: 30 May 2024

pubmed: 30 May 2024

entrez: 29 May 2024

Status: ahead of print

Abstract

We investigated the efficacy of large language models (LLMs) in classifying complex psychological constructs such as intellectual humility, perspective-taking, open-mindedness, and search for a compromise in narratives of 347 Canadian and American adults reflecting on a workplace conflict. Using state-of-the-art models such as GPT-4 in few-shot and zero-shot paradigms, as well as RoB-ELoC (RoBERTa-fine-tuned-on-Emotion-with-Logistic-Regression-Classifier), we compared model performance with that of expert human coders. Results showed robust classification by LLMs, with over 80% agreement and F1 scores above 0.85, and high human-model reliability (median Cohen's κ across top models = .80). RoB-ELoC and few-shot GPT-4 were standout classifiers, although somewhat less effective in categorizing intellectual humility. We offer example workflows for easy integration into research. Our proof-of-concept findings indicate the viability of both open-source and commercial LLMs in automating the coding of complex constructs, potentially transforming social science research.
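The three agreement measures named in the abstract (percent agreement, F1, and Cohen's κ) can be illustrated with a minimal sketch. It assumes binary codes (1 = construct present, 0 = absent) from one human coder and one model; the function names and the ten example labels are invented for illustration and are not the authors' workflow or data.

```python
from collections import Counter


def percent_agreement(human, model):
    """Share of narratives on which both coders assign the same label."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohens_kappa(human, model):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(human)
    p_observed = percent_agreement(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    labels = set(human) | set(model)
    # Chance agreement: product of each coder's marginal label proportions.
    p_expected = sum(human_counts[c] * model_counts[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def f1_score(human, model, positive=1):
    """F1 for the positive class, treating the human codes as ground truth."""
    pairs = list(zip(human, model))
    tp = sum(h == positive and m == positive for h, m in pairs)
    fp = sum(h != positive and m == positive for h, m in pairs)
    fn = sum(h == positive and m != positive for h, m in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical codes for ten narratives (illustrative only).
human_codes = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
model_codes = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

print(f"agreement = {percent_agreement(human_codes, model_codes):.2f}")
print(f"F1        = {f1_score(human_codes, model_codes):.2f}")
print(f"kappa     = {cohens_kappa(human_codes, model_codes):.2f}")
```

In this toy example the coders agree on 8 of 10 items, yet κ is lower than raw agreement because half of that agreement would be expected by chance given the balanced label proportions.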

Identifiers

DOI: 10.3758/s13428-024-02441-0

PMID: 38811519

PII: 10.3758/s13428-024-02441-0

Publication types

Journal Article

Languages

English (eng)

Grants

Agency: Social Sciences and Humanities Research Council of Canada

ID: 435-2014-0685

Agency: John Templeton Foundation

ID: 62260

Authors

Alexander Stavropoulos (A)

Damien L Crone (DL)

Igor Grossmann (I)
