Detection of ChatGPT fake science with the xFakeSci learning algorithm.

Algorithms; Humans; Artificial Intelligence; Machine Learning; Publications

ChatGPT; Fake publication; Fake science; Generative AI; Human-generated publications; ML Algorithm

Journal

Scientific Reports

ISSN: 2045-2322

Abbreviated title: Sci Rep

Country: England

NLM ID: 101563288

Publication information

Publication date:
14 Jul 2024

History:

received: 11 Jul 2023

accepted: 3 Jul 2024

medline: 15 Jul 2024

pubmed: 15 Jul 2024

entrez: 14 Jul 2024

Status: epublish

Abstract

Generative AI tools, exemplified by ChatGPT, are becoming a new reality. This study is motivated by the premise that "AI generated content may exhibit a distinctive behavior that can be separated from scientific articles". In this study, we show how articles can be generated by means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and proved its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models derived from both sources. To mitigate overfitting, we incorporated a calibration step built upon data-driven heuristics, including proximity and ratios. Specifically, from a total of 3952 fake articles for three different medical conditions, the algorithm was trained using only 100 articles, but calibrated using folds of 100 articles. The classification step was performed using 300 articles per condition. The actual labeling step took place against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer's. Further, we evaluated the accuracy of the xFakeSci algorithm against some of the classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80 to 94%, outperforming common data mining algorithms, which scored F1 values between 38 and 52%. We attribute the noticeable difference to the introduction of calibration and a proximity distance heuristic, which underscores this promising performance. Indeed, the prediction of fake science generated by ChatGPT presents a considerable challenge. Nonetheless, the introduction of the xFakeSci algorithm is a significant step on the way to combating fake science.
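The abstract mentions network models built from text, ratio-based heuristics, and F1 scores, but gives no implementation details. The following is only a minimal sketch of what such ingredients could look like, not the authors' xFakeSci code. The function names (`bigram_network`, `network_ratios`, `f1_score`) are hypothetical, and the choice of features (edge-to-node ratio and the share of nodes in the largest connected component) is an assumption made for illustration.

```python
from collections import defaultdict


def bigram_network(text):
    """Build an undirected word-bigram graph (assumed representation):
    nodes are lowercased words, edges join words that appear side by side."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph


def network_ratios(graph):
    """Compute illustrative ratio features (hypothetical, not from the paper)."""
    nodes = len(graph)
    # Each undirected edge is stored twice, once per endpoint.
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    seen, largest = set(), 0
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search over one connected component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        largest = max(largest, size)
    return {
        "nodes": nodes,
        "edges": edges,
        "edge_node_ratio": edges / nodes if nodes else 0.0,
        "lcc_ratio": largest / nodes if nodes else 0.0,
    }


def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `network_ratios(bigram_network(abstract_text))` could be computed for a generated abstract and for an authentic one and the two sets of ratios compared; how xFakeSci actually calibrates and combines such quantities is not stated in the abstract.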

Identifiers

DOI: 10.1038/s41598-024-66784-6

PMID: 39004625

pii: 10.1038/s41598-024-66784-6

Publication types

Journal Article

Languages

eng

Pagination

16231

Grants

Agency: European Union's Horizon 2020 research and innovation programme

ID: 857533

Agency: Ministerstwo Edukacji i Nauki

ID: MEiN/2023/DIR/3796

Agency: National Natural Science Foundation of China

ID: 62120106008

Authors

Ahmed Abdeen Hamed (AA)

Xindong Wu (X)
