Unveiling scientific articles from paper mills with provenance analysis.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2024
History:
received: 09 06 2024
accepted: 09 10 2024
medline: 31 10 2024
pubmed: 30 10 2024
entrez: 30 10 2024
Status: epublish

Abstract

The increasing prevalence of fake publications created by paper mills poses a significant challenge to maintaining scientific integrity. While integrity analysts typically rely on textual and visual clues to identify fake articles, determining which papers merit further investigation can be akin to searching for a needle in a haystack, as these fake publications have unrelated authors and are published in unrelated venues. To address this challenge, we developed a new methodology for provenance analysis, which automatically tracks and groups suspicious figures and documents. Our approach groups manuscripts from the same paper mill by analyzing their figures and identifying duplicated and manipulated regions. These regions are linked and organized in a provenance graph, providing evidence of systematic production. We tested our solution on a paper mill dataset of hundreds of documents and also on a larger version of the dataset that deliberately included thousands of documents selected to distract our method. Our approach successfully identified and linked systematically produced articles on both datasets by pinpointing the figures they reused and manipulated from one another. The proposed technique offers a promising solution for identifying fraudulent manuscripts, and it could be a valuable tool for supporting scientific integrity.
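
As a rough illustration of the grouping step described in the abstract, the following Python sketch links documents whose figures share matched regions and treats each connected component of the resulting graph as one candidate group. It is not the authors' implementation: the function name `group_documents`, the `(doc_a, doc_b, score)` input format, and the `threshold` parameter are all assumptions made for illustration, and the detection of duplicated or manipulated regions is assumed to have happened upstream.

```python
from collections import defaultdict


def group_documents(matches, threshold=0.5):
    """Group documents linked by matched figure regions.

    `matches` is an iterable of (doc_a, doc_b, score) tuples, where `score`
    is a hypothetical similarity in [0, 1] between figures of two documents.
    Both the input format and the threshold are illustrative assumptions.
    """
    # Build an undirected graph: an edge means two documents share
    # a figure region whose match score passes the threshold.
    graph = defaultdict(set)
    for doc_a, doc_b, score in matches:
        if score >= threshold and doc_a != doc_b:
            graph[doc_a].add(doc_b)
            graph[doc_b].add(doc_a)

    # Each connected component is one candidate group of related documents.
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    example = [("A", "B", 0.9), ("B", "C", 0.7), ("D", "E", 0.2), ("F", "G", 0.8)]
    print(group_documents(example))
```

In this toy input, documents D and E fall below the threshold and are left ungrouped, which loosely mirrors how distractor documents with no reused figures would stay outside any group. A real provenance graph would also need to carry the matched regions and the direction of reuse, which this sketch omits.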

Identifiers

pubmed: 39476003
doi: 10.1371/journal.pone.0312666
pii: PONE-D-24-23435

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e0312666

Copyright information

Copyright: © 2024 Cardenuto et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

No authors have competing interests.

Authors

João Phillipe Cardenuto (JP)

Artificial Intelligence Lab. Recod.ai, Institute of Computing, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil.

Daniel Moreira (D)

Department of Computer Science, Loyola University Chicago, Chicago, Illinois, United States of America.

Anderson Rocha (A)

Artificial Intelligence Lab. Recod.ai, Institute of Computing, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil.
