Dataset for detecting and characterizing Arab computation propaganda on X.

Keywords: Arabic computation propaganda; Disinformation; Propaganda classification; Propagandists’ characteristics; Social media

Journal

Data in brief

ISSN: 2352-3409

Abbreviated title: Data Brief

Country: Netherlands

NLM ID: 101654995

Publication information

Publication date:
Apr 2024

History:

received: 02 Oct 2023

revised: 15 Jan 2024

accepted: 16 Jan 2024

medline: 08 Feb 2024

pubmed: 08 Feb 2024

entrez: 08 Feb 2024

Status: epublish

Abstract

Arab nations are strongly affected by computational propaganda, and detecting Arab computational propaganda has become a trending topic in social media research. Despite these efforts, there is still no clear definition of what characterizes a propagandist. In addition, earlier datasets were collected and labelled for a specific study and then neglected, so researchers cannot assess whether newly proposed AI detectors generalize. There is a lack of real ground truth, either to characterize the behaviour of Arab propagandists or to evaluate newly proposed detectors. To close this research gap, the provided dataset aims to demonstrate the value of characterizing Arab computational propaganda on X (Twitter). It was prepared using a scientific approach: to guarantee data quality, the propagandist users' data was requested from the X Transparency Center. Although the data released by X identifies propagandist users, the labelling is at the user level; individual tweets were not classified as propaganda or not. Propagandists usually mix propaganda and non-propaganda tweets to hide their identities. Therefore, three volunteer journalists labelled 2,100 tweets as propaganda or not propaganda, and then labelled each propaganda tweet according to the propaganda technique used. The dataset covers sports and banking issues. In total, it consists of 16,355,558 tweets, with their metadata, posted by propagandist users in 2019, plus the 2,100 labelled tweets. The dataset helps the research community apply supervised and unsupervised machine learning and deep learning algorithms to classify the credibility of Arab tweets and users. In addition, this paper suggests examining behaviour rather than content to distinguish propaganda communication, and the datasets enable in-depth non-textual analysis of the main characteristics of Arab computational propaganda on X.

Identifiers

DOI: 10.1016/j.dib.2024.110089 PMID: 38328292 PMC: PMC10847467

pii: S2352-3409(24)00062-3

Publication types

Journal Article

Languages

eng

Pagination

110089

Authors

Bodor Moheel Almotairy (BM)

Manal Abdullah (M)

Dimah Hussein Alahmadi (DH)