Dataset for detecting and characterizing Arab computation propaganda on X.
Arabic computation propaganda
Disinformation
Propaganda classification
Propagandists’ characteristics
Social media
Journal
Data in brief
ISSN: 2352-3409
Abbreviated title: Data Brief
Country: Netherlands
NLM ID: 101654995
Publication information
Publication date:
Apr 2024
History:
received: 02 Oct 2023
revised: 15 Jan 2024
accepted: 16 Jan 2024
medline: 8 Feb 2024
pubmed: 8 Feb 2024
entrez: 8 Feb 2024
Status:
epublish
Abstract
Computational propaganda strongly affects Arab nations, and detecting it has become an active topic in social media research. Despite these efforts, there is still no clear definition of what characterizes a propagandist. In addition, earlier datasets were collected and labelled for a specific study and then neglected, so researchers cannot assess whether the proposed AI detectors generalize. Real ground truth is lacking, both for characterizing Arab propagandist behaviour and for evaluating newly proposed detectors. The dataset presented here aims to close this research gap by demonstrating the value of characterizing Arab computational propaganda on X (Twitter). It was prepared using a scientific approach to ensure data quality: the propagandist users' data was requested from the X Transparency Center. The data released by X identifies propagandists at the user level only; individual tweets are not classified as propaganda or non-propaganda. Propagandists usually mix propaganda and non-propaganda tweets to hide their identities. Therefore, three journalist volunteers were recruited to label 2100 tweets as propaganda or not, and then to label each propaganda tweet according to the propaganda technique used. The dataset covers sports and banking issues. The resulting dataset consists of 16,355,558 tweets, with their metadata, posted by propagandist users in 2019, plus 2100 labelled tweets from propagandists. It helps the research community apply supervised and unsupervised machine learning and deep learning algorithms to classify the credibility of Arab tweets and users. This paper also suggests looking at behaviour rather than content to distinguish propaganda communication. The datasets enable deep non-textual analysis to investigate the main characteristics of Arab computational propaganda on X.
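As an illustration of how labels from three annotators might be combined into a single label per tweet, here is a minimal Python sketch. It assumes majority voting, which the abstract does not state was the procedure used; the tweet ids, label names, and the `majority_label` helper are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most annotators, or None on a tie."""
    counts = Counter(votes).most_common()
    # A tie between the two most frequent labels has no majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Hypothetical annotations: tweet id -> labels from three annotators.
annotations = {
    "t1": ["propaganda", "propaganda", "not_propaganda"],
    "t2": ["not_propaganda", "not_propaganda", "not_propaganda"],
    "t3": ["propaganda", "not_propaganda", "propaganda"],
}

# One aggregated label per tweet (assumed majority-vote rule).
labels = {tid: majority_label(v) for tid, v in annotations.items()}

# Count tweets on which all three annotators agreed.
unanimous = sum(1 for v in annotations.values() if len(set(v)) == 1)
```

With an odd number of annotators and a binary label, a tie cannot occur; the `None` branch only matters for multi-class labels such as the propaganda technique.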
Identifiers
pubmed: 38328292
doi: 10.1016/j.dib.2024.110089
pii: S2352-3409(24)00062-3
pmc: PMC10847467
Publication types
Journal Article
Languages
eng
Pagination
110089
Copyright information
© 2024 The Author(s).