ANAD: Arabic news article dataset.

Arabic news articles; Classification; Data analysis; Natural language processing (NLP)

Journal

Data in brief
ISSN: 2352-3409
Abbreviated title: Data Brief
Country: Netherlands
NLM ID: 101654995

Publication information

Publication date:
Oct 2023
History:
received: 2023-06-24
revised: 2023-07-21
accepted: 2023-07-24
medline: 2023-08-14
pubmed: 2023-08-14
entrez: 2023-08-14
Status: epublish

Abstract

In this paper, we present a Modern Standard Arabic dataset based on Arabic news articles collected over a one-year period, from 1 January 2021 to 31 December 2021. In total, over 500,000 articles were collected from 12 Arabic news websites, selected to cover a variety of topics, including sports, economies, local news, politics, tech, tourism, entertainment, cars, health, and art. The dataset is intended to let data scientists explore and experiment in the field of natural language processing, and it can also be used to develop machine learning and deep learning models that classify articles by topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.

Identifiers

pubmed: 37577410
doi: 10.1016/j.dib.2023.109460
pii: S2352-3409(23)00560-7
pmc: PMC10415830

Publication types

News

Languages

eng

Pagination

109460

Copyright information

© 2023 The Author(s).

Authors

Mohammed Altamimi (M)

Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il, 81481, Saudi Arabia.

Abdulaziz M Alayba (AM)

Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il, 81481, Saudi Arabia.

MeSH classifications