ANAD: Arabic news article dataset.
Arabic news articles
Classification
Data analysis
Natural language processing (NLP)
Journal
Data in brief
ISSN: 2352-3409
Abbreviated title: Data Brief
Country: Netherlands
NLM ID: 101654995
Publication information
Publication date:
Oct 2023
History:
received: 2023-06-24
revised: 2023-07-21
accepted: 2023-07-24
medline: 2023-08-14
pubmed: 2023-08-14
entrez: 2023-08-14
Status:
epublish
Abstract
In this paper, we present a Modern Standard Arabic dataset of news articles collected over a one-year period, from 01/01/2021 to 12/31/2021. In total, over 500,000 articles were collected from 12 Arabic news websites, selected to cover a variety of topics, including sports, economy, local news, politics, tech, tourism, entertainment, cars, health, and art. The dataset enables data scientists to explore and experiment in the field of natural language processing, and it can be used to develop machine learning and deep learning models that classify articles by topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.
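As an illustration of the topic-classification use case mentioned in the abstract, the following is a minimal sketch of a multinomial Naive Bayes classifier written with the Python standard library only. It is not the authors' method. The function names (`tokenize`, `train`, `predict`), the label strings, and the four toy sentences are hypothetical placeholders invented for this example; they are not taken from the dataset, and the dataset's actual file layout is not assumed here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization only (an assumption for brevity); real Arabic
    # text would usually also need normalization and stop-word handling.
    return text.split()


def train(samples):
    """Fit per-topic word counts from an iterable of (text, topic) pairs."""
    word_counts = defaultdict(Counter)  # topic -> token frequencies
    doc_counts = Counter()              # topic -> number of documents
    vocab = set()
    for text, topic in samples:
        tokens = tokenize(text)
        word_counts[topic].update(tokens)
        doc_counts[topic] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the topic with the highest log-posterior under Naive Bayes."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = tokenize(text)
    best_topic, best_score = None, -math.inf
    for topic, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][topic]
        total_words = sum(counts.values())
        # Log prior plus Laplace-smoothed log likelihood of each token.
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical toy sentences, NOT drawn from the dataset.
TOY_SAMPLES = [
    ("فاز الفريق في مباراة كرة القدم", "sports"),
    ("سجل اللاعب هدفا في المباراة", "sports"),
    ("رفع البنك المركزي أسعار الفائدة", "economy"),
    ("ارتفعت أسعار النفط في الأسواق", "economy"),
]

model = train(TOY_SAMPLES)
```

To try it on real data, `TOY_SAMPLES` would be replaced with (article text, topic) pairs loaded from the downloaded files, and a held-out split would be needed to measure accuracy.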
Identifiers
pubmed: 37577410
doi: 10.1016/j.dib.2023.109460
pii: S2352-3409(23)00560-7
pmc: PMC10415830
Publication types
News
Languages
eng
Pagination
109460
Copyright information
© 2023 The Author(s).