Big data directed acyclic graph model for real-time COVID-19 Twitter stream detection.

Anomaly detection; Big data; COVID-19; Directed acyclic graph; Event stream

Journal

Pattern recognition
ISSN: 0031-3203
Abbreviated title: Pattern Recognit
Country: England
NLM ID: 0250655

Publication information

Publication date:
Mar 2022
History:
received: 28 03 2021
revised: 15 09 2021
accepted: 24 10 2021
pubmed: 9 11 2021
medline: 9 11 2021
entrez: 8 11 2021
Status: ppublish

Abstract

Every day, social media platforms such as Twitter continuously generate large-scale data streams that report on events around the world in real time. Notably, Twitter has been one of the effective platforms for updating countries' leaders and scientists during the coronavirus (COVID-19) pandemic. Other people have also used the platform to post their concerns about the spread of the virus and the rapid global increase in deaths. The aim of this work is to detect anomalous events associated with COVID-19 from Twitter. To this end, we propose a distributed Directed Acyclic Graph topology framework to aggregate and process large-scale real-time tweets related to COVID-19. The core of our system is a novel lightweight algorithm that can automatically detect anomalous events. In addition, our system can identify, cluster, and visualize important keywords in tweets. On 18 August 2020, our model detected the highest anomaly, since many tweets that day mentioned casualty updates and debates on the pandemic. The three most frequent terms on Twitter were "covid", "death", and "Trump" (21,566, 11,779, and 4,761 occurrences, respectively), while the terms with the highest TF-IDF scores were "people" (0.63637), "school" (0.5921407), and "virus" (0.57385). In our clustering result, the words "death", "corona", and "case" are grouped into one cluster, whereas the words "pandemic", "school", and "president" are grouped into another. The terms in each cluster lie near each other in vector space, indicating the topics of greatest concern to people on Twitter.
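
The abstract reports raw term counts alongside TF-IDF scores, and the two rankings differ. The sketch below illustrates on a toy corpus why that can happen: a term that appears in every document has a high count but a TF-IDF of zero. It is not the authors' pipeline. The tokenizer, the function names `tokenize`, `term_counts`, and `tf_idf`, the sample tweets, and the unsmoothed TF-IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def term_counts(docs):
    # Total occurrences of each term across the whole corpus.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def tf_idf(docs):
    # Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency).
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        )
    return scores


# Hypothetical sample tweets, not data from the paper.
tweets = [
    "covid death toll rises",
    "covid cases rise as school reopens",
    "president debates covid response",
]
counts = term_counts(tweets)
scores = tf_idf(tweets)
```

Here "covid" is the most frequent term, yet its TF-IDF is zero in every tweet because it occurs in all of them, while rarer terms such as "school" receive positive scores.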

Identifiers

pubmed: 34744186
doi: 10.1016/j.patcog.2021.108404
pii: S0031-3203(21)00580-X
pmc: PMC8556703

Publication types

Journal Article

Languages

eng

Pagination

108404

Copyright information

© 2021 Elsevier Ltd. All rights reserved.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Bakhtiar Amen (B)

Department of Computer Science, School of Electrical Engineering, Electronics, and Computer Science, University of Liverpool, Liverpool L69 3BX, UK.

Syahirul Faiz (S)

State Islamic Institute of Surakarta (IAIN Surakarta), Indonesia.

Thanh-Toan Do (TT)

Department of Data Science and AI, Faculty of Information Technology, Monash University, Australia.

MeSH classifications