Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts.

Keywords: Domain adaptation; Fear prediction; Hate speech; Small dataset; Text mining

Journal

Social network analysis and mining
ISSN: 1869-5450
Abbreviated title: Soc Netw Anal Min
Country: Germany
NLM ID: 101616226

Publication information

Publication date:
2021
History:
received: 08 11 2020
revised: 29 06 2021
accepted: 20 07 2021
entrez: 3 8 2021
pubmed: 4 8 2021
medline: 4 8 2021
Status: ppublish

Abstract

In the current era of information and experience, microblogging sites are commonly used to express people's feelings, including fear, panic, hate and abuse. Monitoring and controlling abuse on social media, especially during pandemics such as COVID-19, can help keep public sentiment and morale positive. Developing fear and hate detection methods based on machine learning requires labelled data. However, obtaining labelled data in suddenly changed circumstances such as a pandemic is expensive, and acquiring it in a short time is impractical. Related labelled hate data from other domains or previous incidents may be available. However, the predictive accuracy of hate detection models trained on such data decreases significantly if the data distribution of the target domain, where the prediction will be applied, is different. To address this problem, we propose a novel concept of unsupervised progressive domain adaptation based on a deep-learning language model generated through multiple text datasets. We showcase the efficacy of the proposed method in hate speech and fear detection on a collection of tweets posted during COVID-19, for which labelled information is unavailable.

Identifiers

pubmed: 34341673
doi: 10.1007/s13278-021-00780-w
pii: 780
pmc: PMC8319196

Publication types

Journal Article

Languages

eng

Pagination

69

Copyright information

© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021.

References

J Biomed Inform. 2016 Aug;62:1-11
pubmed: 27224846
IEEE Trans Neural Netw. 2011 Feb;22(2):199-210
pubmed: 21095864
Neural Comput. 1997 Nov 15;9(8):1735-80
pubmed: 9377276
J Pers Soc Psychol. 2014 May;106(5):655-78
pubmed: 24749817
PLoS One. 2019 Aug 20;14(8):e0221152
pubmed: 31430308

Authors

Md Abul Bashar (MA)

School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.

Richi Nayak (R)

School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.

Khanh Luong (K)

School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.

Thirunavukarasu Balasubramaniam (T)

School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.

MeSH classifications