Mixed Script Identification Using Automated DNN Hyperparameter Optimization.
Journal
Computational intelligence and neuroscience
ISSN: 1687-5273
Abbreviated title: Comput Intell Neurosci
Country: United States
NLM ID: 101279357
Publication information
Publication date:
2021
History:
received: 2021-10-03
revised: 2021-10-30
accepted: 2021-11-05
entrez: 2021-12-20
pubmed: 2021-12-21
medline: 2021-12-22
Status:
epublish
Abstract
Mixed-script text is a hindrance for automated natural language processing (NLP) systems. Mixing cursive scripts of different languages is a challenge because NLP methods such as part-of-speech (POS) tagging and word sense disambiguation suffer from noisy text. This study tackles mixed script identification for a code-mixed dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and recurrent neural network (RNN) variants. Through experimental investigation, architectures based on Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU) are optimized for the task. The highest accuracy, 90.17%, was achieved by Bi-GRU using learned word class features together with GloVe embeddings. The study also addresses issues that arise in multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing.
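To make the data flow described in the abstract concrete, the following is a minimal sketch of a bidirectional GRU tagger that assigns one of the five languages to each word. It is not the authors' implementation: the weights are random and untrained, the random word vectors merely stand in for GloVe embeddings, bias terms are omitted, and all names and dimensions (`GRUCell`, `tag_languages`, `embed_dim`, `hidden_dim`) are assumptions made for illustration.

```python
import math
import random

# Language labels taken from the abstract.
LANGS = ["Roman Urdu", "Hindi", "Saraiki", "Bengali", "English"]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class GRUCell:
    """GRU cell without bias terms: update gate z, reset gate r, candidate h."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x, h_prev):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h_prev))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h_prev))]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], gated))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hp + zi * ci for zi, hp, ci in zip(z, h_prev, cand)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


def tag_languages(tokens, embed_dim=8, hidden_dim=6, seed=0):
    """Return (token, predicted_language, probabilities) for each token."""
    rng = random.Random(seed)
    # Random vectors stand in for pretrained GloVe embeddings (assumption).
    vocab = {t: [rng.uniform(-1, 1) for _ in range(embed_dim)]
             for t in sorted(set(tokens))}
    xs = [vocab[t] for t in tokens]
    fwd = GRUCell(embed_dim, hidden_dim, rng).run(xs)              # left to right
    bwd = GRUCell(embed_dim, hidden_dim, rng).run(xs[::-1])[::-1]  # right to left
    out_w = rand_matrix(len(LANGS), 2 * hidden_dim, rng)
    results = []
    for tok, f, b in zip(tokens, fwd, bwd):
        probs = softmax(matvec(out_w, f + b))  # concatenate both directions
        results.append((tok, LANGS[probs.index(max(probs))], probs))
    return results


if __name__ == "__main__":
    # Hypothetical code-mixed sentence, used only to exercise the forward pass.
    for tok, lang, _ in tag_languages(["yeh", "movie", "bahut", "achi", "hai"]):
        print(tok, "->", lang)
```

Because the weights are untrained, the predicted labels are arbitrary. The sketch only shows how each word's representation combines left and right context before classification, which is the property that distinguishes the bidirectional variants from the plain LSTM and GRU.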
Identifiers
pubmed: 34925496
doi: 10.1155/2021/8415333
pmc: PMC8683192
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
8415333
Copyright information
Copyright © 2021 Muhammad Yasir et al.
Conflict of interest statement
There are no conflicts of interest for the publication of this research among all the scholars who participated in this study.