Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech.

Dimensional emotion; Multi-resolution modulation-filtered cochleagram; Parallel long short-term memory network; Temporal modulation

Journal

Neural networks : the official journal of the International Neural Network Society
ISSN: 1879-2782
Abbreviated title: Neural Netw
Country: United States
NLM ID: 8805018

Publication information

Publication date:
Aug 2021
History:
received: 07 07 2020
revised: 11 02 2021
accepted: 15 03 2021
pubmed: 11 4 2021
medline: 29 6 2021
entrez: 10 4 2021
Status: ppublish

Abstract

Continuous dimensional emotion recognition from speech helps robots or virtual agents capture the temporal dynamics of a speaker's emotional state in natural human-robot interactions. Temporal modulation cues obtained directly from the time-domain model of auditory perception can better reflect temporal dynamics than the acoustic features usually processed in the frequency domain. Feature extraction, which can reflect temporal dynamics of emotion from temporal modulation cues, is challenging because of the complexity and diversity of the auditory perception model. A recent neuroscientific study suggests that human brains derive multi-resolution representations through temporal modulation analysis. This study investigates multi-resolution representations of an auditory perception model and proposes a novel feature called multi-resolution modulation-filtered cochleagram (MMCG) for predicting valence and arousal values of emotional primitives. The MMCG is constructed by combining four modulation-filtered cochleagrams at different resolutions to capture various temporal and contextual modulation information. In addition, to model the multi-temporal dependencies of the MMCG, we designed a parallel long short-term memory (LSTM) architecture. The results of extensive experiments on the RECOLA and SEWA datasets demonstrate that MMCG provides the best recognition performance in both datasets among all evaluated features. The results also show that the parallel LSTM can build multi-temporal dependencies from the MMCG features, and the performance on valence and arousal prediction is better than that of a plain LSTM method.

Identifiers

pubmed: 33838592
pii: S0893-6080(21)00115-5
doi: 10.1016/j.neunet.2021.03.027

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

261-273

Copyright information

Copyright © 2021. Published by Elsevier Ltd.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Zhichao Peng (Z)

Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300050, China; Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan. Electronic address: zcpeng@jaist.ac.jp.

Jianwu Dang (J)

Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300050, China; Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan; Pengcheng Laboratory, Shenzhen 518055, China. Electronic address: jdang@jaist.ac.jp.

Masashi Unoki (M)

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan. Electronic address: unoki@jaist.ac.jp.

Masato Akagi (M)

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan. Electronic address: akagi@jaist.ac.jp.
