Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Keywords: cascaded DnCNN–CNN; residual learning; speech emotion recognition

Journal

Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366

Publication information

Publication date:
27 Jun 2021
History:
received: 25 May 2021
revised: 20 Jun 2021
accepted: 24 Jun 2021
entrez: 2 Jul 2021
pubmed: 3 Jul 2021
medline: 6 Jul 2021
Status: epublish

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3-76.6%, whereas the CNN has an overall accuracy of 39.4-58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
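The two-stage cascade described in the abstract can be sketched in code. The following is a minimal illustrative sketch in PyTorch, not the authors' implementation: the log-mel spectrogram input shape, layer counts, channel widths, and the NUM_EMOTIONS class count are assumptions; only the overall structure (a residual-learning denoiser whose output feeds a classification CNN) reflects the abstract.

# Minimal sketch of a cascaded DnCNN-CNN (assumptions: spectrogram input
# of shape (batch, 1, n_mels, n_frames); hypothetical NUM_EMOTIONS;
# illustrative layer counts and widths).
import torch
import torch.nn as nn

NUM_EMOTIONS = 4  # assumption; the class count is not stated in this record


class DnCNN(nn.Module):
    """Denoising stage: residual learning predicts the noise component,
    and the clean estimate is the input minus the predicted noise."""
    def __init__(self, channels=1, depth=8, width=64):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1, bias=False),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        noise = self.body(noisy)   # network learns the residual (noise) mapping
        return noisy - noise       # denoised spectrogram estimate


class EmotionCNN(nn.Module):
    """Classification stage: a small CNN over the denoised spectrogram."""
    def __init__(self, num_classes=NUM_EMOTIONS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class CascadedDnCNNCNN(nn.Module):
    """Two-stage cascade: denoise first, then classify the emotion."""
    def __init__(self):
        super().__init__()
        self.denoiser = DnCNN()
        self.emotion_cnn = EmotionCNN()

    def forward(self, noisy_spec):
        return self.emotion_cnn(self.denoiser(noisy_spec))


# Usage with random data standing in for a noisy log-mel spectrogram batch.
model = CascadedDnCNNCNN()
dummy = torch.randn(2, 1, 64, 128)   # (batch, channel, mel bins, frames)
logits = model(dummy)                # shape (2, NUM_EMOTIONS)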

Identifiers

pubmed: 34199027
pii: s21134399
doi: 10.3390/s21134399
pmc: PMC8271804

Publication types

Journal Article

Languages

eng

Citation subset

IM

Grants

Agency: National Research Foundation of Korea
ID: 2017S1A6A3A01078538


Authors

Youngja Nam (Y)

Humanities Research Institute, Chung-Ang University, Seoul 06974, Korea.

Chankyu Lee (C)

Humanities Research Institute, Chung-Ang University, Seoul 06974, Korea.
Department of Korean Language and Literature, Chung-Ang University, Seoul 06974, Korea.

