Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.
cascaded DnCNN–CNN
residual learning
speech emotion recognition
Journal
Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366
Publication information
Publication date:
27 Jun 2021
History:
received: 25 May 2021
revised: 20 Jun 2021
accepted: 24 Jun 2021
entrez: 2 Jul 2021
pubmed: 3 Jul 2021
medline: 6 Jul 2021
Status:
epublish
Abstract
Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3-76.6%, whereas the CNN has an overall accuracy of 39.4-58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
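The two-stage data flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the moving-average residual predictor stands in for a trained DnCNN, the linear softmax layer stands in for the classification CNN, and every name (`predict_residual`, `denoise`, `cascade`), the kernel, the four-class output, and the random weights are assumptions made for illustration. The sketch only shows the residual-learning idea, in which stage 1 estimates the noise and subtracts it from the noisy input, and stage 2 classifies the denoised signal.

```python
import math
import random


def conv1d_same(signal, kernel):
    """Zero-padded 1-D sliding-window filter; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def predict_residual(noisy, kernel):
    """Stage 1 stand-in (hypothetical): estimate the noise component of the input.

    A trained DnCNN would output this residual directly; here the part of the
    signal removed by a smoothing filter is treated as the noise estimate.
    """
    smoothed = conv1d_same(noisy, kernel)
    return [y - s for y, s in zip(noisy, smoothed)]


def denoise(noisy, kernel):
    """Residual learning: clean estimate = noisy input - predicted noise."""
    residual = predict_residual(noisy, kernel)
    return [y - r for y, r in zip(noisy, residual)]


def residual_loss(noisy, clean, kernel):
    """Mean squared error between the predicted residual and the true noise (noisy - clean)."""
    residual = predict_residual(noisy, kernel)
    return sum((r - (y - x)) ** 2 for r, y, x in zip(residual, noisy, clean)) / len(noisy)


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights):
    """Stage 2 stand-in (hypothetical): linear scores per class, then softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(scores)


def cascade(noisy, kernel, weights):
    """Cascade: denoise first, then classify the denoised signal."""
    return classify(denoise(noisy, kernel), weights)


if __name__ == "__main__":
    random.seed(0)
    clean = [math.sin(2 * math.pi * i / 32) for i in range(64)]   # toy "speech" signal
    noisy = [x + random.gauss(0, 0.3) for x in clean]             # additive noise
    kernel = [0.2] * 5                                            # assumed smoothing kernel
    weights = [[random.uniform(-0.1, 0.1) for _ in range(64)] for _ in range(4)]  # 4 assumed classes
    print("residual loss:", residual_loss(noisy, clean, kernel))
    print("class probabilities:", cascade(noisy, kernel, weights))
```

In a real system, both stand-ins would be replaced by trained convolutional networks operating on spectral features; only the ordering of the stages and the subtract-the-predicted-noise step are taken from the abstract.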
Identifiers
pubmed: 34199027
pii: s21134399
doi: 10.3390/s21134399
pmc: PMC8271804
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: National Research Foundation of Korea
ID: 2017S1A6A3A01078538