Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Emotions; Language; Neural Networks, Computer; Perception; Speech

cascaded DnCNN–CNN; residual learning; speech emotion recognition

Journal

Sensors (Basel, Switzerland)

ISSN: 1424-8220

Abbreviated title: Sensors (Basel)

Country: Switzerland

NLM ID: 101204366

Publication information

Publication date:
27 Jun 2021

History:

received: 25 May 2021

revised: 20 Jun 2021

accepted: 24 Jun 2021

entrez: 2 Jul 2021

pubmed: 3 Jul 2021

medline: 6 Jul 2021

Status: epublish

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and there is limited evidence of their applicability to emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages: in the first stage, the DnCNN performs denoising using residual learning; in the second stage, the CNN performs the classification. Classification results on real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3–76.6%, whereas the CNN has an overall accuracy of 39.4–58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
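The two-stage cascade described in the abstract can be illustrated with a small, dependency-free sketch. It assumes 1-D feature vectors and follows the general residual-learning idea behind a DnCNN: the denoising network R is trained to predict the noise component, and the clean estimate is the noisy input minus that prediction. Everything else here is hypothetical and for illustration only: the function names (`conv1d_same`, `estimate_noise`, `denoise`, `residual_target`, `linear_classifier`, `cascade`), the toy kernels, the linear classifier standing in for the second-stage CNN, and the emotion labels. This is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Vector:
    """Zero-padded 1-D cross-correlation; the output has the same length as x."""
    pad = len(kernel) // 2
    out: Vector = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def estimate_noise(y: Sequence[float], kernels: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for the DnCNN body R(y): stacked convolutions with
    ReLU between layers and a linear last layer, so predicted noise can be negative."""
    h: Vector = list(y)
    for layer, kernel in enumerate(kernels):
        h = conv1d_same(h, kernel)
        if layer < len(kernels) - 1:
            h = relu(h)
    return h


def residual_target(y: Sequence[float], x: Sequence[float]) -> Vector:
    """Residual learning: R is trained to predict the noise y - x, not the clean x."""
    return [yi - xi for yi, xi in zip(y, x)]


def denoise(y: Sequence[float], kernels: Sequence[Sequence[float]]) -> Vector:
    """Stage 1: clean estimate = noisy input minus the predicted noise."""
    noise = estimate_noise(y, kernels)
    return [yi - ni for yi, ni in zip(y, noise)]


def linear_classifier(
    weights: Sequence[Sequence[float]], labels: Sequence[str]
) -> Callable[[Sequence[float]], str]:
    """Hypothetical stand-in for the stage-2 CNN: argmax over linear class scores."""
    def classify(features: Sequence[float]) -> str:
        scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
        best = max(range(len(scores)), key=scores.__getitem__)
        return labels[best]
    return classify


def cascade(y: Sequence[float], kernels, classifier) -> str:
    """Stage 1 (denoise) feeds its output directly into stage 2 (classify)."""
    return classifier(denoise(y, kernels))


if __name__ == "__main__":
    # Toy one-layer "denoiser": kernel [0, 0.5, 0] makes R(y) = 0.5 * y.
    kernels = [[0.0, 0.5, 0.0]]
    clf = linear_classifier([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], ["neutral", "angry"])
    print(denoise([2.0, 4.0, -2.0], kernels))        # [1.0, 2.0, -1.0]
    print(cascade([2.0, 4.0, -2.0], kernels, clf))   # angry
```

Predicting the noise rather than the clean signal is what "residual learning" refers to in this formulation; in a real system both stages would be trained networks, and the input representation and layer configuration used by the authors are not specified in this abstract.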

Identifiers

DOI: 10.3390/s21134399 PMID: 34199027 PMC: PMC8271804

Publication types

Journal Article

Languages

English

Grants

Agency: National Research Foundation of Korea

ID: 2017S1A6A3A01078538

Authors

Youngja Nam

Chankyu Lee

Similar articles

Perceived risk of death among patients with advanced cancer: a qualitative directed content analysis.

How Do Personal Attributes Shape AI Dependency in Chinese Higher Education Context? Insights from Needs Frustration Perspective.

The Importance of Language and Messaging in Psychological Treatment for Functional Neurological Disorder in Children and Adolescents.

Unsupervised learning for real-time and continuous gait phase detection.

MeSH terms