Skeleton-Based Emotion Recognition Based on Two-Stream Self-Attention Enhanced Spatial-Temporal Graph Convolutional Network.

emotion recognition; gesture; graph convolutional networks; self-attention; skeleton

Journal

Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366

Publication information

Publication date:
30 Dec 2020
History:
received: 30 11 2020
revised: 24 12 2020
accepted: 27 12 2020
entrez: 5 1 2021
pubmed: 6 1 2021
medline: 6 1 2021
Status: epublish

Abstract

Emotion recognition has drawn sustained attention from researchers in recent years. Although the gesture modality plays an important role in expressing emotion, it is seldom considered in emotion recognition. A key reason is the scarcity of labeled datasets containing 3D skeleton data. Some studies in action recognition have applied graph-based neural networks to explicitly model the spatial connections between joints. However, this approach has not so far been considered in gesture-based emotion recognition. In this work, we applied a pose-estimation-based method to extract 3D skeleton coordinates for the IEMOCAP database. We propose a self-attention enhanced spatial-temporal graph convolutional network for skeleton-based emotion recognition, in which the spatial convolutional part models the skeletal structure of the body as a static graph, and the self-attention part dynamically constructs additional connections between the joints and provides supplementary information. Our experiments demonstrate that the proposed model significantly outperforms other models and that the features of the extracted skeleton data improve the performance of multimodal emotion recognition.
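
The idea of combining a static skeleton graph with a dynamically computed self-attention graph can be illustrated with a minimal sketch. This is not the authors' implementation: it handles a single frame, omits learned weight matrices, the temporal convolution, and the two-stream design, and all function names, the toy 3-joint skeleton, and the choice of simply adding the two adjacency matrices are assumptions made for illustration.

```python
import math

def normalize_rows(matrix):
    """Scale each row so that it sums to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_adjacency(features):
    """Scaled dot-product attention between every pair of joints.

    Assumption: queries and keys are the raw joint features; a real
    model would apply learned projections first.
    """
    scale = math.sqrt(len(features[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in features])
        for q in features
    ]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def spatial_layer(features, static_adj):
    """Aggregate joint features over the static graph plus the attention graph."""
    a_static = normalize_rows(static_adj)    # fixed skeletal connections
    a_attn = attention_adjacency(features)   # data-dependent connections
    combined = [[s + t for s, t in zip(rs, rt)]
                for rs, rt in zip(a_static, a_attn)]
    return matmul(combined, features)

# Hypothetical toy skeleton: a 3-joint chain (0-1-2) with self-loops,
# and 2-D coordinates per joint for one frame.
STATIC_ADJ = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
JOINTS = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
OUTPUT = spatial_layer(JOINTS, STATIC_ADJ)
```

In this sketch, joints 0 and 2 are not connected in the static graph, yet the attention matrix still lets them exchange information, which is the sense in which the attention part adds connections beyond the fixed skeletal structure.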

Identifiers

pubmed: 33396917
pii: s21010205
doi: 10.3390/s21010205
pmc: PMC7795329

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Grant-in-Aid for Scientific Research on Innovative Areas
ID: JP20H05576
Agency: JST ERATO
ID: JPMJER1401

Authors

Jiaqi Shi (J)

Graduate School of Engineering Science, Osaka University, Osaka 565-0871, Japan.
Advanced Telecommunications Research Institute International, Kyoto 619-0237, Japan.

Chaoran Liu (C)

Advanced Telecommunications Research Institute International, Kyoto 619-0237, Japan.

Carlos Toshinori Ishi (CT)

Advanced Telecommunications Research Institute International, Kyoto 619-0237, Japan.
Interactive Robot Research Team, Robotics Project, RIKEN, Kyoto 351-0198, Japan.

Hiroshi Ishiguro (H)

Graduate School of Engineering Science, Osaka University, Osaka 565-0871, Japan.
Advanced Telecommunications Research Institute International, Kyoto 619-0237, Japan.

MeSH classifications