Robust Understanding of Robot-Directed Speech Commands Using Sequence to Sequence With Noise Injection.

language understanding; robot-directed speech detection; semantic parsing; service robot; speech recognition

Journal

Frontiers in robotics and AI
ISSN: 2296-9144
Abbreviated title: Front Robot AI
Country: Switzerland
NLM ID: 101749350

Publication information

Publication date:
2019
History:
received: 25 07 2019
accepted: 09 12 2019
entrez: 27 1 2021
pubmed: 28 1 2021
medline: 28 1 2021
Status: epublish

Abstract

This paper describes a new method that enables a service robot to understand spoken commands robustly using off-the-shelf automatic speech recognition (ASR) systems and an encoder-decoder neural network with noise injection. In many cases, the understanding of spoken commands in service robotics is modeled as a mapping from speech signals to a sequence of commands that a robot can understand and perform. In a conventional approach, speech signals are recognized, and semantic parsing is applied to infer the command sequence from the utterance. However, if errors occur during speech recognition, a conventional semantic parsing method cannot be appropriately applied because most natural language processing methods do not account for such errors. We propose the use of encoder-decoder neural networks, e.g., sequence to sequence, with noise injection. The noise is injected into phoneme sequences during the training phase of encoder-decoder neural network-based semantic parsing systems. We demonstrate that the use of neural networks with noise injection can mitigate the negative effects of speech recognition errors on the understanding of robot-directed speech commands, i.e., it improves the performance of semantic parsing. We implemented the method and evaluated it using the commands given during a general-purpose service robot (GPSR) task, such as the one used in RoboCup@Home, a standard competition for testing service robots. The results of the experiment show that the proposed method, namely, sequence to sequence with noise injection (Seq2Seq-NI), outperforms the baseline methods. In addition, Seq2Seq-NI enables a robot to understand a spoken command even when the output of an off-the-shelf ASR system contains recognition errors. We also describe an experiment conducted to evaluate the influence of the injected noise and discuss the results.
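The abstract states that noise is injected into phoneme sequences during training, but it does not specify the noise model. As a rough illustration only, the sketch below assumes three edit types (random substitution, deletion, and insertion) applied independently at each position with a fixed probability. The phoneme inventory, the function names `inject_noise` and `make_training_pairs`, the `rate` parameter, and the command label format are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical phoneme inventory, for illustration only (not from the paper).
PHONEMES = ["a", "i", "u", "e", "o", "k", "s", "t", "n", "m", "r", "g"]


def inject_noise(phonemes, rate, rng):
    """Return a noisy copy of a phoneme sequence.

    Assumed noise model: each position is corrupted with probability
    `rate` by one edit drawn uniformly from substitution, deletion,
    and insertion. A substitution may by chance pick the same phoneme.
    """
    noisy = []
    for p in phonemes:
        if rng.random() < rate:
            op = rng.choice(["sub", "del", "ins"])
            if op == "sub":
                # Replace the phoneme with a random one.
                noisy.append(rng.choice(PHONEMES))
            elif op == "ins":
                # Keep the phoneme and insert a random one after it.
                noisy.append(p)
                noisy.append(rng.choice(PHONEMES))
            # "del": drop the phoneme by appending nothing.
        else:
            noisy.append(p)
    return noisy


def make_training_pairs(phonemes, command, n_copies, rate, seed=0):
    """Build (noisy phoneme sequence, target command) training pairs.

    Only the input side is corrupted; the target command is unchanged,
    so a sequence-to-sequence model trained on these pairs sees
    error-like inputs mapped to the correct output.
    """
    rng = random.Random(seed)
    return [(inject_noise(phonemes, rate, rng), command) for _ in range(n_copies)]
```

A possible use, with a made-up utterance and command label:

```python
clean = ["g", "o", "t", "o", "k", "i", "t", "e", "n"]
pairs = make_training_pairs(clean, ["GOTO", "kitchen"], n_copies=3, rate=0.2)
for noisy, target in pairs:
    print(" ".join(noisy), "->", target)
```

Keeping the target fixed while perturbing only the input is the point of the design: the parser is encouraged to map several slightly wrong phoneme sequences to the same command.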

Identifiers

pubmed: 33501159
doi: 10.3389/frobt.2019.00144
pmc: PMC7805724

Publication types

Journal Article

Languages

eng

Pagination

144

Copyright information

Copyright © 2020 Tada, Hagiwara, Tanaka and Taniguchi.

Authors

Yuuki Tada (Y)

Emergent Systems Laboratory, College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.

Yoshinobu Hagiwara (Y)

Emergent Systems Laboratory, College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.

Hiroki Tanaka (H)

Emergent Systems Laboratory, College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.

Tadahiro Taniguchi (T)

Emergent Systems Laboratory, College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.

MeSH classifications