Robust Understanding of Robot-Directed Speech Commands Using Sequence to Sequence With Noise Injection.
language understanding
robot-directed speech detection
semantic parsing
service robot
speech recognition
Journal
Frontiers in robotics and AI
ISSN: 2296-9144
Abbreviated title: Front Robot AI
Country: Switzerland
NLM ID: 101749350
Publication information
Publication date:
2019
History:
received: 25 July 2019
accepted: 9 December 2019
entrez: 27 January 2021
pubmed: 28 January 2021
medline: 28 January 2021
Status:
epublish
Abstract
This paper describes a new method that enables a service robot to understand spoken commands robustly using off-the-shelf automatic speech recognition (ASR) systems and an encoder-decoder neural network with noise injection. In service robotics, the understanding of spoken commands is often modeled as a mapping from speech signals to a sequence of commands that a robot can understand and perform. In a conventional approach, speech signals are first recognized, and semantic parsing is then applied to infer the command sequence from the utterance. However, if errors occur during speech recognition, a conventional semantic parsing method cannot be applied appropriately because most natural language processing methods do not account for such errors. We propose the use of encoder-decoder neural networks, e.g., sequence to sequence, with noise injection. The noise is injected into phoneme sequences during the training phase of encoder-decoder neural network-based semantic parsing systems. We demonstrate that the use of neural networks with noise injection can mitigate the negative effects of speech recognition errors on the understanding of robot-directed speech commands, i.e., it improves the performance of semantic parsing. We implemented the method and evaluated it using the commands given during a general-purpose service robot (GPSR) task, such as the task used in RoboCup@Home, a standard competition for testing service robots. The results of the experiment show that the proposed method, namely, sequence to sequence with noise injection (Seq2Seq-NI), outperforms the baseline methods. In addition, Seq2Seq-NI enables a robot to understand a spoken command even when the output of an off-the-shelf ASR system contains recognition errors. Moreover, we describe an experiment conducted to evaluate the influence of the injected noise and discuss the results.
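The idea of injecting noise into phoneme sequences during training can be illustrated with a minimal sketch. The abstract does not specify the noise model, so everything below is an assumption made for illustration: the phoneme inventory `PHONEMES`, the function names `inject_noise` and `make_training_pairs`, the `rate` and `copies` parameters, and the choice of random substitution, deletion, and insertion as edit operations. It is not the authors' implementation.

```python
import random

# Hypothetical phoneme inventory (assumption; not taken from the paper).
PHONEMES = ["a", "i", "u", "e", "o", "k", "s", "t", "n", "m", "r", "g"]


def inject_noise(phonemes, rate=0.1, rng=None):
    """Return a noisy copy of a phoneme sequence.

    With probability `rate`, each position receives one of three equally
    likely edits: substitution, deletion, or insertion. A substitution
    may by chance pick the same phoneme again.
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for p in phonemes:
        if rng.random() < rate:
            op = rng.choice(("substitute", "delete", "insert"))
            if op == "substitute":
                noisy.append(rng.choice(PHONEMES))
            elif op == "insert":
                noisy.append(p)
                noisy.append(rng.choice(PHONEMES))
            # "delete": the phoneme is simply dropped
        else:
            noisy.append(p)
    return noisy


def make_training_pairs(pairs, copies=3, rate=0.1, seed=0):
    """Augment (phoneme sequence, command) pairs with noisy inputs.

    The target command is left unchanged, so a model trained on the
    result sees corrupted inputs mapped to the correct output.
    """
    rng = random.Random(seed)
    augmented = []
    for phonemes, command in pairs:
        augmented.append((list(phonemes), command))  # clean example
        for _ in range(copies):
            augmented.append((inject_noise(phonemes, rate, rng), command))
    return augmented
```

In this sketch only the input side is corrupted. The augmented pairs would then be used to train the encoder-decoder model in place of the clean pairs alone, which is one plausible way to expose the parser to inputs resembling ASR errors.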
Identifiers
pubmed: 33501159
doi: 10.3389/frobt.2019.00144
pmc: PMC7805724
Publication types
Journal Article
Languages
eng
Pagination
144
Copyright information
Copyright © 2020 Tada, Hagiwara, Tanaka and Taniguchi.