Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy.

automatic speech recognition (ASR); gain control; human–computer interaction; maximized original signal transmission (MOST); noise figure; word error rate (WER)

Journal

Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366

Publication information

Publication date:
15 Apr 2022
History:
received: 24 03 2022
revised: 09 04 2022
accepted: 12 04 2022
entrez: 23 4 2022
pubmed: 24 4 2022
medline: 27 4 2022
Status: epublish

Abstract

Automatic speech recognition (ASR) is an essential technique for human-computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. Because the relationship between gain control and WER has not been sufficiently analyzed or proven theoretically, various unconstrained gain control strategies have been adopted in practical ASR systems, and the optimal gain control, that is, the one yielding the lowest WER, is rarely achieved. This study proposes a gain control strategy named maximized original signal transmission (MOST) to minimize the adverse impact of gain control on ASR systems. First, the gain control strategy is modeled, and the quantitative relationship between the strategy and ASR performance is established using the noise figure index. Second, by analyzing this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation is theoretically deduced. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST strategy significantly reduces the WER of the experimental ASR system, with a 10% mean absolute WER reduction at -9 dB gain.
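
As a rough illustration of two quantities the abstract relies on, the sketch below converts a gain in decibels to a linear amplitude factor and computes WER as a token-level edit distance divided by the reference length. It is not the authors' MOST strategy or their evaluation code; the function names `db_to_linear` and `word_error_rate` are hypothetical, and the 20·log10 amplitude convention and the standard Levenshtein-based WER definition are assumptions, since the abstract does not specify them.

```python
from typing import Sequence


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in decibels to a linear scale factor.

    Assumes the amplitude convention gain_db = 20 * log10(factor).
    """
    return 10.0 ** (gain_db / 20.0)


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # Row-by-row Levenshtein distance over tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)
```

Under these assumptions, `db_to_linear(-9.0)` is about 0.355, so a -9 dB gain scales the signal amplitude to roughly a third of its original value. For Mandarin, the tokens passed to `word_error_rate` could be words or individual characters, depending on how the transcripts are segmented.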

Identifiers

pubmed: 35459013
pii: s22083027
doi: 10.3390/s22083027
pmc: PMC9027119

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: National Natural Science Foundation of China
ID: 61973059

Authors

Desheng Wang (D)

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.

Yangjie Wei (Y)

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.

Ke Zhang (K)

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.

Dong Ji (D)

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.

Yi Wang (Y)

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.
