A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.

Keywords: Artificial intelligence; Health sciences; Hoarse voice conversion; Intelligibility; Multidomain generative adversarial network; Pathological voice

Journal

Journal of voice : official journal of the Voice Foundation
ISSN: 1873-4588
Abbreviated title: J Voice
Country: United States
NLM ID: 8712262

Publication information

Publication date:
14 Oct 2023
History:
received: 03 07 2023
revised: 30 08 2023
accepted: 30 08 2023
medline: 17 10 2023
pubmed: 17 10 2023
entrez: 16 10 2023
Status: aheadofprint

Abstract

Hoarse voice affects the efficiency of communication between people. However, surgical treatment may leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method to achieve hoarse-to-normal voice conversion and personalize voices for patients with hoarseness. The proposed method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network, and it is evaluated with both subjective and objective metrics. According to the spectrum analysis, the proposed method converts hoarse voice formants more effectively than variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (generative adversarial network voice conversion), and CycleVAE. For the word error rate, the proposed method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. In terms of naturalness, the proposed method outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. In terms of intelligibility, it outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, the proposed method obtains improvements of 43.48%, 75.52%, 76.21%, and 108.62% compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the proposed method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of voice conversion methods in improving the speech quality of hoarse voices.
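
For readers unfamiliar with the word error rate (WER) cited in the abstract, the following is a minimal sketch of the standard definition: the word-level edit distance between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. It is not the authors' evaluation code; the function name `word_error_rate` and the sample sentences are hypothetical, and the abstract does not state which recognizer or which units (presumably percentage points) were used for the reported gains.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, for illustration only (not data from the paper).
reference = "the cat sat on the mat"
baseline_output = "the cat sit on mat"      # one substitution, one deletion
converted_output = "the cat sat on the mat"  # perfect transcript

baseline_wer = word_error_rate(reference, baseline_output)
converted_wer = word_error_rate(reference, converted_output)

# An "absolute gain" is the plain difference between two WER values,
# as opposed to a relative (percentage) improvement.
absolute_gain = baseline_wer - converted_wer
```

An absolute gain computed this way is a difference between two rates, whereas the naturalness and content-similarity figures in the abstract are given as percentages relative to each baseline.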

Identifiers

pubmed: 37845148
pii: S0892-1997(23)00274-6
doi: 10.1016/j.jvoice.2023.08.027

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

Copyright © 2023 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Minghang Chu (M)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Jing Wang (J)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Zhiwei Fan (Z)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Mengtao Yang (M)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Chao Xu (C)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Yaoyao Ma (Y)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Zhi Tao (Z)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.

Di Wu (D)

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China. Electronic address: wudi@suda.edu.cn.

MeSH classifications