A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.

Artificial intelligence; Health sciences; Hoarse voice conversion; Intelligibility; Multidomain generative adversarial network; Pathological voice

Journal

Journal of voice : official journal of the Voice Foundation

ISSN: 1873-4588

Abbreviated title: J Voice

Country: United States

NLM ID: 8712262

Publication information

Publication date:
14 Oct 2023

History:

received: 3 Jul 2023

revised: 30 Aug 2023

accepted: 30 Aug 2023

medline: 17 Oct 2023

pubmed: 17 Oct 2023

entrez: 16 Oct 2023

Status: aheadofprint

Abstract

Hoarseness reduces the efficiency of spoken communication. However, surgical treatment may leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial network for hoarse-to-normal voice conversion that improves the speech quality of hoarse voices and personalizes voices for patients with hoarseness. The proposed method is evaluated with both subjective and objective metrics. Spectrum analysis shows that the proposed method converts hoarse voice formants more effectively than the variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (generative adversarial network voice conversion), and CycleVAE. For the word error rate, the proposed method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. In terms of naturalness, the proposed method outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. In terms of intelligibility, it outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, it obtains improvements of 43.48%, 75.52%, 76.21%, and 108.62% compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the proposed method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of using voice conversion methods to improve the speech quality of hoarse voices.
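The abstract reports some results as absolute gains (word error rate, intelligibility) and others as percentage improvements (naturalness, content similarity), without giving the formulas used. The sketch below shows one common way such figures could be computed, assuming the word error rate is the word-level edit distance divided by the reference length and that percentage improvements are relative to the baseline score. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words (Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_gain(baseline: float, proposed: float) -> float:
    """Absolute gain for a lower-is-better metric such as word error rate."""
    return baseline - proposed


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative improvement in percent for a higher-is-better score (assumed definition)."""
    return 100.0 * (proposed - baseline) / baseline


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(absolute_gain(baseline=60.0, proposed=25.0))
    print(relative_improvement(baseline=2.0, proposed=3.0))
```

Under these assumptions, an absolute gain is a plain difference between two scores, while a percentage improvement depends on the baseline value, which is why an improvement above 100% (as reported for content similarity against VAE) is possible.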

Identifiers

DOI: 10.1016/j.jvoice.2023.08.027

PMID: 37845148

PII: S0892-1997(23)00274-6

Publication types

Journal Article

Languages

eng

Conflict of interest statement

Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Minghang Chu (M)

Jing Wang (J)

Zhiwei Fan (Z)

Mengtao Yang (M)

Chao Xu (C)

Yaoyao Ma (Y)

Zhi Tao (Z)

Di Wu (D)

MeSH classifications