A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.
Artificial intelligence
Health sciences
Hoarse voice conversion
Intelligibility
Multidomain generative adversarial network
Pathological voice
Journal
Journal of Voice: Official Journal of the Voice Foundation
ISSN: 1873-4588
Abbreviated title: J Voice
Country: United States
NLM ID: 8712262
Publication information
Publication date:
14 Oct 2023
History:
received: 03 Jul 2023
revised: 30 Aug 2023
accepted: 30 Aug 2023
medline: 17 Oct 2023
pubmed: 17 Oct 2023
entrez: 16 Oct 2023
Status:
aheadofprint
Abstract
Hoarse voice reduces the efficiency of communication between people. However, surgical treatment may leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method that achieves hoarse-to-normal voice conversion and personalizes voices for patients with hoarseness. The proposed method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network, and it is evaluated with both subjective and objective metrics. Spectrum analysis shows that the proposed method converts hoarse voice formants more effectively than the variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (generative adversarial network voice conversion), and CycleVAE. For word error rate, the proposed method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. In terms of naturalness, the proposed method outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. In terms of intelligibility, it outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, it obtains improvements of 43.48%, 75.52%, 76.21%, and 108.62% compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the proposed method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of using voice conversion methods to improve the speech quality of hoarse voices.
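The abstract reports word error rate results as absolute gains and other results as percentage improvements. As a minimal sketch of how those two kinds of figures differ, the Python below computes a word-level edit-distance word error rate and then an absolute and a relative gain between two systems. The function names (`word_error_rate`, `absolute_gain`, `relative_gain`) and the example values are hypothetical and chosen only for illustration. The record does not say how the authors computed their scores, so this should not be read as their evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_gain(baseline_wer: float, proposed_wer: float) -> float:
    """Difference in error rate (same unit as the inputs, e.g. percentage points)."""
    return baseline_wer - proposed_wer


def relative_gain(baseline_wer: float, proposed_wer: float) -> float:
    """Reduction expressed as a percentage of the baseline error rate."""
    return (baseline_wer - proposed_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    # Illustrative transcripts only; not taken from the study.
    reference = "the cat sat on the mat"
    baseline = word_error_rate(reference, "a cat sat in mat")
    proposed = word_error_rate(reference, "the cat sat on a mat")
    print(f"baseline WER: {baseline:.3f}, proposed WER: {proposed:.3f}")
    print(f"absolute gain: {absolute_gain(baseline, proposed):.3f}")
    print(f"relative gain: {relative_gain(baseline, proposed):.1f}%")
```

An absolute gain is a plain difference between two scores, while a relative gain depends on the size of the baseline, which is why a relative improvement can exceed 100% for a metric where higher is better.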
Identifiers
pubmed: 37845148
pii: S0892-1997(23)00274-6
doi: 10.1016/j.jvoice.2023.08.027
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
Copyright © 2023 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.