Research on Speech Synthesis Based on Mixture Alignment Mechanism.
acoustic signal processing
deep learning
mixture attention mechanism
speech synthesis
Journal
Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366
Publication information
Publication date:
20 Aug 2023
History:
received: 07 Jul 2023
revised: 15 Aug 2023
accepted: 17 Aug 2023
medline: 28 Aug 2023
pubmed: 26 Aug 2023
entrez: 26 Aug 2023
Status:
epublish
Abstract
In recent years, deep learning-based speech synthesis has attracted considerable attention from the machine learning and speech communities. In this paper, we propose Mixture-TTS, a non-autoregressive speech synthesis model based on a mixture alignment mechanism. Mixture-TTS aims to improve the alignment between text sequences and mel-spectrograms. It uses a linguistic encoder that combines soft phoneme-level alignment with hard word-level alignment to explicitly extract word-level semantic information, and it introduces pitch and energy predictors to better predict the rhythmic information of the audio. In addition, Mixture-TTS introduces a post-net based on a five-layer 1D convolutional network to improve reconstruction of the mel-spectrogram; the decoder output is connected to the post-net through a residual connection. The HiFi-GAN vocoder converts the mel-spectrogram into the final audio. We evaluate Mixture-TTS on the AISHELL3 and LJSpeech datasets. Experimental results show that Mixture-TTS achieves somewhat better alignment between text sequences and mel-spectrograms and produces high-quality audio. Ablation studies confirm the effectiveness of the Mixture-TTS structure.
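The post-net step described in the abstract can be illustrated with a minimal NumPy sketch. Only the five-layer 1D convolution and the residual link between the decoder output and the post-net come from the abstract. Everything else is an assumption made for illustration: the function names, 80 mel bins, 256 hidden channels, kernel size 5, tanh activations, and the reading of the residual link as a simple skip connection. This is not the authors' implementation.

```python
import numpy as np


def conv1d_same(x, weight, bias):
    """1D convolution (cross-correlation) over time with 'same' zero padding.

    x: (in_channels, frames); weight: (out_channels, in_channels, kernel);
    bias: (out_channels,). The kernel size is assumed to be odd.
    """
    out_ch, _, k = weight.shape
    pad = k // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad)))
    frames = x.shape[1]
    out = np.empty((out_ch, frames))
    for t in range(frames):
        window = x_pad[:, t:t + k]  # (in_channels, kernel)
        out[:, t] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out


def init_postnet(n_mels=80, hidden=256, kernel=5, n_layers=5, seed=0):
    """Random parameters for a stack of 1D conv layers.

    All sizes are hypothetical defaults; the abstract only states five layers.
    """
    rng = np.random.default_rng(seed)
    channels = [n_mels] + [hidden] * (n_layers - 1) + [n_mels]
    params = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        w = rng.normal(0.0, 0.01, size=(c_out, c_in, kernel))
        b = np.zeros(c_out)
        params.append((w, b))
    return params


def postnet(mel, params):
    """Apply the conv stack; tanh on all but the last layer (an assumption)."""
    h = mel
    for i, (w, b) in enumerate(params):
        h = conv1d_same(h, w, b)
        if i < len(params) - 1:
            h = np.tanh(h)
    return h


def refine(decoder_mel, params):
    """Residual link: the post-net predicts a correction to the decoder output."""
    return decoder_mel + postnet(decoder_mel, params)


# Usage with a random stand-in for a decoder output of 80 mel bins x 120 frames.
decoder_mel = np.random.default_rng(1).normal(size=(80, 120))
refined = refine(decoder_mel, init_postnet())
print(refined.shape)  # (80, 120)
```

With the skip connection, the convolution stack only has to model the difference between the decoder output and the target mel-spectrogram, which is the usual motivation for this kind of design.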
Identifiers
pubmed: 37631819
pii: s23167283
doi: 10.3390/s23167283
pmc: PMC10457820
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: China State Shipbuilding Corporation (CSSC) Guangxi Shipbuilding and Offshore Engineering Technology Collaboration Project
ID: ZCGXJSB20226300222-06
Agency: 100 Scholar Plan of the Guangxi Zhuang Autonomous Region of China
ID: 2018