Research on Speech Synthesis Based on Mixture Alignment Mechanism.

acoustic signal processing; deep learning; mixture attention mechanism; speech synthesis

Journal

Sensors (Basel, Switzerland)
ISSN: 1424-8220
Abbreviated title: Sensors (Basel)
Country: Switzerland
NLM ID: 101204366

Publication information

Publication date:
20 Aug 2023
History:
received: 07 Jul 2023
revised: 15 Aug 2023
accepted: 17 Aug 2023
medline: 28 Aug 2023
pubmed: 26 Aug 2023
entrez: 26 Aug 2023
Status: epublish

Abstract

In recent years, deep learning-based speech synthesis has attracted considerable attention from the machine learning and speech communities. In this paper, we propose Mixture-TTS, a non-autoregressive speech synthesis model based on a mixture alignment mechanism. Mixture-TTS aims to improve the alignment between text sequences and mel-spectrograms. It uses a linguistic encoder that combines soft phoneme-level alignment with hard word-level alignment to explicitly extract word-level semantic information, and it introduces pitch and energy predictors to predict the prosodic information of the audio. In addition, Mixture-TTS introduces a post-net based on a five-layer 1D convolutional network to improve the reconstruction of the mel-spectrogram; the output of the decoder is connected to the post-net through a residual connection. The mel-spectrogram is converted into the final audio by the HiFi-GAN vocoder. We evaluate the performance of Mixture-TTS on the AISHELL3 and LJSpeech datasets. Experimental results show that Mixture-TTS achieves somewhat better alignment between text sequences and mel-spectrograms and is able to synthesize high-quality audio. Ablation studies demonstrate the effectiveness of the Mixture-TTS architecture.
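The abstract specifies the post-net only at a high level: five 1D convolution layers whose output is combined with the decoder output through a residual connection. The pure-Python sketch below illustrates that structure under assumptions that are not taken from the Mixture-TTS paper: a kernel size of 5, tanh after every layer except the last, no batch normalization, and random initialization (loosely following the Tacotron 2 post-net convention). The helper names `conv1d_same`, `init_postnet`, and `postnet_residual` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]          # [channels][frames]
Kernel = List[List[List[float]]]    # [out_ch][in_ch][kernel_size]
Layer = Tuple[Kernel, List[float]]  # (weights, bias)


def conv1d_same(x: Matrix, weight: Kernel, bias: List[float]) -> Matrix:
    """1D convolution along time with zero 'same' padding (odd kernel size assumed)."""
    frames = len(x[0])
    pad = len(weight[0][0]) // 2
    out = []
    for w_out, b in zip(weight, bias):
        row = []
        for t in range(frames):
            acc = b
            for x_c, w_c in zip(x, w_out):       # sum over input channels
                for j, w in enumerate(w_c):      # slide the kernel over time
                    src = t + j - pad
                    if 0 <= src < frames:        # zero padding at the edges
                        acc += w * x_c[src]
            row.append(acc)
        out.append(row)
    return out


def init_postnet(n_mels: int, hidden: int, kernel_size: int = 5,
                 n_layers: int = 5, seed: int = 0) -> List[Layer]:
    """Hypothetical init: n_layers conv layers mapping n_mels -> hidden -> ... -> n_mels."""
    rng = random.Random(seed)
    dims = [n_mels] + [hidden] * (n_layers - 1) + [n_mels]
    layers = []
    for c_in, c_out in zip(dims[:-1], dims[1:]):
        scale = 1.0 / math.sqrt(c_in * kernel_size)
        weight = [[[rng.uniform(-scale, scale) for _ in range(kernel_size)]
                   for _ in range(c_in)] for _ in range(c_out)]
        layers.append((weight, [0.0] * c_out))
    return layers


def postnet_residual(mel: Matrix, layers: List[Layer]) -> Matrix:
    """Return mel + PostNet(mel), applying tanh after every layer except the last."""
    h = mel
    for i, (weight, bias) in enumerate(layers):
        h = conv1d_same(h, weight, bias)
        if i < len(layers) - 1:
            h = [[math.tanh(v) for v in row] for row in h]
    # Residual connection: the post-net only predicts a correction to the decoder output.
    return [[m + r for m, r in zip(mel_row, res_row)]
            for mel_row, res_row in zip(mel, h)]


# Usage: refine a toy decoder output with 4 mel bins and 8 frames.
decoder_mel = [[0.1 * (c + t) for t in range(8)] for c in range(4)]
layers = init_postnet(n_mels=4, hidden=16)
refined = postnet_residual(decoder_mel, layers)
```

Because of the residual connection, the refined mel-spectrogram keeps the decoder's shape, and a post-net whose last layer outputs zeros leaves the decoder output unchanged. The post-net therefore only has to learn a correction rather than the full spectrogram.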

Identifiers

pubmed: 37631819
pii: s23167283
doi: 10.3390/s23167283
pmc: PMC10457820

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: China State Shipbuilding Corporation (CSSC) Guangxi Shipbuilding and Offshore Engineering Technology Collaboration Project
ID: ZCGXJSB20226300222-06
Agency: 100 Scholar Plan of the Guangxi Zhuang Autonomous Region of China
ID: 2018

Authors

Yan Deng (Y)

School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China.

Ning Wu (N)

Key Laboratory of Beibu Gulf Offshore Engineering Equipment and Technology, Beibu Gulf University, Qinzhou 535011, China.

Chengjun Qiu (C)

College of Mechanical Naval Architecture and Ocean Engineering, Beibu Gulf University, Qinzhou 535011, China.
Guangxi Key Laboratory of Ocean Engineering Equipment and Technology, Qinzhou 535011, China.

Yan Chen (Y)

School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China.

Xueshan Gao (X)

College of Mechanical Naval Architecture and Ocean Engineering, Beibu Gulf University, Qinzhou 535011, China.

Similar articles

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
1.00
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

Understanding the role of machine learning in predicting progression of osteoarthritis.

Simone Castagno, Benjamin Gompels, Estelle Strangmark et al.
1.00
Humans Disease Progression Machine Learning Osteoarthritis
Humans Artificial Intelligence Neoplasms Prognosis Image Processing, Computer-Assisted

MeSH classifications