Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator.


Journal

Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention
Abbreviated title: Med Image Comput Comput Assist Interv
Country: Germany
NLM ID: 101249582

Publication information

Publication date:
Sep 2022
History:
entrez: 23 Feb 2023
pubmed: 24 Feb 2023
medline: 24 Feb 2023
Status: ppublish

Abstract

Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and the treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities, i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform, is not straightforward. Instead, we use two-dimensional spectrograms, which contain both pitch and resonance, as an intermediate representation, and on this basis develop an end-to-end deep learning framework that translates a sequence of tagged-MRI into its corresponding audio waveform from a dataset of limited size. Our framework is based on a novel fully convolutional asymmetry translator guided by a self residual attention strategy to specifically exploit the moving muscular structures during speech. In addition, we leverage the pairwise correlation of samples with the same utterances through a latent space representation disentanglement strategy. Furthermore, we incorporate adversarial training with generative adversarial networks to improve the realism of the generated spectrograms. Our experiments, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework offers great potential to help better understand the relationship between the two modalities.
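
To illustrate the intermediate representation mentioned in the abstract, the following is a minimal sketch of turning a one-dimensional waveform into a two-dimensional magnitude spectrogram (time frames by frequency bins). It is not the authors' pipeline: the function name `magnitude_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and the naive DFT uses only the Python standard library.

```python
import cmath
import math


def magnitude_spectrogram(waveform, frame_len=256, hop=128):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    Hypothetical helper for illustration; parameters are assumptions.
    """
    # Symmetric Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precomputed complex exponentials for the DFT.
    twiddle = [cmath.exp(-2j * math.pi * k / frame_len)
               for k in range(frame_len)]
    n_frames = 1 + (len(waveform) - frame_len) // hop
    spec = []
    for i in range(n_frames):
        frame = [waveform[i * hop + n] * window[n] for n in range(frame_len)]
        spec.append([
            abs(sum(x * twiddle[(k * n) % frame_len]
                    for n, x in enumerate(frame)))
            for k in range(n_bins)
        ])
    return spec


# Assumed test input: 0.25 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2000)]
spec = magnitude_spectrogram(tone)

# Average each frequency bin over time and locate the strongest one.
mean_per_bin = [sum(frame[k] for frame in spec) / len(spec)
                for k in range(len(spec[0]))]
peak_bin = max(range(len(mean_per_bin)), key=mean_per_bin.__getitem__)
peak_hz = peak_bin * sample_rate / 256
```

In this sketch the strongest bin falls within one bin width (31.25 Hz) of the 440 Hz tone, which shows how pitch information appears along the frequency axis of the two-dimensional representation.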

Identifiers

pubmed: 36820764
doi: 10.1007/978-3-031-16446-0_36
pmc: PMC9942274
mid: NIHMS1870804

Publication types

Journal Article

Languages

eng

Pagination

376-386

Grants

Agency: NIDCD NIH HHS
ID: R01 DC014717
Country: United States
Agency: NIDCD NIH HHS
ID: R01 DC018511
Country: United States

Authors

Xiaofeng Liu (X)

Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.

Fangxu Xing (F)

Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.

Jerry L Prince (JL)

Johns Hopkins University, Baltimore, MD, USA.

Jiachen Zhuo (J)

University of Maryland, Baltimore, MD, USA.

Maureen Stone (M)

University of Maryland, Baltimore, MD, USA.

Georges El Fakhri (GE)

Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.

Jonghye Woo (J)

Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.

MeSH classifications