Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator.
Journal
Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention
Abbreviated title: Med Image Comput Comput Assist Interv
Country: Germany
NLM ID: 101249582
Publication information
Publication date:
Sep 2022
History:
entrez: 23 Feb 2023
pubmed: 24 Feb 2023
medline: 24 Feb 2023
Status:
ppublish
Abstract
Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and the treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities, i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform, is not straightforward. Instead, we use two-dimensional spectrograms, which contain both pitch and resonance information, as an intermediate representation, and develop an end-to-end deep learning framework that translates a tagged-MRI sequence into its corresponding audio waveform given a limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator guided by a self residual attention strategy to specifically exploit the moving muscular structures during speech. In addition, we leverage the pairwise correlation of samples with the same utterances through a latent space representation disentanglement strategy. Furthermore, we incorporate adversarial training with generative adversarial networks to improve the realism of the generated spectrograms. Our experimental results, obtained with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework has great potential to help better understand the relationship between the two modalities.
Identifiers
pubmed: 36820764
doi: 10.1007/978-3-031-16446-0_36
pmc: PMC9942274
mid: NIHMS1870804
Publication types
Journal Article
Languages
eng
Pagination
376-386
Grants
Agency: NIDCD NIH HHS
ID: R01 DC014717
Country: United States
Agency: NIDCD NIH HHS
ID: R01 DC018511
Country: United States