Speech synthesis from ECoG using densely connected 3D convolutional neural networks.
Journal
Journal of neural engineering
ISSN: 1741-2552
Abbreviated title: J Neural Eng
Country: England
NLM ID: 101217933
Publication information
Publication date: Jun 2019
History:
pubmed: 2019-03-05
medline: 2020-06-04
entrez: 2019-03-05
Status: ppublish
Abstract
Direct synthesis of speech from neural signals could provide a fast and natural means of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the temporal and spatial resolution needed to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the underlying dynamics are still not fully understood, and it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well suited to the small amount of data available from each participant. In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our prediction back into an audible waveform by applying a WaveNet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
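The logMel spectrogram used as the intermediate speech representation can be sketched as follows: a short-time Fourier transform, a triangular mel filterbank, and log compression. This is a minimal NumPy illustration, not the paper's exact pipeline; the parameter values (16 kHz sampling rate, 512-point FFT, 10 ms hop, 40 mel bands) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def logmel_spectrogram(wav, sr=16000, n_fft=512, hop=160, n_mels=40):
    """STFT power spectrum -> mel filterbank -> log compression."""
    # Frame the signal with a Hann window
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank between 0 Hz and Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log-compressed mel energies: shape (n_frames, n_mels)
    return np.log(power @ fbank.T + 1e-10)

# Example: one second of a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440.0 * t)
feats = logmel_spectrogram(wav, sr=sr)
print(feats.shape)
```

In the paper's setting, a densely connected CNN would predict frames of this representation from windows of ECoG activity, and a WaveNet vocoder conditioned on the predicted logMel frames would render the audible waveform.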
Identifiers
pubmed: 30831567
doi: 10.1088/1741-2552/ab0c59
pmc: PMC6822609
mid: NIHMS1029540
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination: 036019
Grants
Agency: NIDCD NIH HHS
ID: F32 DC015708
Country: United States
Agency: NINDS NIH HHS
ID: R01 NS094748
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR000150
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR001422
Country: United States
References
Lancet. 2017 May 6;389(10081):1821-1830
pubmed: 28363483
J Speech Hear Res. 1995 Oct;38(5):1001-13
pubmed: 8558870
Proc Natl Acad Sci U S A. 2017 Jun 6;114(23):E4530-E4538
pubmed: 28533406
J Physiol Paris. 2016 Nov;110(4 Pt A):392-401
pubmed: 28756027
Brain Lang. 2019 Jun;193:73-83
pubmed: 27377299
Neurotherapeutics. 2019 Jan;16(1):144-165
pubmed: 30617653
IEEE Trans Biomed Eng. 2004 Jun;51(6):1034-43
pubmed: 15188875
Front Neuroeng. 2014 May 27;7:14
pubmed: 24904404
PLoS One. 2013;8(1):e53398
pubmed: 23408924
Front Neurosci. 2015 Jun 12;9:217
pubmed: 26124702
J Neurosci. 2008 Nov 5;28(45):11526-36
pubmed: 18987189
Sci Rep. 2017 Dec 5;7(1):16947
pubmed: 29209023
Nat Rev Neurosci. 2012 Jan 05;13(2):135-45
pubmed: 22218206
Sci Rep. 2016 May 11;6:25803
pubmed: 27165452
Annu Int Conf IEEE Eng Med Biol Soc. 2015 Aug;2015:2844-7
pubmed: 26736884
J Commun Disord. 2007 Mar-Apr;40(2):116-28
pubmed: 16860820
Nature. 2006 Jul 13;442(7099):164-71
pubmed: 16838014
Cell. 2018 Jun 28;174(1):21-31.e9
pubmed: 29958109
PLoS One. 2016 Nov 22;11(11):e0166872
pubmed: 27875590
Otolaryngol Clin North Am. 2008 Aug;41(4):793-818, vii
pubmed: 18570960
Neuroimage. 2018 Oct 15;180(Pt A):253-266
pubmed: 28723578
PLoS One. 2009 Dec 09;4(12):e8218
pubmed: 20011034
J Neural Eng. 2014 Jun;11(3):035015
pubmed: 24836588
Tech Doc Rep U S Air Force Syst Command Electron Syst Div. 1963 Jun;86:1-44
pubmed: 14131127
J Acoust Soc Am. 2002 May;111(5 Pt 1):2237-41
pubmed: 12051443
Front Neurosci. 2016 Sep 27;10:429
pubmed: 27729844
Science. 2014 Feb 28;343(6174):1006-10
pubmed: 24482117
PLoS Biol. 2012 Jan;10(1):e1001251
pubmed: 22303281
Science. 2017 Aug 25;357(6353):797-801
pubmed: 28839071
Nat Methods. 2018 Oct;15(10):805-815
pubmed: 30224673
Clin Neurophysiol. 2002 Jun;113(6):767-91
pubmed: 12048038
PLoS Comput Biol. 2019 Sep 16;15(9):e1007091
pubmed: 31525179
Front Comput Neurosci. 2017 Feb 09;11:7
pubmed: 28232797
Percept Mot Skills. 2015 Jun;120(3):747-65
pubmed: 26029968
Annu Int Conf IEEE Eng Med Biol Soc. 2016 Aug;2016:1540-1543
pubmed: 28268620
Hum Brain Mapp. 2017 Nov;38(11):5391-5420
pubmed: 28782865
J Neurosci Methods. 2016 Dec 1;274:141-145
pubmed: 27746229
J Neural Eng. 2010 Oct;7(5):056007
pubmed: 20811093
Proc Natl Acad Sci U S A. 2017 May 02;114(18):4799-4804
pubmed: 28420788
Elife. 2017 Feb 21;6:
pubmed: 28220753
J Neural Eng. 2016 Oct;13(5):056004
pubmed: 27484713
J Neurosci. 2018 Nov 14;38(46):9803-9813
pubmed: 30257858
Neuroimage. 2018 Oct 15;180(Pt A):301-311
pubmed: 28993231
J Speech Hear Disord. 1990 Nov;55(4):721-8
pubmed: 2232752
J Neural Eng. 2012 Apr;9(2):026027
pubmed: 22427488
Front Hum Neurosci. 2015 Feb 24;9:97
pubmed: 25759647
J Neurophysiol. 2017 Aug 1;118(2):1329-1343
pubmed: 28615329