Classifying coherent versus nonsense speech perception from EEG using linguistic speech features.
CNN
Deep learning
EEG decoding
Linguistics
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 2024-08-14
History:
received: 2024-04-15
accepted: 2024-08-06
medline: 2024-08-15
pubmed: 2024-08-15
entrez: 2024-08-14
Status: epublish
Abstract
When a person listens to natural speech, the relation between features of the speech signal and the corresponding evoked electroencephalogram (EEG) is indicative of neural processing of the speech signal. Using linguistic representations of speech, we investigate the differences in neural processing between speech in a native language and in a foreign language that is not understood. We conducted experiments using three stimuli, a comprehensible language, an incomprehensible language, and randomly shuffled words from a comprehensible language, while recording the EEG of native Dutch-speaking participants. We modeled the neural tracking of linguistic features of the speech signals using a deep learning model in a match-mismatch task that relates EEG signals to speech, while accounting for lexical segmentation features reflecting acoustic processing. The deep learning model effectively classifies coherent versus nonsense languages. We also observed significant differences in tracking patterns between comprehensible and incomprehensible speech stimuli within the same language. This demonstrates the potential of deep learning frameworks for objectively measuring speech understanding.
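The match-mismatch task mentioned in the abstract can be sketched in a few lines: the classifier receives an EEG segment together with a time-aligned (matched) and a misaligned (mismatched) stimulus-feature segment, and must decide which of the two corresponds to the EEG. The toy sketch below is an illustrative assumption, not the paper's dilated-CNN architecture: it stands in for trained encoders with a plain cosine similarity and uses synthetic data in which the "EEG" is a noisy copy of the stimulus feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def match_mismatch_trial(eeg, matched, mismatched):
    """One match-mismatch decision: does the matched segment win?"""
    return cosine(eeg, matched) > cosine(eeg, mismatched)

# Synthetic data: the "EEG" is the stimulus feature plus noise (a toy
# assumption replacing the trained encoder's aligned representations).
n_trials, length = 200, 64
correct = 0
for _ in range(n_trials):
    stim = rng.standard_normal(length)       # matched stimulus segment
    other = rng.standard_normal(length)      # mismatched (imposter) segment
    eeg = stim + 0.8 * rng.standard_normal(length)
    correct += match_mismatch_trial(eeg, stim, other)

accuracy = correct / n_trials  # well above the 50% chance level here
```

In the paper's setting, the cosine comparison is replaced by learned CNN representations of EEG and linguistic features, but the evaluation logic, binary classification of matched versus mismatched pairs, is the same.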
Identifiers
pubmed: 39143297
doi: 10.1038/s41598-024-69568-0
pii: 10.1038/s41598-024-69568-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
18922
Grants
Agency: Fonds Wetenschappelijk Onderzoek - Vlaanderen
ID: 1S49823N
Agency: Fonds Wetenschappelijk Onderzoek - Vlaanderen
ID: 1290821N
Agency: Fonds Wetenschappelijk Onderzoek - Vlaanderen
ID: 1SA0620N
Agency: Fonds Wetenschappelijk Onderzoek - Vlaanderen
ID: 1S40122N
Agency: Fonds Wetenschappelijk Onderzoek - Vlaanderen
ID: 1S89622N
Agency: KU Leuven
ID: PDMT1/23/011
Copyright information
© 2024. The Author(s).
References
Accou, B., Monesi, M. J., Hamme, H. V. & Francart, T. Predicting speech intelligibility from EEG in a non-linear classification paradigm. J. Neural Eng. 18, 066008. https://doi.org/10.1088/1741-2552/ac33e9 (2021).
doi: 10.1088/1741-2552/ac33e9
Accou, B., Vanthornhout, J., Van hamme, H. & Francart, T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci. Rep. 13(1), 812. https://doi.org/10.1038/s41598-022-27332-2 (2023).
doi: 10.1038/s41598-022-27332-2
pubmed: 36646740
pmcid: 9842721
Anderson, S., Parbery-Clark, A., White-Schwoch, T. & Kraus, N. Auditory brainstem response to complex sounds predicts self-reported speech-in-noise performance. J. Speech Lang. Hear. Res. 56(1), 31–43. https://doi.org/10.1044/1092-4388(2012/12-0043) (2013).
doi: 10.1044/1092-4388(2012/12-0043)
pubmed: 22761320
Bollens, L., Francart, T. & Van hamme, H. Learning subject-invariant representations from speech-evoked EEG using variational autoencoders. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1256–1260 (2022). https://doi.org/10.1109/ICASSP43922.2022.9747297
Bollens, L., Accou, B., Van Hamme, H. & Francart, T. A large auditory EEG decoding dataset (2023). https://doi.org/10.48804/K3VSND
Brodbeck, C. & Simon, J. Z. Continuous speech processing. Curr. Opin. Physio. 18, 25–31. https://doi.org/10.1016/j.cophys.2020.07.014 (2020).
doi: 10.1016/j.cophys.2020.07.014
Brodbeck, C., Hong, L. E. & Simon, J. Z. Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28(24), 3976-3983.e5. https://doi.org/10.1016/j.cub.2018.10.042 (2018).
doi: 10.1016/j.cub.2018.10.042
pubmed: 30503620
pmcid: 6339854
Broderick, M. P., Anderson, A. J., Di Liberto, G. M., Crosse, M. J. & Lalor, E. C. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Curr. Biol. 28(5), 803-809.e3. https://doi.org/10.1016/j.cub.2018.01.080 (2018).
doi: 10.1016/j.cub.2018.01.080
pubmed: 29478856
Caucheteux, C. & King, J.-R. Brains and algorithms partially converge in natural language processing. Commun. Biol. 5(1), 134. https://doi.org/10.1038/s42003-022-03036-1 (2022).
doi: 10.1038/s42003-022-03036-1
pubmed: 35173264
pmcid: 8850612
De Clercq, P., Puffay, C., Kries, J., Van Hamme, H., Vandermosten, M., Francart, T. & Vanthornhout, J. Detecting post-stroke aphasia via brain responses to speech in a deep learning framework. arXiv:2401.10291 (2024).
Crosse, M. J., Di Liberto, G. M., Bednar, A. & Lalor, E. C. The multivariate temporal response function (mTRF) toolbox: A MATLAB toolbox for relating neural signals to continuous stimuli. Front. Hum. Neurosci. 10(NOV2016), 1–14. https://doi.org/10.3389/fnhum.2016.00604 (2016).
doi: 10.3389/fnhum.2016.00604
Daube, C., Ince, R. A. A. & Gross, J. Simple acoustic features can explain phoneme-based predictions of cortical responses to speech. Curr. Biol. 29(12), 1924–1937.e9. https://doi.org/10.1016/j.cub.2019.04.067 (2019).
doi: 10.1016/j.cub.2019.04.067
pubmed: 31130454
pmcid: 6584359
de Cheveigné, A., Slaney, M., Fuglsang, S. A. & Hjortkjaer, J. Auditory stimulus-response modeling with a match-mismatch task. J. Neural Eng. 18(4), 046040. https://doi.org/10.1088/1741-2552/abf771 (2021).
doi: 10.1088/1741-2552/abf771
de Taillez, T., Kollmeier, B. & Meyer, B. T. Machine learning for decoding listeners’ attention from electroencephalography evoked by continuous speech. Eur. J. Neurosci. 51(5), 1234–1241. https://doi.org/10.1111/ejn.13790 (2020).
doi: 10.1111/ejn.13790
pubmed: 29205588
Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O. & King, J. R. Decoding speech perception from non-invasive brain recordings. Nat. Mach. Intell. 5(10), 1097–1107. https://doi.org/10.1038/s42256-023-00714-5 (2023).
doi: 10.1038/s42256-023-00714-5
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186, Minneapolis, Minnesota (2019). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Di Liberto, G. M., O’Sullivan, J. A. & Lalor, E. C. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr. Biol. 25(19), 2457–2465. https://doi.org/10.1016/J.CUB.2015.08.030 (2015).
doi: 10.1016/J.CUB.2015.08.030
pubmed: 26412129
Ding, N. & Simon, J. Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. 109(29), 11854–11859. https://doi.org/10.1073/PNAS.1205381109 (2012).
Duchateau, J. et al. Developing a reading tutor: Design and evaluation of dedicated speech recognition and synthesis modules. Speech Commun. (2009). ISSN 1872-7182.
Gillis, M., Van Canneyt, J., Francart, T. & Vanthornhout, J. Neural tracking as a diagnostic tool to assess the auditory pathway. bioRxiv https://doi.org/10.1101/2021.11.26.470129 (2022).
Gillis, M., Vanthornhout, J. & Francart, T. Heard or understood? Neural tracking of language features in a comprehensible story, an incomprehensible story and a word list. eNeuro https://doi.org/10.1523/ENEURO.0075-23.2023 (2023).
doi: 10.1523/ENEURO.0075-23.2023
pubmed: 37643864
pmcid: 10488220
Goldstein, A. et al. Shared computational principles for language processing in humans and deep language models. Nat. Neurosci. 25(3), 369–380. https://doi.org/10.1038/s41593-022-01026-4 (2022).
doi: 10.1038/s41593-022-01026-4
pubmed: 35260860
pmcid: 8904253
Gwilliams, L. & Davis, M. H. Extracting language content from speech sounds: The information theoretic approach 113–139 (Springer, Cham, 2022).
Hullett, P. W., Hamilton, L. S., Mesgarani, N., Schreiner, C. E. & Chang, E. F. Human superior temporal gyrus organization of spectrotemporal modulation tuning derived from speech stimuli. J. Neurosci. 36(6), 2014–2026 (2016).
doi: 10.1523/JNEUROSCI.1779-15.2016
pubmed: 26865624
pmcid: 4748082
Jawahar, G., Sagot, B. & Seddah, D. What does BERT learn about the structure of language? In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy (2019). https://inria.hal.science/hal-02131630
Keshishian, M. et al. Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nat. Hum. Behav. 7(5), 740–753. https://doi.org/10.1038/s41562-023-01520-0 (2023).
doi: 10.1038/s41562-023-01520-0
pubmed: 36864134
pmcid: 10417567
Keuleers, E., Brysbaert, M. & New, B. SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behav. Res. Methods 42(3), 643–650 (2010).
doi: 10.3758/BRM.42.3.643
pubmed: 20805586
Koskinen, M., Kurimo, M., Gross, J., Hyvärinen, A. & Hari, R. Brain activity reflects the predictability of word sequences in listened continuous speech. Neuroimage 219, 116936. https://doi.org/10.1016/j.neuroimage.2020.116936 (2020).
doi: 10.1016/j.neuroimage.2020.116936
pubmed: 32474080
Gwilliams, L., Poeppel, D., Marantz, A. & King, J.-R. Top-down information shapes lexical processing when listening to continuous speech. Lang. Cognit. Neurosci. https://doi.org/10.1080/23273798.2023.2171072 (2023).
doi: 10.1080/23273798.2023.2171072
Lesenfants, D., Vanthornhout, J., Verschueren, E. & Francart, T. Data-driven spatial filtering for improved measurement of cortical tracking of multiple representations of speech. bioRxiv https://doi.org/10.1101/551218 (2019).
McGee, T. J. & Clemis, J. D. The approximation of audiometric thresholds by auditory brain stem responses. Otolaryngol. Head Neck Surg. 88(3), 295–303. https://doi.org/10.1177/019459988008800319 (1980).
doi: 10.1177/019459988008800319
pubmed: 7402671
Monesi, M. J., Accou, B., Montoya-Martinez, J., Francart, T. & Van Hamme, H. An LSTM based architecture to relate speech stimulus to EEG. In ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 941–945 (2020). https://doi.org/10.1109/ICASSP40776.2020.9054000
Picton, T. W., Dimitrijevic, A., Perez-Abalo, M.-C. & Van Roon, P. Estimating audiometric thresholds using auditory steady-state responses. J. Am. Acad. Audiol. 16(03), 140–156. https://doi.org/10.3766/jaaa.16.3.3 (2005).
doi: 10.3766/jaaa.16.3.3
pubmed: 15844740
Puffay, C., Van Canneyt, J., Vanthornhout, J., Van hamme, H. & Francart, T. Relating the fundamental frequency of speech with EEG using a dilated convolutional network. In 23rd Annual Conf. of the Int. Speech Communication Association (ISCA)—Interspeech, 4038–4042 (2022).
Puffay, C. et al. Relating EEG to continuous speech using deep neural networks: A review. J. Neural Eng. 20(4), 041003. https://doi.org/10.1088/1741-2552/ace73f (2023).
doi: 10.1088/1741-2552/ace73f
Puffay, C. et al. Robust neural tracking of linguistic speech representations using a convolutional neural network. J. Neural Eng. 20(4), 046040. https://doi.org/10.1088/1741-2552/acf1ce (2023).
doi: 10.1088/1741-2552/acf1ce
Somers, B., Francart, T. & Bertrand, A. A generic EEG artifact removal algorithm based on the multi-channel Wiener filter. J. Neural Eng. 15(3), 036007. https://doi.org/10.1088/1741-2552/aaac92 (2018).
doi: 10.1088/1741-2552/aaac92
pubmed: 29393057
Thornton, M., Mandic, D. & Reichenbach, T. Relating EEG recordings to speech using envelope tracking and the speech-FFR. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–2 (2023). https://doi.org/10.1109/ICASSP49357.2023.10096082
Van Canneyt, J., Wouters, J. & Francart, T. Neural tracking of the fundamental frequency of the voice: The effect of voice characteristics. Eur. J. Neurosci. 53(11), 3640–3653. https://doi.org/10.1111/ejn.15229 (2021).
doi: 10.1111/ejn.15229
pubmed: 33861480
Vanthornhout, J., Decruy, L., Wouters, J., Simon, J. Z. & Francart, T. Speech intelligibility predicted from neural entrainment of the speech envelope. JARO - J. Assoc. Res. Otolaryngol. 19(2), 181–191. https://doi.org/10.1007/s10162-018-0654-z (2018).
doi: 10.1007/s10162-018-0654-z
pubmed: 29464412
Verschueren, E., Gillis, M., Decruy, L., Vanthornhout, J. & Francart, T. Speech understanding oppositely affects acoustic and linguistic neural tracking in a speech rate manipulation paradigm. J. Neurosci. 42(39), 7442–7453. https://doi.org/10.1523/JNEUROSCI.0259-22.2022 (2022).
doi: 10.1523/JNEUROSCI.0259-22.2022
pubmed: 36041851
pmcid: 9525161
Weissbart, H., Kandylaki, K. & Reichenbach, T. Cortical tracking of surprisal during continuous speech comprehension. J. Cognit. Neurosci. 32, 1–12 (2019).
Yılmaz, E. et al. Open source speech and language resources for Frisian. In Proc. Interspeech 2016, 1536–1540 (2016). https://doi.org/10.21437/Interspeech.2016-48