Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility.

Keywords: automatic speech processing; head and neck cancer; pathological speech; speaker embeddings; speech intelligibility

Journal

International journal of language & communication disorders
ISSN: 1460-6984
Abbreviated title: Int J Lang Commun Disord
Country: United States
NLM ID: 9803709

Publication information

Publication date:
18 Jan 2024
History:
received: 31 Jul 2023
accepted: 21 Dec 2023
medline: 19 Jan 2024
pubmed: 19 Jan 2024
entrez: 18 Jan 2024
Status: ahead of print

Abstract sections

BACKGROUND
Perceptual measures such as speech intelligibility are known to be biased, variable and subjective, and automatic approaches have been seen as a more reliable alternative. However, automatic approaches tend to lack explainability, which can prevent their widespread clinical use.
AIMS
In the present work, we study the relationship between four perceptual parameters and speech intelligibility by automatically modelling the behaviour of six perceptual judges, in the context of head and neck cancer. From this evaluation, we assess the relevance of each parameter as well as the different judge profiles that arise, both perceptually and automatically.
METHODS AND PROCEDURES
Based on a passage reading task from the Carcinologic Speech Severity Index (C2SI) corpus, six expert listeners assessed the voice quality, resonance, prosody and phonemic distortions, as well as the speech intelligibility, of patients treated for oral or oropharyngeal cancer. A statistical analysis and an ensemble of automatic systems, one per judge, were devised, in which speech intelligibility is predicted as a function of these four perceptual parameters.
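The idea of predicting intelligibility as a function of the four perceptual parameters, with one model per judge, can be illustrated with a minimal sketch. This is not the authors' system: it swaps in an ordinary least-squares fit for whatever model the paper uses, and the data, the 0 to 10 rating scale, and the helper names `fit_judge_profile` and `relative_relevance` are all hypothetical.

```python
import numpy as np

PARAMETERS = ["voice quality", "resonance", "prosody", "phonemic distortions"]


def fit_judge_profile(ratings, intelligibility):
    """Least-squares fit of intelligibility on the four parameter ratings.

    ratings: array of shape (n_patients, 4), one column per parameter.
    intelligibility: array of shape (n_patients,), scored by the same judge.
    Returns (weights, intercept).
    """
    X = np.asarray(ratings, dtype=float)
    y = np.asarray(intelligibility, dtype=float)
    # Append a column of ones so the fit includes an intercept term.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def relative_relevance(weights):
    """Normalise absolute weights so they sum to 1 (a simple 'judge profile')."""
    magnitude = np.abs(weights)
    return magnitude / magnitude.sum()


# Synthetic ratings for one hypothetical judge (assumed 0-10 scale, no noise).
rng = np.random.default_rng(0)
ratings = rng.uniform(0, 10, size=(50, 4))
true_weights = np.array([0.1, 0.2, 0.1, 0.6])
intelligibility = ratings @ true_weights + 0.5

weights, intercept = fit_judge_profile(ratings, intelligibility)
profile = dict(zip(PARAMETERS, relative_relevance(weights)))
```

Fitting one such model per judge and comparing the resulting profiles is one simple way to see which parameters each judge weights most heavily; in this synthetic case the profile is dominated by phonemic distortions by construction.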
OUTCOMES AND RESULTS
The results suggest that speech intelligibility can be predicted automatically as a function of the four perceptual parameters, achieving a high correlation of 0.775 (Spearman's ρ). Furthermore, different judge profiles were found perceptually and were successfully modelled automatically.
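For readers unfamiliar with the reported metric, Spearman's ρ is the Pearson correlation computed on ranks rather than raw scores. The sketch below uses only the Python standard library; the function names and the five example scores are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical reference scores and automatic predictions for five speakers.
reference = [1, 2, 3, 4, 5]
predicted = [1.2, 1.9, 3.5, 3.1, 4.8]
rho = spearman_rho(reference, predicted)
```

Because only the ordering matters, swapping the ranks of two neighbouring speakers (as in the third and fourth predictions here) lowers ρ slightly, while any monotonic rescaling of the predictions leaves it unchanged.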
CONCLUSIONS AND IMPLICATIONS
The four investigated perceptual parameters influence the global rating of speech intelligibility, and different judge profiles emerge. The proposed automatic approach displayed a more uniform profile across all judges, suggesting a more reliable, unbiased and objective prediction. The system also adds a layer of interpretability, since speech intelligibility is regressed as a direct function of the individual predictions of the four perceptual parameters, an improvement over more black-box approaches.
WHAT THIS PAPER ADDS
What is already known on this subject
Speech intelligibility is a clinical measure typically used in the post-treatment assessment of speech-affecting disorders, such as head and neck cancer. Its perceptual assessment is currently the main method of evaluation; however, it is known to be quite subjective, since intelligibility can be seen as a combination of other perceptual parameters (voice quality, resonance, etc.). Given this, automatic approaches have been seen as a more viable alternative to the traditionally used perceptual assessments.
What this study adds to existing knowledge
The present work studies the relationship between four perceptual parameters (voice quality, resonance, prosody and phonemic distortions) and speech intelligibility, by automatically modelling the behaviour of six perceptual judges. The results suggest that different judge profiles arise, both in the perceptual case and in the automatic models. These profiles reflect the different schools of thought among perceptual judges, whereas the automatic judges display more uniform levels of relevance across the four perceptual parameters. This suggests that an automatic approach promotes unbiased, reliable and more objective predictions.
What are the clinical implications of this work?
The automatic prediction of speech intelligibility, using a combination of four perceptual parameters, shows that these approaches can achieve high correlations with the reference scores while maintaining a certain degree of explainability. The more uniform judge profiles found in the automatic case also show less bias towards any of the four perceptual parameters. This facilitates the clinical implementation of this class of systems, as opposed to the more subjective and harder-to-reproduce perceptual assessments.

Identifiers

pubmed: 38237606
doi: 10.1111/1460-6984.13004

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Hospitals of Toulouse and the French National Research Agency
ID: ANR-18-CE45-0008
Agency: European Union's Horizon 2020 research and innovation programme, Marie Skłodowska-Curie grant agreement No 766287

Copyright information

© 2024 The Authors. International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.

Authors

Sebastião Quintas (S)

IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France.

Mathieu Balaguer (M)

IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France.
Laboratoire de NeuroPsychoLinguistique, UR 4156, Université de Toulouse, Toulouse, France.

Julie Mauclair (J)

IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France.

Virginie Woisard (V)

Laboratoire de NeuroPsychoLinguistique, UR 4156, Université de Toulouse, Toulouse, France.
IUC Toulouse, CHU Toulouse, Service ORL de l'Hôpital Larrey, Toulouse, France.

Julien Pinquier (J)

IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France.

MeSH classifications