Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech.


Journal

Journal of speech, language, and hearing research : JSLHR
ISSN: 1558-9102
Abbreviated title: J Speech Lang Hear Res
Country: United States
NLM ID: 9705610

Publication information

Publication date:
04 Jul 2024
History:
medline: 04 Jul 2024
pubmed: 04 Jul 2024
entrez: 04 Jul 2024
Status: aheadofprint

Abstract

This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy.

Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (a) one speaker-independent ASR system trained and optimized for typical speech and (b) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word error rates were calculated for each speech model, read versus conversational, and subject. Linear mixed-effects models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap.

We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy.

We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
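
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' scoring pipeline. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are assumptions made for illustration, since the abstract does not describe how transcripts were normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion of a reference word
                curr[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance: one deleted word ("on") and one substitution
# ("light" -> "lights") against a six-word reference.
example_wer = word_error_rate(
    "please turn on the kitchen light",
    "please turn the kitchen lights",
)
```

Here `example_wer` is 2/6, that is, two word errors over six reference words. Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.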

Identifiers

pubmed: 38963790
doi: 10.1044/2024_JSLHR-24-00045

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1-10

Authors

Jimmy Tobin (J)

Google LLC, Mountain View, CA.

Phillip Nelson (P)

Google LLC, Mountain View, CA.

Bob MacDonald (B)

Google LLC, Mountain View, CA.

Rus Heywood (R)

Google LLC, Mountain View, CA.

Richard Cave (R)

MND Association, Northampton, United Kingdom.

Katie Seaver (K)

MGH Institute of Health Professions, Boston, MA.

Antoine Desjardins (A)

MGH Institute of Health Professions, Boston, MA.

Pan-Pan Jiang (PP)

Google LLC, Mountain View, CA.

Jordan R Green (JR)

MGH Institute of Health Professions, Boston, MA.
Harvard University, Cambridge, MA.

MeSH classifications