Performance of Forced-Alignment Algorithms on Children's Speech.
Journal
Journal of Speech, Language, and Hearing Research (JSLHR)
ISSN: 1558-9102
Abbreviated title: J Speech Lang Hear Res
Country: United States
NLM ID: 9705610
Publication information
Publication date: 18 Jun 2021
History:
pubmed: 12 Mar 2021
medline: 6 Jul 2021
entrez: 11 Mar 2021
Status: ppublish
Abstract
Purpose: Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time-consuming. Forced-alignment algorithms automate this process by aligning a transcript with a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers.
Method: The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker-adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated the aligners on accuracy (whether the automatic interval covers the midpoint of the manual interval) and on the difference in phone-onset times between the automatic and manual intervals.
Results: The Montreal Forced Aligner with speaker-adaptive training showed the highest accuracy and the smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all aligners, and alignment accuracy for fricatives increased with age across aligners.
Conclusion: The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, and fricatives for older children), especially as part of a semi-automated workflow in which alignments are later inspected for gross errors.
Supplemental Material: https://doi.org/10.23641/asha.14167058
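The two evaluation metrics described in the Method are simple to compute once phone intervals have been extracted from the automatic and manual alignments. The following is a minimal Python sketch, not the authors' code: the Interval class and the one-to-one correspondence between automatic and manual phones are assumptions made for illustration.

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Interval:
        """A labeled phone interval (hypothetical type; e.g., read from a TextGrid tier)."""
        label: str
        start: float  # onset, in seconds
        end: float    # offset, in seconds

    def midpoint_accuracy(auto: list[Interval], manual: list[Interval]) -> float:
        """Fraction of automatic intervals covering the midpoint of the
        corresponding manual interval (the paper's accuracy criterion).
        Assumes auto and manual are matched phone-for-phone."""
        hits = sum(
            1 for a, m in zip(auto, manual)
            if a.start <= (m.start + m.end) / 2 <= a.end
        )
        return hits / len(manual)

    def mean_onset_difference(auto: list[Interval], manual: list[Interval]) -> float:
        """Mean absolute difference (seconds) between automatic and manual
        phone-onset times."""
        return mean(abs(a.start - m.start) for a, m in zip(auto, manual))

    # Toy example: two phones, with the second onset off by 30 ms.
    manual = [Interval("AH", 0.10, 0.25), Interval("S", 0.25, 0.40)]
    auto   = [Interval("AH", 0.11, 0.24), Interval("S", 0.28, 0.41)]
    print(midpoint_accuracy(auto, manual))      # 1.0 (both midpoints covered)
    print(mean_onset_difference(auto, manual))  # 0.02

In practice, the interval lists would come from parsing the aligner's output and the hand-segmented gold standard for the same utterance; phones inserted or deleted by an aligner would need to be reconciled before this pairwise comparison.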
Identifiers
pubmed: 33705675
doi: 10.1044/2020_JSLHR-20-00268
pmc: PMC8740721
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
2213-2222
Grants
Agency: NIDCD NIH HHS
ID: R01 DC006859
Country: United States
Agency: NIDCD NIH HHS
ID: R01 DC015653
Country: United States
Agency: NICHD NIH HHS
ID: U54 HD090256
Country: United States
References
Comput Speech Lang. 2017 Sep;45:278-299
pubmed: 28943715
Am J Speech Lang Pathol. 2018 Nov 21;27(4):1546-1571
pubmed: 30177993
J Speech Lang Hear Res. 2018 Oct 26;61(10):2487-2501
pubmed: 30458531
Int J Speech Lang Pathol. 2018 Nov;20(6):599-609
pubmed: 31274357