Performance of Forced-Alignment Algorithms on Children's Speech.
Journal
Journal of Speech, Language, and Hearing Research (JSLHR)
ISSN: 1558-9102
Abbreviated title: J Speech Lang Hear Res
Country: United States
NLM ID: 9705610
Publication information
Publication date: 18 Jun 2021
History:
pubmed: 12 Mar 2021
medline: 6 Jul 2021
entrez: 11 Mar 2021
Status: ppublish
Abstract
Purpose: Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time-consuming. Forced-alignment algorithms automate this process by aligning a transcript with a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers.
Method: The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker-adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated the aligners on accuracy (whether the automatic interval covers the midpoint of the manual interval) and on the difference in phone-onset times between the automatic and manual intervals.
Results: The Montreal Forced Aligner with speaker-adaptive training showed the highest accuracy and the smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all aligners, and alignment accuracy for fricatives increased with age across aligners.
Conclusion: The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, and fricatives for older children), especially as part of a semi-automated workflow in which alignments are later inspected for gross errors.
Supplemental Material: https://doi.org/10.23641/asha.14167058
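The two evaluation metrics described in the Method are simple to compute once phone intervals have been extracted from the automatic and manual alignments. The following is a minimal Python sketch, not the authors' code: the Interval class and the one-to-one correspondence between automatic and manual phones are assumptions made for illustration.

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Interval:
        """A labeled phone interval (hypothetical type; e.g., read from a TextGrid tier)."""
        label: str
        start: float  # onset, in seconds
        end: float    # offset, in seconds

    def midpoint_accuracy(auto: list[Interval], manual: list[Interval]) -> float:
        """Fraction of automatic intervals covering the midpoint of the
        corresponding manual interval (the paper's accuracy criterion).
        Assumes auto and manual are matched phone-for-phone."""
        hits = sum(
            1 for a, m in zip(auto, manual)
            if a.start <= (m.start + m.end) / 2 <= a.end
        )
        return hits / len(manual)

    def mean_onset_difference(auto: list[Interval], manual: list[Interval]) -> float:
        """Mean absolute difference (seconds) between automatic and manual
        phone-onset times."""
        return mean(abs(a.start - m.start) for a, m in zip(auto, manual))

    # Toy example: two phones, with the second onset off by 30 ms.
    manual = [Interval("AH", 0.10, 0.25), Interval("S", 0.25, 0.40)]
    auto   = [Interval("AH", 0.11, 0.24), Interval("S", 0.28, 0.41)]
    print(midpoint_accuracy(auto, manual))      # 1.0 (both midpoints covered)
    print(mean_onset_difference(auto, manual))  # 0.02

In practice, the interval lists would come from parsing the aligner's output and the hand-segmented gold standard for the same utterance; phones inserted or deleted by an aligner would need to be reconciled before this pairwise comparison.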
Identifiers
pubmed: 33705675
doi: 10.1044/2020_JSLHR-20-00268
pmc: PMC8740721
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
2213-2222
Grants
Agency: NIDCD NIH HHS
ID: R01 DC006859
Country: United States
Agency: NIDCD NIH HHS
ID: R01 DC015653
Country: United States
Agency: NICHD NIH HHS
ID: U54 HD090256
Country: United States
References
Comput Speech Lang. 2017 Sep;45:278-299
pubmed: 28943715
Am J Speech Lang Pathol. 2018 Nov 21;27(4):1546-1571
pubmed: 30177993
J Speech Lang Hear Res. 2018 Oct 26;61(10):2487-2501
pubmed: 30458531
Int J Speech Lang Pathol. 2018 Nov;20(6):599-609
pubmed: 31274357