A practical guide to calculating vocal tract length and scale-invariant formant patterns.

Keywords: Body size; Formants; Speaker normalization; Vocal tract length normalization; Vowel

Journal

Behavior Research Methods

ISSN: 1554-3528

Abbreviated title: Behav Res Methods

Country: United States

NLM ID: 101244316

Publication information

Publication date:
29 Dec 2023

History:

accepted: 2 Nov 2023

medline: 2 Jan 2024

pubmed: 2 Jan 2024

entrez: 29 Dec 2023

Status: ahead of print

Abstract

Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians studying speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and the articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software in the R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
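The intrinsic measures named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the soundgen or phonTools implementation. It assumes a uniform tube closed at one end, in which the n-th resonance is F_n = (2n - 1)/2 × ΔF and the apparent length is c / (2ΔF). The function names, the speed of sound (35400 cm/s) and the use of relative rather than absolute deviations from the schwa positions are all assumptions made for illustration.

```python
import math

# Assumed speed of sound in warm, humid air, in cm/s (illustrative value).
SPEED_OF_SOUND_CM_S = 35400.0


def geometric_mean(formants_hz):
    """Geometric mean of the measured formants, a simple measure of overall scale.

    Formants marked as uncertain (None) are treated as missing and skipped.
    """
    known = [f for f in formants_hz if f is not None]
    return math.exp(sum(math.log(f) for f in known) / len(known))


def formant_spacing(formants_hz):
    """Estimate the formant spacing dF (Hz) by regression through the origin.

    Regresses F_n on (2n - 1) / 2, the predicted positions of the resonances
    of a uniform tube closed at one end, and returns the least-squares slope.
    """
    pairs = [((2 * n - 1) / 2, f)
             for n, f in enumerate(formants_hz, start=1)
             if f is not None]
    numerator = sum(x * f for x, f in pairs)
    denominator = sum(x * x for x, _ in pairs)
    return numerator / denominator


def apparent_vtl_cm(formants_hz, speed_of_sound=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length (cm) under the single-tube model."""
    return speed_of_sound / (2 * formant_spacing(formants_hz))


def schwa_deviations(formants_hz):
    """Relative deviation of each formant from its equally spaced position.

    A value of 0.1 means the formant lies 10% above the frequency predicted
    for a neutral (schwa-like) tube with the same estimated spacing.
    """
    spacing = formant_spacing(formants_hz)
    return [None if f is None else f / ((2 * n - 1) / 2 * spacing) - 1
            for n, f in enumerate(formants_hz, start=1)]
```

For evenly spaced formants of 500, 1500, 2500 and 3500 Hz, the estimated spacing is 1000 Hz, which gives an apparent length of 17.7 cm under the assumed speed of sound, with all deviations equal to zero. Multiplying every formant by the same factor changes the estimated length but leaves the deviations unchanged, which is the sense in which they are scale-invariant.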

Identifiers

DOI: 10.3758/s13428-023-02288-x

PMID: 38158551

PII: 10.3758/s13428-023-02288-x

Publication types

Journal Article

Languages

English



Authors

Andrey Anikin

Santiago Barreda

David Reby
