A practical guide to calculating vocal tract length and scale-invariant formant patterns.
Body size
Formants
Speaker normalization
Vocal tract length normalization
Vowel
Journal
Behavior research methods
ISSN: 1554-3528
Titre abrégé: Behav Res Methods
Pays: United States
ID NLM: 101244316
Informations de publication
Date de publication:
29 Dec 2023
29 Dec 2023
Historique:
accepted:
02
11
2023
medline:
2
1
2024
pubmed:
2
1
2024
entrez:
29
12
2023
Statut:
aheadofprint
Résumé
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
Identifiants
pubmed: 38158551
doi: 10.3758/s13428-023-02288-x
pii: 10.3758/s13428-023-02288-x
doi:
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM
Informations de copyright
© 2023. The Author(s).
Références
Anikin, A. (2019). Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behavior Research Methods, 51(2), 778–792.
pubmed: 30054898
doi: 10.3758/s13428-018-1095-7
Anikin, A., Barreda, S., & Reby, D. (2023). A practical guide to estimating vocal tract length and vowel quality from formants: Supplementary materials. https://doi.org/10.17605/OSF.IO/4C2R9
Anikin, A., Valente, D., Pisanski, K., Cornec, C., Bryant, G., & Reby, D. (2023). The role of loudness in vocal intimidation. Journal of Experimental Psychology: General. https://osf.io/preprints/psyarxiv/qgyev . Accessed 15 Nov 2023.
Atal, B. S., Chang, J. J., Mathews, M. V., & Tukey, J. W. (1978). Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique. The Journal of the Acoustical Society of America, 63(5), 1535–1555.
pubmed: 690333
doi: 10.1121/1.381848
Barreda, S. (2015). phonTools: Functions for phonetics in R. https://cran.r-project.org/package=phonTools
Barreda, S. (2016). Investigating the use of formant frequencies in listener judgments of speaker size. Journal of Phonetics, 55, 1–18.
doi: 10.1016/j.wocn.2015.11.004
Barreda, S. (2017a). An investigation of the systematic use of spectral information in the determination of apparent-talker height. The Journal of the Acoustical Society of America, 141(6), 4781–4792.
pubmed: 28679275
doi: 10.1121/1.4985192
Barreda, S. (2017b). Listeners respond to phoneme-specific spectral information when assessing speaker size from speech. Journal of Phonetics, 63, 1–18.
doi: 10.1016/j.wocn.2017.03.002
Barreda, S. (2020). Vowel normalization as perceptual constancy. Language, 96(2), 224–254.
doi: 10.1353/lan.2020.0018
Barreda, S. (2021a). Fast Track: Fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard, 7(1), 20200051.
doi: 10.1515/lingvan-2020-0051
Barreda, S. (2021b). Perceptual validation of vowel normalization methods for variationist research. Language Variation and Change, 33(1), 27–53.
doi: 10.1017/S0954394521000016
Barreda, S., & Nearey, T. M. (2018). A regression approach to vowel normalization for missing and unbalanced data. The Journal of the Acoustical Society of America, 144(1), 500–520.
pubmed: 30075677
doi: 10.1121/1.5047742
Beeck, V. C., Heilmann, G., Kerscher, M., & Stoeger, A. S. (2022). Sound visualization demonstrates velopharyngeal coupling and complex spectral variability in Asian elephants. Animals, 12(16), 2119.
pubmed: 36009709
pmcid: 9404934
doi: 10.3390/ani12162119
Behrman, A. (2021). Speech and voice science (Fourth ed.). San Diego.
Belyk, M., Waters, S., Kanber, E., Miquel, M. E., & McGettigan, C. (2022). Individual differences in vocal size exaggeration. Scientific Reports, 12(1), 1–12.
doi: 10.1038/s41598-022-05170-6
Boë, L.-J., Berthommier, F., Legou, T., Captier, G., Kemp, C., Sawallis, T. R., Becker, Y., Rey, A., & Fagot, J. (2017). Evidence of a vocalic proto-system in the baboon (Papio papio) suggests pre-hominin speech precursors. PloS One, 12(1), e0169321.
pubmed: 28076426
pmcid: 5226677
doi: 10.1371/journal.pone.0169321
Boersma, P. (2006). Praat: Doing phonetics by computer. http://www.praat.org/ . Accessed 15 Nov 2023.
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28.
doi: 10.18637/jss.v080.i01
Cartei, V., Garnham, A., Oakhill, J., Banerjee, R., Roberts, L., & Reby, D. (2019). Children can control the expression of masculinity and femininity through the voice. Royal Society Open Science, 6(7), 190656.
pubmed: 31417760
pmcid: 6689575
doi: 10.1098/rsos.190656
Charlton, B. D., & Reby, D. (2016). The evolution of acoustic size exaggeration in terrestrial mammals. Nature Communications, 7, 12739.
pubmed: 27598835
pmcid: 5025854
doi: 10.1038/ncomms12739
Fant, G. (1975). Non-uniform vowel normalization. STL-QPSR, 16(2–3), 1–19.
Fastl, H., & Zwicker, E. (2006). Psychoacoustics: Facts and models. Third edition. Springer: Berlin.
Fitch, W. T. (1997). Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. The Journal of the Acoustical Society of America, 102(2), 1213–1222.
pubmed: 9265764
doi: 10.1121/1.421048
Fitch, W. T., de Boer, B., Mathur, N., & Ghazanfar, A. A. (2016). Monkey vocal tracts are speech-ready. Science Advances, 2(12), e1600723.
pubmed: 27957536
pmcid: 5148209
doi: 10.1126/sciadv.1600723
Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. The Journal of the Acoustical Society of America, 106(3), 1511–1522.
pubmed: 10489707
doi: 10.1121/1.427148
Fulop, S. (2011). Speech spectrum analysis. Springer.
doi: 10.1007/978-3-642-17478-0
Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. The Journal of the Acoustical Society of America, 97(5), 3099–3111.
pubmed: 7759650
doi: 10.1121/1.411872
Johnson, K. (2011). Acoustic and auditory phonetics. Wiley-Blackwell.
Johnson, K. (2020). The ΔF method of vocal tract length normalization for vowels. Laboratory Phonology, 11(1).
Johnson, K., & Sjerps, M. J. (2021). Speaker normalization in speech perception. The Handbook of Speech Perception, 145–176.
Kendall, T., & Thomas, E. R. (2018). Vowels: Vowel Manipulation, Normalization, and Plotting in R. https://cran.r-project.org/package=vowels . Accessed 15 Nov 2023.
Kim, J., Toutios, A., Lee, S., & Narayanan, S. S. (2020). Vocal tract shaping of emotional speech. Computer Speech & Language, 101100.
Lammert, A. C., & Narayanan, S. S. (2015). On short-time estimation of vocal tract length from formant frequencies. PloS One, 10(7), e0132193.
pubmed: 26177102
pmcid: 4503663
doi: 10.1371/journal.pone.0132193
Maeda, S., & Laprie, Y. (2013). Vowel and prosodic factor dependent variations of vocal-tract length. In InterSpeech-14th Annual Conference of the International Speech Communication Association-2013. Aug 2013.
Miller, J. D. (1989). Auditory-perceptual interpretation of the vowel. The Journal of the Acoustical Society of America, 85(5), 2114–2134.
pubmed: 2659639
doi: 10.1121/1.397862
Nearey, T. M. (1978). Phonetic feature systems for vowels. Indiana University Linguistics Club.
Nearey, T. M., & Assmann, P. F. (2007). Probabilistic ‘sliding-template’ models for indirect vowel normalization. In M.-J. Solé, P. S. Beddor, & M. Ohala (Eds.), Experimental approaches to phonology (pp. 246–269). Oxford University Press.
doi: 10.1093/oso/9780199296675.003.0016
Pfefferle, D., & Fischer, J. (2006). Sounds and size: Identification of acoustic variables that reflect body size in hamadryas baboons. Papio hamadryas. Animal Behaviour, 72(1), 43–51.
doi: 10.1016/j.anbehav.2005.08.021
Pisanski, K., Anikin, A., & Reby, D. (2022). Vocal size exaggeration may have contributed to the origins of vocalic complexity. Philosophical Transactions of the Royal Society B, 377(1841), 20200401.
doi: 10.1098/rstb.2020.0401
Pisanski, K., & Bryant, G. A. (2019). The evolution of voice perception. Oxford Handbook of Voice Studies, 269–300.
Pisanski, K., Fraccaro, P. J., Tigue, C. C., O’Connor, J. J., Röder, S., Andrews, P. W., Fink, B., DeBruine, L. M., Jones, B. C., & Feinberg, D. R. (2014). Vocal indicators of body size in men and women: A meta-analysis. Animal Behaviour, 95, 89–99.
doi: 10.1016/j.anbehav.2014.06.011
Pisanski, K., Jones, B. C., Fink, B., O’Connor, J. J., DeBruine, L. M., Röder, S., & Feinberg, D. R. (2016a). Voice parameters predict sex-specific body morphology in men and women. Animal Behaviour, 112, 13–22.
doi: 10.1016/j.anbehav.2015.11.008
Pisanski, K., Mora, E. C., Pisanski, A., Reby, D., Sorokowski, P., Frackowiak, T., & Feinberg, D. R. (2016b). Volitional exaggeration of body size through fundamental and formant frequency modulation in humans. Scientific Reports, 6, 34389.
pubmed: 27687571
pmcid: 5043380
doi: 10.1038/srep34389
Reby, D., & McComb, K. (2003). Anatomical constraints generate honesty: Acoustic cues to age and weight in the roars of red deer stags. Animal Behaviour, 65(3), 519–530.
doi: 10.1006/anbe.2003.2078
Reby, D., McComb, K., Cargnelutti, B., Darwin, C., Fitch, W. T., & Clutton-Brock, T. (2005). Red deer stags use formants as assessment cues during intrasexual agonistic interactions. Proceedings of the Royal Society of London B: Biological Sciences, 272(1566), 941–947.
Reby, D., Wyman, M., Frey, R., Passilongo, D., Gilbert, J., Locatelli, Y., & Charlton, B. (2016). Evidence of biphonation and source–filter interactions in the bugles of male North American wapiti (Cervus canadensis). Journal of Experimental Biology, 219(8), 1224–1236.
pubmed: 27103677
doi: 10.1242/jeb.131219
RStudio Team. (2022). RStudio: Integrated Development Environment for R. RStudio, PBC. http://www.rstudio.com/ . Accessed 15 Nov 2023.
Syrdal, A. K., & Gopal, H. S. (1986). A perceptual model of vowel recognition based on the auditory representation of American English vowels. The Journal of the Acoustical Society of America, 79(4), 1086–1100.
pubmed: 3700864
doi: 10.1121/1.393381
Titze, I. R. (2000). Principles of voice production. Second printing. Iowa City.
Turner, R. E., Walters, T. C., Monaghan, J. J., & Patterson, R. D. (2009). A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data. The Journal of the Acoustical Society of America, 125(4), 2374–2386.
pubmed: 19354411
doi: 10.1121/1.3079772
Vinh, N. X., Epps, J., & Bailey, J. (2009). Information theoretic measures for clusterings comparison: Is a correction for chance necessary? Proceedings of the 26th Annual International Conference on Machine Learning, 1073–1080.
Wakita, H. (1977). Normalization of vowels by vocal-tract length and its application to vowel identification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(2), 183–192.
doi: 10.1109/TASSP.1977.1162929
Whalen, D., Chen, W.-R., Shadle, C. H., & Fulop, S. A. (2022). Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986). The Journal of the Acoustical Society of America, 152(2), 933–941.
pubmed: 36050157
pmcid: 9374483
doi: 10.1121/10.0013410