When context is and isn't helpful: A corpus study of naturalistic speech.


Journal

Psychonomic bulletin & review
ISSN: 1531-5320
Abbreviated title: Psychon Bull Rev
Country: United States
NLM ID: 9502924

Publication information

Publication date:
Aug 2020
History:
pubmed: 14 Mar 2020
medline: 7 Jan 2021
entrez: 14 Mar 2020
Status: ppublish

Abstract

Infants learn about the sounds of their language and adults process the sounds they hear, even though sound categories often overlap in their acoustics. Researchers have suggested that listeners rely on context for these tasks, and have proposed two main ways that context could be helpful: top-down information accounts, which argue that listeners use context to predict which sound will be produced, and normalization accounts, which argue that listeners compensate for the fact that the same sound is produced differently in different contexts by factoring out this systematic context-dependent variability from the acoustics. These ideas have been somewhat conflated in past research, and have rarely been tested on naturalistic speech. We implement top-down and normalization accounts separately and evaluate their relative efficacy on spontaneous speech, using the test case of Japanese vowels. We find that top-down information strategies are effective even on spontaneous speech. Surprisingly, we find that at least one common implementation of normalization is ineffective on spontaneous speech, in contrast to what has been found on lab speech. We provide analyses showing that when there are systematic regularities in which contexts different sounds occur in (which are common in naturalistic speech, but generally controlled for in lab speech), normalization can actually increase category overlap rather than decrease it. This work calls into question the usefulness of normalization in naturalistic listening tasks, and highlights the importance of applying ideas from carefully controlled lab speech to naturalistic, spontaneous speech.

Identifiers

pubmed: 32166605
doi: 10.3758/s13423-019-01687-6
pii: 10.3758/s13423-019-01687-6

Publication types

Journal Article Review

Languages

eng

Citation subsets

IM

Pagination

640-676

Authors

Kasia Hitczenko (K)

Department of Linguistics, Northwestern University, 2016 Sheridan Road, Evanston, IL, 60208, USA. kasia.hitczenko@northwestern.edu.

Reiko Mazuka (R)

RIKEN Center for Brain Science, Wako, Japan.

Micha Elsner (M)

Department of Linguistics, The Ohio State University, Columbus, OH, USA.

Naomi H Feldman (NH)

Department of Linguistics and UMIACS, University of Maryland, College Park, MD, USA.
