Speech Perception

 

I. The Goal of Speech Perception

 

     A. Understanding what the speaker intended to say.

          1. May be remembered in terms of meaningful words and the

               phonemes that make up those words

          2. May be remembered in terms of "phonetic gestures"

          3. May be remembered as a combination of 1 and 2

 

     B. Phoneme

          1. Definition = the smallest unit of sound in a spoken word

               that can distinguish one word from another.

          2. If changed, would alter the meaning of the word

               (e.g., "bin" vs. "pin")

          3. Defined in terms of sounds, not spelling

          4. Example: character = /k/ /a/ /r/ /a/ /k/ /t/ /u/ /r/

               (both "ch" and "c" are the phoneme /k/)


     C. Phonetic Gestures (Features)

          1. Definition = physical movements of the vocal tract that

               accompany the production of the phoneme

          2. No two phonemes have exactly the same set of phonetic

               features (each is produced in a different way)
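The phoneme definition above suggests a simple test: two words that differ in exactly one phoneme (a "minimal pair," such as bin and pin) show that the differing sounds are separate phonemes. The Python sketch below checks that condition for substitutions only; the function name `is_minimal_pair` and the simplified transcriptions are illustrative assumptions, not part of these notes.

```python
def is_minimal_pair(word_a, word_b):
    """Return True if two phoneme sequences differ in exactly one position.

    Words are given as lists of phoneme symbols (hypothetical, simplified
    transcriptions), so spelling plays no role: only sounds are compared.
    """
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


bin_word = ["b", "ɪ", "n"]
pin_word = ["p", "ɪ", "n"]
pan_word = ["p", "æ", "n"]

print(is_minimal_pair(bin_word, pin_word))  # True: only /b/ vs. /p/ differs
print(is_minimal_pair(bin_word, pan_word))  # False: two phonemes differ
```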

 

II. Two Basic Types of Speech Sounds

 

     A. Consonants:

          1. Produced by pushing air through a constricted or closed

               vocal tract

          2. Three types of features allow for the production and

               identification of consonants

              a. Voicing

                   1) Do the vocal cords vibrate?

                   2) Voiced vs. unvoiced: /b/ in "bin" is voiced, /p/ in

                        "pin" is unvoiced

              b. Manner of Articulation

                   1) How completely is the airflow obstructed?

                   2) Types

                        a) Stop Consonants

                             i) Airflow completely stopped, then released

                             ii) Example: /k/, /p/, /t/

                        b) Fricatives

                             i) Nasal passage closed; air forced through a

                                  narrow constriction in the mouth

                             ii) Example: /z/, /s/, /f/

                        c) Nasal Consonants

                             i) Mouth closed; air flows out through the nose

                             ii) Example: /m/, /n/

              c. Place of Articulation

                   1) Where in the vocal tract is the airflow obstructed?

                   2) Places

                        a) Lips: /p/

                        b) Teeth: /f/, /th/

                        c) Alveolar Ridge: /t/

                        d) Palate: /ch/

                        e) Velum: /k/

                        f) Glottis: /h/

 

     B. Vowels:

          1. Produced with an open vocal tract

          2. The shape of the vocal tract, mainly the position of the

               tongue and lips, determines which vowel is produced

              a. Position of the Highest Part of the Tongue

                   1) Front: hid

                   2) Middle: curry

                   3) Back: root

              b. Height of tongue

                   1) Low: fat

                   2) Medium: late

                   3) High: treat

 

              c. Rounding of Lips

                   1) Unrounded: he

                   2) Rounded: who
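Section II.A describes each consonant as a bundle of three features: voicing, manner, and place. The Python sketch below encodes the consonants used as examples above in that way and reports which features separate two of them. The places given for /b/, /s/, /z/, /m/, and /n/ are standard descriptions that the outline itself does not list, and the names `CONSONANTS` and `differing_features` are hypothetical.

```python
# Feature bundles (voicing, manner, place) for the consonants used as examples
# in section II.A. Places for /b/, /s/, /z/, /m/, and /n/ are standard
# descriptions added here; the outline lists places only for /p/, /t/, /f/, /k/.
CONSONANTS = {
    "p": ("unvoiced", "stop", "lips"),
    "b": ("voiced", "stop", "lips"),
    "t": ("unvoiced", "stop", "alveolar ridge"),
    "k": ("unvoiced", "stop", "velum"),
    "f": ("unvoiced", "fricative", "teeth"),
    "s": ("unvoiced", "fricative", "alveolar ridge"),
    "z": ("voiced", "fricative", "alveolar ridge"),
    "m": ("voiced", "nasal", "lips"),
    "n": ("voiced", "nasal", "alveolar ridge"),
}
FEATURE_NAMES = ("voicing", "manner", "place")


def differing_features(a, b):
    """List the features on which consonants a and b differ."""
    return [name for name, x, y in zip(FEATURE_NAMES, CONSONANTS[a], CONSONANTS[b])
            if x != y]


# No two consonants in the table share the same feature bundle.
assert len(set(CONSONANTS.values())) == len(CONSONANTS)

print(differing_features("b", "p"))  # ['voicing']  (bin vs. pin)
print(differing_features("m", "t"))  # ['voicing', 'manner', 'place']
```

Pairs that differ in a single feature, such as /b/ and /p/ (voicing only), are the ones the bin/pin example contrasts.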

 

III. Speech Spectrogram (Sound Spectrogram)

 

     A. Definition = a diagram that shows which frequencies are

          present in an utterance and how strong each frequency

          component is.

          1. Shows the pattern of frequencies and intensities over

               time: time runs along the horizontal axis, frequency

               along the vertical axis, and darkness indicates intensity

          2. Looks messy for continuous natural speech, but is clearer

               for a syllable spoken in isolation

          3. Consistent patterns can still be found with careful

               inspection

 

     B. Formants

          1. Seen as a dark horizontal band on a speech spectrogram,

               representing a band of frequencies where energy is

               concentrated in the utterance (a resonance of the vocal tract).

          2. A speech sound can be composed of several formants.

          3. Formants are numbered from low frequency to high

               frequency.

              a. The formant with the lowest frequencies is the first

                   formant

          4. The first two formants (F1 and F2) are the most important

               for speech perception.

          5. Vowels generally appear as formants that look like flat,

               steady bars on a spectrogram.

          6. Consonants vary much more in appearance.


     C. Coarticulation/Parallel Transmission

          1. Spoken phonetic segments overlap

              a. "dog" vs. "dad": the /d/ sound differs depending on

                   the vowel that follows it

          2. Parts of several phonemes may be present simultaneously

               during the production of a word.

          3. Formant transitions are the rapid shifts in formant

               frequency where a consonant meets a vowel

          4. As a result, there is no simple one-to-one match between

               particular stretches of sound and perceived phonemes.
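Sections III.A and III.B can be made concrete with a small computation. The Python sketch below builds a crude spectrogram by taking the magnitude of a discrete Fourier transform over short, overlapping, Hann-windowed frames (a short-time Fourier transform), picks the two strongest peaks in one frame, and treats them as F1 and F2. It assumes a synthetic "vowel" made of just two sine waves placed at formant frequencies, which is a strong simplification of real speech. The F1/F2 reference values in `VOWEL_FORMANTS` are approximate, illustrative figures for adult male speakers rather than data from these notes, and all function names are hypothetical.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window: tapers each frame's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_points) for i in range(n_points)]


def spectrogram(signal, frame_len=64, hop=32):
    """Short-time DFT magnitudes: one list per frame, bins 0..frame_len // 2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT sum for frequency bin k (slow but dependency-free).
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(chunk))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def strongest_peaks(magnitudes, fs, frame_len, count=2):
    """Frequencies (Hz) of the `count` largest local maxima, lowest first."""
    peaks = [k for k in range(1, len(magnitudes) - 1)
             if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]]
    top = sorted(peaks, key=lambda k: magnitudes[k], reverse=True)[:count]
    return sorted(k * fs / frame_len for k in top)


# Approximate F1/F2 values (Hz) for a few vowels; illustrative only.
VOWEL_FORMANTS = {
    "i (heed)": (270, 2290),
    "ae (had)": (660, 1720),
    "a (hod)": (730, 1090),
    "u (who'd)": (300, 870),
}


def nearest_vowel(f1, f2):
    """Classify a vowel by the closest reference point in the F1/F2 plane."""
    return min(VOWEL_FORMANTS, key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]))


fs = 8000  # sampling rate in Hz
# Synthetic "vowel": two sine waves standing in for F1 = 250 Hz and F2 = 2250 Hz.
signal = [math.sin(2 * math.pi * 250 * n / fs) + math.sin(2 * math.pi * 2250 * n / fs)
          for n in range(256)]
frames = spectrogram(signal)
f1, f2 = strongest_peaks(frames[0], fs, frame_len=64)
print(len(frames), f1, f2, nearest_vowel(f1, f2))  # 7 250.0 2250.0 i (heed)
```

A practical tool would use a fast Fourier transform; the direct sum keeps the sketch free of dependencies at the cost of speed. The high front vowel of "heed" (as in "treat" and "he" in section II.B) pairs a low F1 with a high F2, which is why the synthetic signal lands there.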


 

IV. Theories of Speech Perception

    

     A. Motor Theory of Speech Perception

          1. Speech is special.

          2. Humans have a specialized neural structure that allows

               them to decode speech into phonetic gestures.

              a. Because we both perceive and produce speech, it makes

                   sense to have one mechanism to handle both

              b. Perception is linked to the vocal tract movements

                   (e.g., of the tongue and lips) used to produce speech

          3. We are born with this neural structure.

          4. Main proponent: Alvin Liberman

 

     B. General Mechanism Account

          1. Speech is not special.

          2. We learn to understand and perceive speech using the

               same mechanisms we use for any other sound.

          3. We are not born with a neural structure designed

               specifically for speech perception.

 

V. Categorical Perception

 

     A. Definition = sorting stimuli into discrete groups, with the

          consequence that although we readily distinguish between

          groups, we distinguish items within a group poorly, if at all.

    

     B. We perceive speech sounds as a limited number of categories (phonemes)

    

     C. Almost all listeners perform similarly on these tasks
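The definition in V.A can be expressed as an idealized rule: a listener labels each stimulus, and two stimuli are told apart only when their labels differ. The Python sketch below assumes a hypothetical seven-step continuum running from a clear "ba" (step 0) to a clear "pa" (step 6) with a category boundary at 3.5; the step numbers, the boundary, and the function names are illustrative assumptions.

```python
BOUNDARY = 3.5  # hypothetical category boundary on a 0-6 "ba"-to-"pa" continuum


def label(step):
    """Idealized identification: every step falls cleanly into one category."""
    return "ba" if step < BOUNDARY else "pa"


def predicted_discriminable(step_a, step_b):
    """Idealized categorical perception: only a change of category is noticed."""
    return label(step_a) != label(step_b)


# Compare neighboring steps: each pair differs by the same physical amount.
for step in range(6):
    pair = (step, step + 1)
    print(pair, label(step), label(step + 1), predicted_discriminable(*pair))
```

Every adjacent pair is one physical step apart, yet only the pair straddling the boundary (steps 3 and 4) is predicted to sound different, which is the pattern described in V.A.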

 

VI. McGurk Effect

 

     A. Audiovisual phenomenon that occurs when an observer

          1. Hears an audio recording of one phoneme

          2. Simultaneously sees a silent video of a speaker

               saying a different phoneme

          3. Reports hearing a third phoneme that is actually a

               compromise between the two.

 

     B. Example:

          1. What is seen: "Ga" - place of articulation: velum (back of

               the palate)

          2. What is heard: "Ba" - place of articulation: lips

          3. What is perceived:

              a. A compromise between the visual information and the

                   auditory information

              b. "Da" - its place of articulation, the alveolar ridge,

                   lies between the back of the palate (what is seen)

                   and the lips (what is heard)
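The example in VI.B can be mimicked with a deliberately simple toy model: order the places of articulation from the front of the mouth to the back and take the place halfway between what is heard and what is seen. This midpoint rule is an illustrative assumption, not an established model of the McGurk effect, and the names below are hypothetical; "Ga" is treated as a velar sound, since the velum is the soft, back part of the palate.

```python
# Places of articulation ordered front (lips) to back (velum), following the
# list in section II.A; the glottis is left out because it is not needed here.
PLACES = ["lips", "teeth", "alveolar ridge", "palate", "velum"]

# Syllables whose initial consonant is made at each place (only the three used here).
SYLLABLE_AT = {"lips": "ba", "alveolar ridge": "da", "velum": "ga"}


def fused_place(heard, seen):
    """Toy compromise: the place halfway between the heard and seen places."""
    i, j = PLACES.index(heard), PLACES.index(seen)
    return PLACES[(i + j) // 2]


place = fused_place(heard="lips", seen="velum")  # heard "ba", seen "ga"
print(place, SYLLABLE_AT.get(place))  # alveolar ridge da
```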

 

VII. Development of Speech Perception

 

     A. Categorical perception has been demonstrated in infants as

          young as 1 month old.

          1. Young infants can distinguish some phoneme contrasts that

               adults cannot (e.g., contrasts not used in the adults'

               native language).

          2. Example:

              a. Two 't' sounds of Hindi were distinguishable by

                   6- to 8-month-olds, regardless of their parents' language

              b. The two sounds were also distinguishable by 1-year-olds

                   whose parents spoke Hindi

              c. The sounds were not distinguishable by 1-year-olds

                   whose parents did not speak Hindi

 

     B. Effect of language modality (i.e., visual vs. auditory language)

          1. At 7-10 months, virtually all hearing babies babble.

          2. Deaf infants raised by parents whose only form of

               linguistic communication was American Sign Language

               babble with their hands and fingers.

 
