February 5, 2016
An article that came out in The Atlantic last week described a curious phenomenon called “foreign accent syndrome”. The condition arises after a brain injury: people who are, say, native English speakers and have never spoken another language suddenly begin speaking with what sounds like a French, German, or some other foreign accent. Or at least, that is the perception listeners get when they hear someone with this syndrome speak, and how this could happen has long mystified people.
A study described in The Conversation has shed some light on this mystery. When linguists analyzed such newly acquired accents, the ‘accent’ turned out to be a mix of the ways sounds are pronounced in multiple languages, not an accent specific to any one foreign language. The “German accent” or “French accent” was simply an association listeners made with a particular language.
What happens to people with foreign accent syndrome is that their brain injury causes them to produce the vowels and sometimes the consonants of their own language differently because they have damage in the brain areas that are important for producing sounds with the right rhythm and timing. Some of the damaged brain areas are responsible for controlling the muscles that move the tongue, lips, and jaw (collectively called speech articulators). Many of these areas are located on the surface of the brain along regions responsible for motor and sensory functions (the two vertical red stripes of the brain shown in the image below).
This means that the brain has done all the work to form a message: it has found the right words to express it and figured out the grammatical rules that put the sentence together meaningfully. But just as it is ready to issue the commands to the articulators and start speaking, a glitch in the machinery that sends instructions to the muscles makes the sounds come out wrong.
Controlling the articulators is no small challenge. To say a longer word such as extraordinary, the brain has to issue tens of thousands of commands to the muscles controlling the tongue, lips, and jaw, and all these instructions have to arrive at a very precise time for the vowels and consonants to sound correct and not, for example, be perceived as an accent. As Lyndsey Nickels, a cognitive neuroscientist at Macquarie University, writes for The Conversation:
Speaking requires very precise control of the muscles of lips, tongue and jaw (the speech articulators) and the larynx (voicebox). If the placement of the articulators, speed or coordination of movements are slightly out of sync, then speech sounds will be altered.
Vowels are particularly susceptible: which vowel you say depends on where your tongue is in your mouth. Slight differences in where your tongue is – how far forward or back, how high or low in your mouth – changes the vowel you produce. Different languages have different vowels and within a language one of the main differences between accents is in the vowels.
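Nickels’ point, that vowel identity tracks tongue position continuously, can be illustrated with a rough sketch. Acoustically, the first two formants (resonant frequencies) are the main cues that distinguish vowels: F1 falls as the tongue rises, and F2 rises as the tongue moves forward. The linear model and the formant values below are textbook approximations for American English, not figures from the article:

```python
# Rough sketch: tongue position -> first two formants (F1, F2),
# the acoustic cues listeners use to tell vowels apart.
# Assumption: a crude linear articulation-to-acoustics model;
# the real mapping is nonlinear, but the trend is right.

def formants(height: float, frontness: float) -> tuple[float, float]:
    """Approximate F1/F2 (Hz) from tongue height and frontness.

    height and frontness range over [0, 1]:
    1.0 = tongue high in the mouth / far forward.
    """
    f1 = 800 - 500 * height        # higher tongue -> lower F1
    f2 = 900 + 1400 * frontness    # fronter tongue -> higher F2
    return f1, f2

# /i/ ("ee"): high front   /u/ ("oo"): high back   /a/ ("ah"): low
print(formants(1.0, 1.0))  # (300.0, 2300.0)
print(formants(1.0, 0.0))  # (300.0, 900.0)
print(formants(0.0, 0.2))  # (800.0, 1180.0)
```

Because the output varies smoothly with tongue position, even a small articulatory error shifts the formants to values a listener can hear as "off", which is exactly why vowels are so susceptible.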
Why are subtle changes in the positioning of the tongue and the lips more noticeable for vowels than for consonants? Frank Guenther, a computational neuroscientist and professor of speech, language and hearing sciences at Boston University, explains:
Consonants tend to be perceived categorically: we hear either “p” or “b” but can’t tell the difference between a bad “b” and a good “b”. With vowels, however, we can hear fairly subtle within-category differences, so that an “oo” with the tongue position slightly off will sound different than a good “oo”.
Categorical perception is something our brain learns to do with speech sounds to make its life easier. We all speak slightly differently and the brain needs to accommodate such differences to make it possible for us to understand each other. A lot of the work for sound perception is done by areas along the horizontal red stripe in the brain image above, known as superior temporal regions.
To give a more specific example of categorical perception for consonants, Frank Guenther explains why we hear sounds like ‘p’ and ‘b’ as different when the lips and the tongue are in nearly the same position for both, and why it is harder to tell a good ‘b’ from a bad ‘b’ than it is to hear such differences between vowels.
“p” and “b” are stop consonants that both involve lip closure and then opening for the upcoming vowel. The difference between “p” and “b” is one of voice onset time – for “p”, we hold off on vibrating our vocal folds for about 50 ms or more after our lips move apart, but for “b” we start vibrating the vocal folds immediately after the lips separate. When we are listening to a “p” or “b”, any voice onset time between 0 and about 40 ms will sound like a “b”, and we can’t tell the difference between any of them. Anything with a voice onset time above 40 ms will sound like a “p”, and again we can’t differentiate them. That’s categorical perception.
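Guenther’s description amounts to a threshold on voice onset time (VOT): everything below the boundary is heard as one category, everything above it as the other. As a rough illustration (the ~40 ms boundary is the approximate value he cites; real category boundaries vary across speakers and languages), categorical perception can be sketched as a one-line classifier:

```python
# Sketch of categorical perception for the /b/-/p/ contrast.
# Assumption: a single ~40 ms voice onset time (VOT) boundary,
# as in Guenther's example; real boundaries vary by speaker
# and language.

VOT_BOUNDARY_MS = 40  # approximate English /b/-/p/ boundary

def perceive_stop(vot_ms: float) -> str:
    """Map a voice onset time (in ms) to the perceived consonant.

    Listeners hear any VOT below the boundary as /b/ and any VOT
    above it as /p/; differences *within* a category go unnoticed.
    """
    return "b" if vot_ms < VOT_BOUNDARY_MS else "p"

# Within-category differences collapse to the same percept:
print(perceive_stop(5), perceive_stop(35))   # b b
print(perceive_stop(45), perceive_stop(90))  # p p
```

A 30 ms difference in VOT is inaudible when both values fall on the same side of the boundary, yet a 10 ms difference straddling the boundary flips the percept entirely. That is what makes consonants forgiving of small motor errors in a way vowels are not.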
Speaking is one of the most demanding cognitive and motor tasks that the brain performs on a daily basis, and the motor aspect is only a small part of what the brain has to accomplish to get a thought out as a spoken message. Yet even this last step of the process requires fine coordination among multiple brain regions. It is no wonder, then, that speech is so often affected in some way or another when brain damage occurs. A better understanding of the details of this exquisitely orchestrated performance is important if we want to find better treatments for speech disorders.