Brain Mechanisms Of Pitch Perception In Speech

Researchers at UC San Francisco have identified neurons in the human brain that respond to pitch changes in spoken language. These pitch changes are essential to clearly conveying both meaning and emotion.

Changes in vocal pitch during speech – part of what linguists call speech prosody – are a fundamental part of human communication, nearly as fundamental as melody to music. In tonal languages such as Mandarin Chinese, pitch changes can completely alter the meaning of a word, but even in a non-tonal language like English, differences in pitch can significantly change the meaning of a spoken sentence.

For instance, “Sarah plays soccer,” in which “Sarah” is spoken with a descending pitch, can be used by a speaker to communicate that Sarah, rather than some other person, plays soccer; in contrast, the same sentence with the pitch emphasis on “soccer” indicates that Sarah plays soccer, rather than some other game. And adding a rising tone at the end of a sentence (“Sarah plays soccer?”) indicates that the sentence is a question.

A Remarkable Ability

The brain’s ability to interpret these changes in tone on the fly is particularly remarkable, given that each speaker also has their own typical vocal pitch and style (that is, some people have low voices, others have high voices, and others seem to end even statements as if they were questions).

Not only that, but the brain must track and interpret these pitch changes while simultaneously parsing which consonants and vowels are being uttered, what words they form, and how those words are being combined into phrases and sentences — with all of this happening on a millisecond scale.

Previous studies in both humans and non-human primates have identified areas of the brain’s frontal and temporal cortices that are sensitive to vocal pitch and intonation, but none have answered the question of how neurons in these regions detect and represent changes in pitch to inform the brain’s interpretation of a speaker’s meaning.

The study was conducted by the lab of Edward Chang, MD, a professor of neurological surgery at the UCSF Weill Institute for Neurosciences, and led by Claire Tang, a fourth-year graduate student in the Chang lab.

Chang, a neurosurgeon at the UCSF Epilepsy Center, specializes in surgeries to remove brain tissue that causes seizures in patients with epilepsy. In some cases, to prepare for these operations, he places high-density arrays of tiny electrodes onto the surface of the patients’ brains, both to help identify the location triggering the patients’ seizures and to map out other important areas, such as those involved in language, to make sure the surgery avoids damaging them.

The Superior Temporal Gyrus

In the new study, Tang asked 10 volunteers awaiting surgery with these electrodes in place to listen to recordings of four sentences as spoken by three different synthesized voices:

“Humans value genuine behavior”
“Movies demand minimal energy”
“Reindeer are a visual animal”
“Lawyers give a relevant opinion”

The sentences were designed to have the same length and construction, and could be played with four different intonations: neutral, emphasizing the first word, emphasizing the third word, or as a question.

You can see how these intonation changes alter the meaning of the sentence: “Humans [unlike Klingons] value genuine behavior;” “Humans value genuine [not insincere] behavior;” and “Humans value genuine behavior?” [Do they really?]

Tang and her colleagues monitored the electrical activity of neurons in a part of the volunteers’ auditory cortices called the superior temporal gyrus (STG), which previous research had shown might play some role in processing speech prosody.

They found that some neurons in the STG could distinguish between the three synthesized speakers, primarily based on differences in their average vocal pitch range. Other neurons could distinguish between the four sentences, no matter which speaker was saying them, based on the different kinds of sounds (or phonemes) that made up the sentences (“reindeer” sounds different from “lawyers” no matter who’s talking).

And yet another group of neurons could distinguish between the four different intonation patterns. These neurons changed their activity depending on where the emphasis fell in the sentence, but didn’t care which sentence it was or who was saying it.

Predicting Neural Response

To prove to themselves that they had cracked the brain’s system for pulling intonation information from sentences, the team designed an algorithm to predict how a neuron’s response to any sentence should change based on speaker, phonetics, and intonation. They then used this model to predict how the volunteers’ neurons would respond to hundreds of recorded sentences by different speakers.
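As a rough illustration of what such a prediction could look like, the sketch below fits a simple additive model in which the activity at one recording site is a baseline plus separate contributions from speaker, sentence, and intonation. This is not the authors' algorithm, which the article does not describe in detail; the condition labels, the function names `fit_additive_model` and `predict`, and the toy response values are all assumptions made up for the example.

```python
from itertools import product
from statistics import mean

# Hypothetical labels for the 3 voices x 4 sentences x 4 intonations design.
SPEAKERS = ["voice_1", "voice_2", "voice_3"]
SENTENCES = ["humans", "movies", "reindeer", "lawyers"]
INTONATIONS = ["neutral", "emphasis_1", "emphasis_3", "question"]


def fit_additive_model(responses):
    """Estimate a baseline plus one effect per speaker, sentence, and intonation.

    `responses` maps (speaker, sentence, intonation) to an activity value.
    With every combination present once, each effect is simply the mean
    activity for that level minus the grand mean.
    """
    grand = mean(list(responses.values()))
    effects = []
    for axis, levels in enumerate((SPEAKERS, SENTENCES, INTONATIONS)):
        effects.append({
            level: mean([v for k, v in responses.items() if k[axis] == level]) - grand
            for level in levels
        })
    return grand, effects


def predict(model, speaker, sentence, intonation):
    """Predicted activity: baseline plus the three fitted effects."""
    grand, (spk, sen, inton) = model
    return grand + spk[speaker] + sen[sentence] + inton[intonation]


# Made-up recording site whose activity depends only on intonation.
toy_gain = {"neutral": 0.0, "emphasis_1": 1.0, "emphasis_3": 2.0, "question": 3.0}
responses = {
    key: toy_gain[key[2]]
    for key in product(SPEAKERS, SENTENCES, INTONATIONS)
}

model = fit_additive_model(responses)
speaker_effects, sentence_effects, intonation_effects = model[1]
```

For this toy site, the fitted speaker and sentence effects come out at zero and only the intonation effects differ, which mirrors the article's description of neurons that "didn't care which sentence it was or who was saying it."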

They showed that while the neurons responsive to the different speakers were focused on absolute pitch of the speaker’s voice, the ones responsive to intonation were more focused on relative pitch: how the pitch of the speaker’s voice changed from moment to moment during the recording.
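The difference between absolute and relative pitch can be made concrete with a small sketch. It assumes pitch is given as a list of fundamental-frequency values in hertz, and it defines relative pitch as the distance in semitones from the speaker's own average (on a log scale); both choices, along with the function names and the example contours, are illustrative assumptions rather than the measures used in the study.

```python
import math


def mean_pitch_hz(f0_hz):
    """Absolute pitch summary: the speaker's average pitch in hertz."""
    return sum(f0_hz) / len(f0_hz)


def relative_pitch(f0_hz):
    """Pitch contour in semitones relative to the speaker's own average.

    Working in log2(frequency) means a doubling of pitch is always
    12 semitones, whatever the speaker's baseline.
    """
    logs = [math.log2(f) for f in f0_hz]
    mean_log = sum(logs) / len(logs)
    return [12.0 * (value - mean_log) for value in logs]


# Hypothetical contours for the same question (pitch rises at the end),
# spoken once by a low voice and once by a voice an octave higher.
low_voice = [100.0, 100.0, 100.0, 150.0]
high_voice = [200.0, 200.0, 200.0, 300.0]

low_relative = relative_pitch(low_voice)
high_relative = relative_pitch(high_voice)
```

Here the two voices differ in average pitch (112.5 Hz versus 225 Hz), which is the kind of cue that could separate speakers, while their relative contours are identical, which is the kind of cue that could signal a question regardless of who is talking.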

“To me this was one of the most exciting aspects of our study,” Tang said. “We were able to show not just where prosody is encoded in the brain, but also how, by explaining the activity in terms of specific changes in vocal pitch.”

These findings reveal how the brain begins to take apart the complex stream of sounds that make up speech and identify important cues about the meaning of what we’re hearing, Tang said: who is talking, what are they saying, and just as importantly, how are they saying it?

“Now, a major unanswered question is how the brain controls our vocal tracts to make these intonational speech sounds. We hope we can solve this mystery soon,” said Chang, the paper’s senior author.

C. Tang, L. S. Hamilton, E. F. Chang
Intonational speech prosody encoding in the human auditory cortex
Science 25 Aug 2017: Vol. 357, Issue 6353, pp. 797-801 DOI: 10.1126/science.aam8577