Neural Basis of Vocal Communication

Neurobiologist Sophie Scott on human speech, exaggerating gender differences, and the perception of voice information

videos | June 12, 2017

One of the interesting things about humans is that when we start talking to each other, we get speech. And human speech is amazing. In my area, the domain of psychology and cognitive neuroscience, we are interested in humans and how they behave. What we’ve done historically is we’ve looked at people talking to each other and said: that is speech and language, and we’ve studied the speech and language.

But of course, as soon as somebody starts talking to you, you’ve got all this other information there. So if you couldn’t see me and could only hear my voice, you would be able to tell if I was a man or a woman, you would be able to have a good guess at my age, if you were a good speaker of English you would be able to spot where I come from in the UK, you’d be able to tell if I was unwell, you would be able to tell if I was in a bad mood or not, and if you knew me, you would be able to recognize me.

All of that information is encoded in our voices and expressed in our voices at the same time as the words that we’re saying. So in fact, what we see when we look at the brain processing spoken language isn’t just a neural response to the words that somebody says. The brain is also very interested in the fact that somebody is speaking. And of course, a very basic aspect of speech is that you’ve always got speech melody. All languages use the melody of the voice to give emphasis, or to ask questions, or to do all sorts of different things. We-don’t-talk-to-each-other-like-this-we-would-consider-it-to-be-very-strange-if-we-did. You don’t find languages that do that.

So you’ve even got this sort of musical signal, and that seems to be processed differently from the processing of speech, up to a point. The brain networks which are involved in processing the words that somebody says are largely found in the left hemisphere. The left side of the brain cares about the linguistic information in speech, from speech sounds, through to words, through to sentences. In contrast, the auditory areas on the right side of the brain really like speech, but they don’t particularly care about spoken words. What they care about is all that other information that’s in there: melody, speaker identity, emotion in the voice.

It seems that for the early perceptual processing of somebody talking to you, what the brain does is separate out the linguistic stuff from all the other stuff that’s in the voice, and it processes the two separately. At some point, this gets brought back together, because, for example, if you know a talker, you find them easier to understand than if you don’t know them. So that familiarity is with their voice, and that’s helping you understand what they’re saying, so the left and the right must be talking to each other.

But it does seem that in terms of early perceptual processing it’s done differently, and you can see that in patients. So patients who have left temporal lobe lesions have great difficulty understanding the words of what somebody said. Patients with right temporal lobe lesions have difficulty recognizing people from their voice. That does seem to genuinely be a difference.

And what this is leading us to is an understanding of a lot more about the voice than we’ve really paid much attention to in our area. We’ve typically concentrated on linguistic information. What we’re really interested in now is how you combine that with all the other stuff that’s going on, and actually how it gets changed when you’re speaking as well. Because actually we produce this very complex sound which we think of as involving all the linguistic information, but I’m also talking to you in a controlled way that has got certain characteristics.

For example, in the West women have lowered the pitch of their voices over about the last 40 years. If you go back to TV programs and radio programs from the 1960s, women’s voices are pitched quite high, considerably higher than I was talking to you earlier, when I’m pitched down here. And that’s come down: women are using lower pitched voices. It seems to reflect women coming into the workplace. There’s lots of stuff they can’t change, but they can change their voices. If you go to other parts of the world, you’ll find that that’s not the case. In Japan women speak with higher pitches than women in the UK, and men speak with lower pitches than men in the UK. In Japan people are exaggerating the difference between men and women with their voices. In their vocal communication they are pulling out this difference; in the West we are minimizing it.

It’s very hard to say for sure what should be the natural aspects of your voice, because some of the aspects of your voice are driven by your body. So men have got on the whole deeper pitched voices than women, because in adolescence the larynx, the voice box, drops down in boys and not in girls. It happens around menopause for women, and it happens more slowly.

What men have is a lower voice box, and it’s actually bigger as well: it’s got bigger, thicker vocal folds that make the sound. Just like a bigger musical instrument with thicker strings, men’s voices have got a bigger spectral range and they produce a deeper pitch. That’s a physical reason why men and women sound different. You can also have pathological aspects, where you maybe have a problem with your vocal folds that means that you sound different. There’s a physical aspect. And that’s why if you’re unwell you can hear it in the voice, because it’s physically affecting how you sound.

But then a lot of the other things that affect how you sound have more to do with how you would like people to see you. And that’s like women dropping the pitch of their voices in the West, or me talking to you with the accent that I’m using now. I’m originally from the north of England, I had a strong accent when I was growing up. Since I became an academic, I’ve largely dropped that when I’m doing academic work, because I want to sound like a professor, not like somebody with a broad regional accent. I’m not saying I’m right, but that’s what I’m doing.

Even down to really personal things: if I’ve been talking on the phone to my mom, I talk like her for about half an hour afterwards, and my partner can tell that I’ve been talking to my mom, because I change my voice to be like hers. I can’t be with her, not for sinister reasons, she just lives abroad, we’re not together, but I can take my voice over to her and show all this stuff about our relationship in how I’m changing my voice.

So it’s reflecting the physical reality of your body. And for example, that’s why aging changes your voice. Men’s voices tend to go higher in pitch as they get older. Women’s voices tend to get lower in pitch as they get older, because of physical changes. You definitely have something that is arising from your body as a result of what your body is like, but then you’ve always got aspirational and social things that affect how you talk, and it happens all the time. So in fact, our voices change continuously. I’m talking to you differently right now than if I was trying to discipline my son.

I think some of the main questions around the neural basis of vocal communication are about how it ties up with everything else, because it’s not separate. If I change the intonation in my voice, I can mean very different things. The words are the same and I’ve made a difference with the vocal aspects of what I’m doing. How do you tease that apart, and how does that information get drawn together? I think also a real challenge is how we start to deal with something that’s so variable. People talk differently all the time; even the acoustic properties of the room that you’re in change how you’re speaking. How do we start to even represent that? Because we are used to fixed things: you learned this language and now you are speaking using this language. We’re not used to having to try and model phenomena that vary so much. That gives us a real practical problem. How do we understand voices when voices can be so variable?

So at the moment, we’re very interested in several different aspects of voices. We’re trying to get to grips with how our brains underpin our voices, both in terms of perception, how we get this information out of the voice, and in terms of production, how our brains control the ways that we sound and how that varies. There’s a really big interest in that. There’s also a growing interest in the social meanings and understandings of voices, because we will interpret stuff about how somebody talks in a way that might or might not be correct, but can mean that we make judgments, for example, about status and power from the voice. How are we doing that? Why are we doing that? What influences that?

You can also see a growing interest in understanding how this can go wrong, because historically we’ve tended to deal with somebody who’s got a speech production problem by saying, ‘let’s look at how you use language’. And of course, they also have a voice problem. And very often it’s the vocal aspects that are difficult for them. They don’t sound the way they used to, they would like to sound the way they used to, they’d like to sound like they come from where they come from in the country, for example.

How do we start to understand that kind of emotional connection to the voice and sounding the way you do? Because voices are often aspirational, it’s a very personal thing. And understanding that social and emotional connection to voices is, I think, going to be very interesting for actually understanding a lot of how we use our voices in the world.

I think, probably, the future of this area is going to be about understanding how things vary and starting to get to grips with that, and also the implications of that. We’ve got this highly dynamic social tool which we happen to use for expressing language, but it’s got all this other information going on in it. And I think it’s about understanding that, and actually thinking of ways that we could use that information. For example, historically it’s been very difficult to recognize somebody from their voice. You can get a thumbprint from somebody, and you’ll hear people use the phrase voiceprint, but actually it’s really difficult. In law it’s very hard to take a piece of recording and say this is that person, because people disguise their voices, particularly if they wish to mislead you.

Understanding the variation, and how and why voices vary, might start to tell us a bit more about what doesn’t vary. What should we look for if we want to say these two voices are the same and these two voices are different? How can we even start addressing that question?

Wellcome Senior Research Fellow in Basic Biomedical Science; Professor of Cognitive Neuroscience, Institute of Cognitive Neuroscience, University College London