Stanford University

News Service



Elaine Ray, News Service (650) 723-7162

Computers with voices: Students explore how people respond

If you thought "designer babies" were a long way off, consider the population explosion of "virtual" people being conceived now -- not from genes but from electronic parts. How will these characters, who inhabit our computers, televisions, toys and other computerized products, look, sound and move?

Clifford Nass, seated, is shown with two student researchers, Amy Huang, left, and Seema Swamy, right, who were part of a quarter-long course in which students completed 10 experiments on how people respond to speech interfaces on computers. A web-based research tool, the CSLU Toolkit, developed at the Center for Spoken Language Understanding at Oregon Graduate Institute of Science and Technology, made it possible to complete the experiments so quickly.

Photo by Linda Cicero

Thirty-five Stanford students supervised by communication Professor Clifford Nass, in a course sponsored by the National Science Foundation, set out last quarter to find some of the answers by testing how 1,000 real people responded to virtual ones, especially to their voices. Assuming that virtual characters will have to appeal to real people in order to survive in a free market, the students began designing their experiments by reviewing research on psychology and on how the brain processes speech. Nass encouraged them to ask, "If something works for humans, what about for synthetic and recorded speech?"

Some of the graduate/undergraduate research teams investigated people's reactions to representations of human faces on computer screens, and one team looked at responses to a computer that "touches" its users through a joystick. The majority of the 11 teams chose to test virtual voices, or the hot new technology known as VUI -- voice user interface. VUIs range from poor-quality machine-generated voices, which many companies use on their telephone answering equipment, to high-quality human-recorded voices, which are more expensive to produce but which are beginning to crop up on commercial websites. These voices can be employed to respond to consumer questions about investing in the stock market or how to set up the computer they just bought.

When subjects saw these synthetic faces on computer screens coupled with a human-sounding voice, they gave less personal information about themselves than when they just heard the voice. They revealed the most information, however, to computers that simply presented questions in text. The experiment suggests that the more human-like the interface, the greater desire humans have to manage themselves.

Courtesy of U.C. Santa Cruz

"Voice interface technology has improved incredibly rapidly, so that companies like Philips and Microsoft are talking about embedding speech into almost everything," says Nass, co-author of The Media Equation, a 1996 book on people's social responses to communication technology. Venture capitalists, he adds, also have been funding speech interface start-up companies with names like TellMe, BeVocal and Quack. "Yet there has been almost no research on the psychology of design of speech interfaces," he said, which is why about six dozen U.S. and European-based companies sent product designers or researchers to campus in June to hear the students present their sometimes surprising results.

So what will virtual people be like? First, their gender and ethnic "background" are not likely to be accidental or even representative of the human population. That is because designers are not likely to ignore what the students learned: gender and ethnic stereotyping, often subconscious, is pervasive when people encounter voice interfaces.

The experimenters also found they could manipulate people's attitudes toward the content of messages by changing the emotional tone of voice, as well as physical parameters such as pitch and speed.

Voice interfaces, the students also found, may not always be preferable to the text interfaces to which computer users are now accustomed.

Gender/ethnic stereotyping

The research subjects reacted more positively to virtual male voices than to virtual female voices in several experiments, as the researchers predicted.

"Gender is the first social attribute people recognize in a human voice, and it triggers stereotypical reactions, so that male voices are perceived as more assertive, ambitious and persuasive," said Eun-Ju Lee, a student whose research team found that even obviously synthetic voices "elicited gender stereotyping and gender identification." "Casting is crucial," she said.

But while male voices are more highly respected by both men and women, psychologists also know from past research that human voices can generate "in-group favoritism." Another student research team looked at what type of voice would prompt people to disclose the most personal information to a computerized interface. They found that their American male subjects, all Stanford students, were willing to disclose more personal information to user interfaces that spoke in a female, foreign-accented voice (in this case, Swedish). American females, on the other hand, revealed more personal data to an American-accented female voice than to either a male American voice or Swedish voices of either gender.

Seema Swamy, one of the researchers who conducted the experiment, concluded that men are more likely to disclose personal information to a voice that they feel "socially distant from and are not likely to meet again. Women may feel more comfortable sharing information with someone whom they consider more like themselves." Women said they like the speech interfaces better, but men disclosed more information, she said.

Companies could design separate speech interfaces for male and female consumers, she noted, because that would require only two designs. Playing to people's national and racial stereotypes, however, would be more difficult, at least for products distributed globally.

When voices and faces intimidate

In another experiment, students tried to see how much personal information they could get subjects to disclose when voices were combined with representations of faces on screens. They found that a synthetic face coupled with a human-sounding voice decreased people's willingness to respond "yes" to such invasive questions as "Do you sometimes tell lies if you have to?" Research subjects disclosed the most to text interfaces. Apparently, the more human-like the interface, "the greater desire humans have to manage themselves," said student researcher Li Gong. "The synthetic face made people spend less time answering the questions."

Yet another team found that voices obviously generated by machine probably should not try to claim they are human. Setting up an over-the-phone auction, the researchers tested human-recorded and machine-generated voices, each offering the same items for sale in the same words, except for one difference in grammar.

In one condition, the voice referred to itself with a personal pronoun; for example, "The next item I am offering for sale is a futon." In another condition, the voice used an impersonal construction -- "The next item for sale is a futon" -- to avoid referring to itself as if it were a person. The research subjects perceived the recorded voice using personal pronouns to be the most "sociable and spontaneous" of the four conditions, researcher Francis Lee said, and they found the machine-generated voice using impersonal phrasing to be the most "formal and fair."

"It's nice if they feel good about the voice, but you really want them to buy from an auction site," said researcher Luke Swartz. The research subjects bid higher amounts for the items offered by the "formal and fair" voice, he said.

"In a different context, such as a voice offering driving instructions or a weather report," he said, "people may prefer the more personal, human-sounding voice."

Can voice trump content?

Several experiments looked at how to manipulate people's perceptions of the content of messages. In one, researchers Kyu Hahn, Sylvia Loveda, Rob Baesman and Sandra Lui took four current events stories that mingled factual and opinionated content and told listeners they were listening to either "news" or an "editorial" on "NetRadio."

For stories labeled as news, the human-recorded voice made the content seem more factual and persuasive than the machine-generated voice. For stories labeled as editorial, the reverse was true. The results suggest that content labeling primes the audience and that listeners are seeking some sort of balance, the researchers said. News may seem "a little boring" in the machine-generated voice, but opinion pieces were probably perceived as less opinionated when read by the less human voice.

In a related experiment, another research team found that people ascribed emotions to machine voices; this influenced the credibility of the message. Research subjects liked happy news or movie reviews better when read by a happy voice and bad news and reviews better when read by a sad voice, but they gave more credibility to the report when the voice didn't match the content.

This might pose a difficult trade-off for a website like Charles, said student researcher Michael Somoza. Investors may not like hearing a happy voice report a downturn in stock prices, but they would probably find the report more believable than if it came from a sad voice.

"We think the mismatch conveys a lack of [self-]interest, and therefore people perceive it as less biased," said student researcher Ulla Foehr.

Nass said the finding was especially robust, and similar effects were found in other student experiments. "Credibility seems to have a social, rather than a cognitive, explanation," he said.

In another experiment, students Scott Brave and Erenee Sirinian wanted to find out how people responded to simulated touch, because touch is a powerful component of interpersonal communication. They designed a computer-based maze in which research subjects received suggestions from a fictitious participant in another room. Some received the feedback in the form of on-screen arrows, while others were given a slight shove through the joystick they were using to play the game. Participants who received the touch feedback found the activity more fun and arousing. They also were more competitive and less cooperative than those getting on-screen feedback. There was no difference, however, between the two groups' performance on the task.

Ethical uses of technology

In presenting their material to industry representatives, the students were asked to conclude with the product design implications. Several teams therefore concluded that designers should take advantage of existing group stereotypes and that companies seeking more information on their customers should choose interfaces that maximize the amount of information people give. Those conclusions may raise ethical dilemmas for some.

Nass and his Stanford colleague, communication Professor Byron Reeves, have "identified this important phenomenon, which is that people have psychological reactions to machines," says Batya Friedman, a professor of information and computer science at the University of Washington in Seattle. "They have also demonstrated that it affects behavior in a wide range of situations. We need to know more about people's perceptions before we can say if they are really making an attribution of agency [to the machine]," she said. In the meantime, "designers need to decide how to use this knowledge, and that is where ethical issues arise."

Nass previously has shown, for example, that a computer that flatters a user will be liked better by that user than one that doesn't flatter, but does that mean computers should flatter? Friedman asks. "If I write something flattering with a pencil on paper, I don't want the person reading it to think, 'Oh, that's just the pencil lead doing the flattering.' One could argue that what we really want to do is try to strip out as many of these personification cues as possible so people can find other people in the network."

Nass agrees that the research has not pinpointed exactly what people mean when they attribute qualities to a computer agent. "There's a whole area of psychological research on labeling into which stereotyping falls. When I label a skin color as good or bad, it's clear it applies to a person. When it's pictures on a screen, it is unclear what the label points to.

"I think it's important for everyone to know that stereotypes are happening," he said. "One choice [for businesses] is to say 'OK, I'm going to play into the stereotypes' -- making all auto mechanic websites have male voices -- which has the advantage to me of selling more."

On the other hand, it is illegal and presumably culturally unacceptable in the United States for employers of real people to hire only males as auto mechanics. Nass said he often advises industry representatives to expect a backlash that could lead to consumer boycotts or government regulation if they use new media to expand on existing stereotypes.

"The television industry went through this. At first when there were black characters on TV, they were Amos 'n' Andy, and the argument was that people liked seeing African Americans portrayed as idiots. But then the social climate changed, people protested, and the industry created standards boards out of fear they would be regulated if they didn't."

The technology also could be used to undermine social stereotypes, he said, particularly since "the average person has more exposure to most occupations through media than in real life. If it happens in real life, we draw conclusions about media, but the reverse is also true." Children's book authors, he notes, have decided to include some women characters in non-traditional roles and characters from diverse ethnic backgrounds. So far, he said, nearly all human characters on U.S. Internet sites are white, although both genders are represented.

How about using interfaces to manipulate people into giving more personal information?

"It is important for societies to know what is going on within them," Nass said. "We don't need to link answers with particular individuals" about such sensitive subjects as drug use or unsafe sex practices, he said, but the information can be critical for establishing social policies. Businesses, however, frequently want to match information with specific customers to aid marketing.

"This raises a question about informed consent," says Stanford philosophy Professor Debra Satz, who directs the university's Ethics in Society Program. "Do people know what they are contributing to? Is it made explicit so at least they have a choice?"

In the 1970s, before VUIs were a reality, Satz points out, MIT artificial intelligence researcher Joseph Weizenbaum conducted an experiment in which people typed information about their personal problems on a computer keyboard and a response came back on the screen. "They thought a therapist was typing on the other side of a wall when, in fact, the machine was programmed to pick up keywords and respond," Satz said.

Participants evaluated the therapist afterward as very good, she said. "Weizenbaum concluded that we can build machines that mimic humans but we should not, because of the ethical deception involved."

Satz said she agreed with Nass that "it is useful to know the facts -- that people respond more to one type of voice than the other. There have been experiments that show people are more likely to respond to a call for help from a white person's voice than a black person's voice. Maybe that is a fact about us, but it should bother us. . . . The question becomes, what do we want to do with that knowledge? To what extent do we want to consciously contribute to existing stereotypes that we know are out there and that explain a lot of human behavior, or to what extent do we want to use this knowledge to open more doors of opportunity?"

Friedman of the University of Washington points out that "when you design anything, you have done something that contains values, whether you are conscious of them or not." Her research is in the area of "value-sensitive design," which involves "making these issues visible and offering people tools they can use for analysis and alternative design possibilities." She is working, for example, with colleagues at Princeton on designs for web browser security systems, work that involves asking web users about their desires for anonymity and accountability on the Internet. Preliminary results indicate that most people would like to maximize both but are more concerned with anonymity, she said. "It's important to recognize that in design, you usually don't get everything you want and you have to trade off one value against another."

Since people pay more attention to a male voice, she said, designers might consider it appropriate to use a male voice for a subway warning system about stepping too close to the edge of the platform.

"The downside is gender stereotyping, but in this case someone's physical welfare is at stake."


By Kathleen O'Toole

© Stanford University. All Rights Reserved. Stanford, CA 94305. (650) 723-2300.