Stanford University

News Service



Kathleen O'Toole, News Service (650) 725-1939; e-mail:

Smarter houses, dictionaries, websites take people's ways into account

Like millions of consumers, Stefan Kaufmann started his shopping trip on the web. Looking for ideas on chairs, he typed "chair" into the search box for a newspaper database and got what most search engines won't produce: a list of interior decorating articles, some of which contained the word "chair" and others the words "rocker," "chaise" or "couch."

Hoping for more good luck, Kaufmann went to a 10-year-old Associated Press database. Asking again for articles about chairs, he this time retrieved articles with these headlines: "Pipe-Bomb Killer Dies Without Seeing Execution Chamber" and "Murderer Who Stole Christmas Presents Executed."

Therein lies one lesson from researchers at Stanford's Center for the Study of Language and Information (CSLI): You can tell words by the company they keep, but they hang out with different crowds in different databases.

Kaufmann, a graduate student in linguistics, is using that reality of natural language to develop better data mining techniques. (For a demonstration see

His was one of several dozen research projects discussed or demonstrated at a conference for the center's industrial affiliates at Cordura Hall Nov. 10-12. More than 60 industrial researchers from companies and government labs in the United States, Asia and Europe attended.

Kaufmann's concept-based information retrieval method already has been applied by one of the companies to a commercial product but promises to underlie more. Kaufmann said he has a lot of work left, but he has begun to show that he can train this data miner to be more bilingual than a bilingual dictionary. Using a statistical method that finds how often words co-occur in a database record, Kaufmann trained his tool on Japanese and American databases of patent applications by grouping words of both languages into the same set of "concept" boxes. Later searches for a specific type of patent in either database found the correct patent in 44 of 45 tries, compared to only three of four tries when words were simply translated from one language to the other. The technique helps deal with changing jargon and slang, but is especially important to cross-language mining, he said, because words display more ambiguity across languages.
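The article does not describe Kaufmann's statistical method in detail, so the following is only a minimal sketch of the general idea of counting how often words share a database record. The function names and the toy records are hypothetical and are not drawn from his tool or from either database mentioned above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of distinct words shares a record."""
    counts = Counter()
    for record in records:
        # Use a set so a word repeated inside one record is counted once.
        words = sorted(set(record.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def related_words(word, counts, top_n=3):
    """Return the words that most often share a record with `word`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [w for w, _ in scores.most_common(top_n)]


# Hypothetical toy records standing in for two different databases.
decorating = ["chair rocker cushion", "chair couch fabric", "chair rocker porch"]
newswire = ["chair execution prison", "chair execution appeal"]

print(related_words("chair", cooccurrence_counts(decorating), top_n=1))
print(related_words("chair", cooccurrence_counts(newswire), top_n=1))
```

In this toy setup the same query word keeps different company in each collection, which is the effect the chair searches above illustrate.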

The project is one of several in the computational semantics laboratory led by linguistics Professor Stanley Peters. Another involves developing Japanese dialogue for a speaking office robot, which, unlike a factory robot, needs to adjust to a changing environment and exchange information with humans (see

A third semantics team develops constraint-based English grammars and dictionaries intended to be used across software applications, and another group is using a Java interface to provide dictionary content in formats more suitable for children and others with limited literacy skills. Assistant Professor Chris Manning of computer science and linguistics demonstrated the latter technique for a dictionary of Warlpiri, an oral Australian aboriginal language, which has a written version developed by linguistic scholars of the 1950s. "Most online dictionaries do very little to exploit the advantages of a computer," Manning said. Even worse, dictionary writers are language experts who confuse novice students with technical notations like "transitive verb" at the beginning of entries, he said. His dictionary interface allows Warlpiri children to use "fuzzy" spelling to find words and to explore relationships between words in color-coded diagrams. Antonyms are in one color, derived words in another, and dialect in another. There is also audio for pronunciations.
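How the Warlpiri dictionary interface implements "fuzzy" spelling is not stated here. As a rough illustration only, approximate lookup can be sketched with Python's standard difflib module; the headword list below uses hypothetical English placeholders, and the function name and cutoff value are assumptions rather than details of Manning's software.

```python
import difflib

# Hypothetical English headwords standing in for a real dictionary's entries.
HEADWORDS = ["kangaroo", "water", "boomerang", "mother", "river"]


def fuzzy_lookup(query, headwords=HEADWORDS, max_results=3, cutoff=0.6):
    """Return headwords whose spelling is close to the (possibly misspelled) query."""
    return difflib.get_close_matches(
        query.lower(), headwords, n=max_results, cutoff=cutoff
    )


print(fuzzy_lookup("kangaru"))
```

A lower cutoff tolerates rougher spellings at the cost of more unrelated suggestions, so a real interface would likely tune it for its users.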

Smart houses, interactive workspaces

CSLI researchers in the Archimedes project demonstrated tools they are developing to allow people with disabilities to access computers by voice or eye movements. Multimodal access is also essential to the development of smart houses, Archimedes researchers said, because making products useful for the disabled usually means others also will find them more convenient. The group is working on a conceptual model for future smart houses, in which the infrastructure that controls the furnace and the sound system is standard, and occupants attach the appliances and tools they want. Such a model would stop the current computer industry practice of manipulating operating systems to make older products unusable and would reduce future smart house construction costs, they say.

Another project on interactive workspaces is led by computer science Professor Terry Winograd, who is collaborating with various Stanford working groups on "groupware." The idea, he said, is to integrate individual laptops and personal digital assistants with specialized group facilities such as the interactive mural located in the graphics lab of the Gates Building (see http://graphics.stanford.EDU/projects/iwork/). Medical teams analyzing CT scans or engineers working on construction projects would like to be able to cut and paste complex information from one device to another to aid group analysis and record keeping, Winograd said, but getting the devices to know "who is doing what and when" is a challenge. "It's so easy to show off neat gadgets, but if we care about learning more, we have to evaluate them," Winograd said. One reason social scientists work with computer scientists at CSLI, he added, is that "computer scientists are really good at evaluation, if [by evaluation] you mean 'How fast does it run?'"

Smarter dumb objects

Students from the center's persuasive technologies lab displayed simple "smart" objects they have built to test people's reactions to them. Jonathan Bruck equipped a newspaper recycling bin with a virtual tree that grows a few inches each time someone puts a newspaper in it. In research on other students, he found the tree encouraged recycling much more than another recycling bin that simply praised users for recycling an object, he said. The device, he conceded, is probably too expensive and vulnerable to tampering to be practical for public streets.

Jason Tester, working with funding from DaimlerChrysler, displayed an audio-based in-car entertainment device for bored commuters. The system presents National Public Radio news stories in the form of questions first and keeps track of the driver's score. It airs the correct answers in the form of news broadcasts. Research on learning suggests people would remember more content if engaged this way, Tester said. In a test of 20 users, he said, most liked it, while a few thought it was somewhat distracting to their driving.

Another device, displayed by recent graduate Jared Kopf, allowed men using some specially outfitted campus bathrooms to learn new, upbeat words during the 34 seconds, on average, that they normally spend staring at blank walls above urinals. The idea was to see if people enjoyed making use of their time in this way. Most liked it, Kopf said, but he isn't sure if Stanford students would be representative of the larger public. Such devices could be used in the future, he speculated, to fill in people's individual training needs.

Emotions, credibility on the web

Much of the discussion was about the Internet, and especially how emotions may affect business on the web.

Communication Professor Byron Reeves, who monitors the human nervous system's response to media content, spoke about how people's palms begin to sweat within six seconds of viewing visual portrayals of sex, "blood and guts" or money, whether on TV or the web. The arousal is probably outside people's direct control because it is part of the species' evolutionary fight-or-flight responses, he said. Psychologists and advertisers "often equate arousal with attention, but there is a point where arousal is so high that it interferes with attention," he said. An investment website, for example, may need an animated face or voice that reassures investors calmly, he said, just as sales people in local brokerage offices reassure callers or visitors. Known for recommending animated on-screen characters to Microsoft and other companies, Reeves said his research shows the face or voice used on a website is "absolutely critical. You need a careful process of casting."

It is not just a matter of matching the screen-help service to the product, according to research by graduate students working with Reeves and his colleague, communication Associate Professor Clifford Nass. Graduate student Kwan Min Lee, for example, found that introverts prefer voices that are stereotyped as introverted, and extroverts like louder, faster talkers. The user's personality profile predicts his or her preference better than the actual sound of the user's own voice, she said.

Large companies probably have conducted private studies on the credibility of websites, but there is almost no published research, said B. J. Fogg, who directs the persuasive technologies lab of CSLI. His students have begun research on credibility but expect the parameters to change over time. "The web is the wild west now, with people just forming their opinions about credibility," he said.

In a pilot study, graduate student Nina Kim found that typographical errors strongly decreased the credibility of a website and that the inclusion of a real-world address for a company increased it. In the absence of other identifying information, a website with an advertisement was rated higher than one without. Links that didn't work hurt credibility while recently updated pages helped it. "Apparently people believe that recent information is good information," Fogg said.

More surprising perhaps, the researchers found that web users held sites about "critical information" to a higher standard than less serious sites. Typos in sites about breast cancer and tuberculosis were much more harmful to the users' perception of credibility than typos in sites about treating bloody noses or ingrown toenails.

CSLI regularly holds conferences to inform affiliates of research in progress, and some companies send researchers to the campus for collaborations. For more information, see


By Kathleen O'Toole

© Stanford University. All Rights Reserved. Stanford, CA 94305. (650) 723-2300.