February 18, 2010
Stanford software is gaining the sophistication to comprehend what humans write
By David Orenstein
For people who despair that there is too much information online, Chris Manning has a response: Technology is not the problem. In fact, technology may understand what you're trying to say.
At the annual conference of the American Association for the Advancement of Science (AAAS) in San Diego, the Stanford associate professor of computer science and linguistics will talk about enabling computers to process human language well enough to use the information it conveys.
"The problem of the age is information overload," said Manning, who'll speak Friday, Feb. 19, at 4:10 p.m. in Room 2 of the San Diego Convention Center. "The fundamental challenge I'm going to talk about is how we can get computers to actually understand at least a reasonable amount of what they read."
As computers make more sense of what's online, they will deliver more relevant search results and will help summarize, structure and act on information that individuals care about, much like a personal assistant.
A smartphone email program that understands the difference between "We need the Q4 figures" and "We found the Q4 figures" could prove invaluable to a busy executive.
Computers also could help researchers extract key facts from a sea of articles to create and update databases. In fact, Manning already has developed software that mines biology research papers for basic data.
State of the art
Manning readily acknowledges that the field of natural language understanding has a long way to go to catch up with popular imagination.
"The state of the art is still highly incomplete," he says. "We're just not at the level of what we see in science fiction movies. But human language technology has been making enormous advances."
In his AAAS talk, Manning will describe work on three emerging technologies at Stanford's Natural Language Processing (NLP) Group.
Working with linguistics Associate Professor Dan Jurafsky, Manning has been developing a fundamental set of tools to help computers do what a pupil does in grammar school: parse sentences. Like humans, computers begin to understand a sentence by recognizing its parts of speech and how it is structured.
The underlying technology is a branch of artificial intelligence called probabilistic machine learning. Essentially, computers are programmed to read a large number of sentences and then analyze their structure and elements, compiling statistics about verbs and nouns, and keeping track of what the subject of the sentence is doing.
Based on those statistics, for instance, a computer might conclude that "horse" is likely to be the subject of a sentence and that "hay" is something that horses might eat. Technical demonstrations of Manning and Jurafsky's language parsing software are available on the NLP Group website.
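To make the idea of compiling statistics concrete, here is a minimal Python sketch. It is not the NLP Group's parser. It assumes a tiny, made-up corpus in which each sentence has already been reduced to a (subject, verb, object) triple, and every name in it (TRIPLES, role_counts, subject_probability, likely_objects) is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is already reduced to a
# (subject, verb, object) triple. A real parser would have to derive these.
TRIPLES = [
    ("horse", "eats", "hay"),
    ("horse", "eats", "oats"),
    ("horse", "eats", "hay"),
    ("farmer", "feeds", "horse"),
    ("farmer", "stacks", "hay"),
]


def role_counts(triples):
    """Count how often each word appears as a subject and as an object."""
    counts = defaultdict(Counter)
    for subj, _verb, obj in triples:
        counts[subj]["subject"] += 1
        counts[obj]["object"] += 1
    return counts


def subject_probability(word, counts):
    """Estimate P(role = subject | word) from the raw counts."""
    total = sum(counts[word].values())
    return counts[word]["subject"] / total if total else 0.0


def likely_objects(subj, verb, triples):
    """Rank the objects seen with a given subject and verb, most frequent first."""
    seen = Counter(o for s, v, o in triples if s == subj and v == verb)
    return seen.most_common()


counts = role_counts(TRIPLES)
print(subject_probability("horse", counts))       # share of uses as a subject
print(likely_objects("horse", "eats", TRIPLES))   # what horses eat, by frequency
```

On this toy data, "horse" turns up as a subject three times out of four, and "hay" is the most frequent thing a horse eats. A real system would learn from far larger collections of parsed text.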
Building on that level of understanding, Manning's group has created software to sort out ambiguities in language by taking whole sentences into account when deciding what each word means. For instance, "make up" can have at least three meanings: to reconcile after a spat, to concoct a story, or to apply cosmetics. The technical solution, called "joint inference," is to look for other words in the sentence that are statistically shown to be relevant. If the word "argument" is there, the computer will lean toward "to reconcile."
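The cue-word intuition can be sketched in a few lines of Python. This is a deliberately simplified stand-in, plain weighted cue-word scoring, and not the joint inference method Manning's group uses. The cue words and their weights are invented for illustration; a real system would estimate them from data.

```python
import re

# Hypothetical cue words and weights for three senses of "make up".
# In practice these would be learned statistics, not hand-picked numbers.
SENSE_CUES = {
    "reconcile": {"argument": 3, "fight": 3, "apologized": 2},
    "concoct": {"story": 3, "excuse": 3, "lie": 2},
    "apply cosmetics": {"lipstick": 3, "mirror": 2, "face": 2},
}


def score_senses(sentence):
    """Sum the weights of the cue words present in the sentence, per sense."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return {
        sense: sum(weight for cue, weight in cues.items() if cue in words)
        for sense, cues in SENSE_CUES.items()
    }


def best_sense(sentence):
    """Return the highest-scoring sense, or None if no cue word appears."""
    scores = score_senses(sentence)
    sense = max(scores, key=scores.get)
    return sense if scores[sense] > 0 else None


print(best_sense("After the argument, they decided to make up."))
print(best_sense("He had to make up a story for his boss."))
```

With these made-up weights, the first sentence leans toward "reconcile" because "argument" is present, and the second toward "concoct" because of "story."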
Finally, Manning will talk about a technology called robust textual inference, which can read a passage of text and determine whether a conclusion about it is supported. That reading comprehension task is important because it's similar to what people sometimes expect search engines to do; they'll type in a conclusion ("hotels with free Wi-Fi") and hope the engine leads them to text that supports it ("Free Wireless High-Speed Internet access in all rooms").
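A very rough way to picture the task is to check how much of the conclusion is covered by the passage. The Python sketch below does only that, using word overlap plus a tiny hand-written synonym table. It is a naive baseline under stated assumptions, not the robust textual inference system described in the talk, and the synonym table, stop-word list and 0.6 threshold are all hypothetical.

```python
import re

# Hypothetical normalization table and stop words, invented for this example.
SYNONYMS = {"wi-fi": "wireless", "wifi": "wireless"}
STOP_WORDS = {"with", "in", "all", "the", "a", "an", "of"}


def content_words(text):
    """Lowercase, tokenize, map synonyms to one form and drop stop words."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    return {SYNONYMS.get(t, t) for t in tokens} - STOP_WORDS


def coverage(conclusion, passage):
    """Fraction of the conclusion's content words that the passage contains."""
    wanted = content_words(conclusion)
    if not wanted:
        return 0.0
    return len(wanted & content_words(passage)) / len(wanted)


def seems_supported(conclusion, passage, threshold=0.6):
    """Crude yes/no guess: is enough of the conclusion found in the passage?"""
    return coverage(conclusion, passage) >= threshold


passage = "Free Wireless High-Speed Internet access in all rooms"
print(seems_supported("hotels with free Wi-Fi", passage))
print(seems_supported("hotels with free parking", passage))
```

Word overlap breaks down quickly (it cannot tell "free Wi-Fi" from "no free Wi-Fi," for example), which is part of why the problem calls for the more robust approach the article describes.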
With tremendous volumes of information appearing online every day in social networks, Manning says the need to train computers to understand human language, rather than meticulously structured data, is only increasing. The next research frontier may therefore be getting a computer to understand "C U soon, QT."
David Orenstein is associate director of communications at the Stanford School of Engineering.