Professor ferrets out mysteries of biology by giving computers ‘intelligence’

Koller’s work on Artificial Intelligence has earned her the first ACM-Infosys Foundation Award in the Computing Sciences

Photo caption: Daphne Koller’s favorite subjects are genes, proteins and metabolic pathways. (Photo: L.A. Cicero)

BY DAVID ORENSTEIN

As a computer scientist who endows machines with artificial intelligence, Daphne Koller might seem an unlikely person to draw inspiration from John Muir, the naturalist of the late 19th century. But one of his quotes describes her fascination with the world, she says, and explains why she wades so far and so deeply into the complex world of molecular biology:

"When we try to pick out anything by itself, we find that it is bound fast by a thousand invisible cords that cannot be broken, to everything in the universe."

While some would see the intricacy Muir observed as too immense to understand, Koller sees it primarily as an opportunity for "intelligent"—even insightful—software to enable unique contributions to knowledge. That opportunity is especially great in the rapidly exploding realm of quantitative biological research. Torrents of raw data are currently pouring forth from new experimental assays in genomics, proteomics and other molecular biology research.

"Biology is a science in flux because it has gone from being purely experimental on small scales to becoming an information science on a large scale," Koller says. "But we are currently seeing only the tip of the iceberg in terms of the biological insights that we can extract from these data."

Koller's favorite subjects are genes, proteins and metabolic pathways. Her methods are a fusion of logic, probabilistic (statistics-based) reasoning and machine learning that she has helped to pioneer. For that fundamental work on artificial intelligence, the 39-year-old researcher this week was named the first recipient of the ACM-Infosys Foundation Award in the Computing Sciences. The Association for Computing Machinery and Infosys, the Indian technology giant, created the $150,000 prize last year to recognize young researchers "whose contemporary innovations are having a dramatic impact on the computing field." Koller, who in 2004 was also named a MacArthur Fellow, will formally receive the ACM-Infosys award in June.

"Professor Koller's work on combining 'relational' logic and probability is the most important of her many research contributions in Artificial Intelligence and Computer Science," reads the award citation. "It has transformed the way people handle uncertainty in large computer systems, such as heterogeneous databases, image understanding systems, biological and medical models, and natural language processing systems."

Filling in biological blanks

Computers are humanity's best tools for handling large volumes of information, but Koller sees their real value not just in their ability to crunch numbers but also in their potential to infer what's happening in these complex systems—to help people understand the dynamics and relationships that the data alone do not describe.

Among the bigger biological mysteries that Koller has sought to solve with software are the mechanisms by which individual organisms come to have the variations that make them unique. Any two individuals within the same species, whether they are people or yeast strains, have distinct sequences of DNA, called "genotypes." But how those genetic blueprints are translated into each individual's unique set of traits, called "phenotypes" (height, for example), is a very complex process involving many interactions that are not yet known.

In late 2006, Koller and student Su-In Lee published a paper with Harvard geneticists in the Proceedings of the National Academy of Sciences in which they unveiled software called "Geronemo." The software was aimed at understanding how individual genetic variations can perturb the interactions between the elements in cells (genes and proteins) and ultimately lead to phenotypic changes. It identified novel ways in which differences among individual yeast strains led to differences in the regulation of gene expression. The software took information about gene expression and gene regulation within the specimens' DNA and determined which regulatory mechanisms best predicted the gene expression profiles. Ultimately, it revealed that much of the individuality among the yeasts was determined by how they used proteins to unfold their DNA strands in different ways.

A year later in Genome Biology, Koller, student Haidong Wang, and computer scientists and biologists from three other institutions unveiled another software package, called "InSite," that integrated protein and sequence data to infer exactly where on their surfaces two interacting proteins would bind. Biologists believe that how these proteins hook up, especially when mutations warp them, may have a lot to do with how certain diseases such as cancer develop.

Because both papers explain how individual genotypes affect cellular pathways and result in individual phenotypes, they could someday lead to advances in personalized medicine and drug design, Koller says. Many existing drugs are not usable because they are effective in only a subset of the population or cause severe side effects in another subset.

"If we could identify those individuals that will respond well to a drug, we will have access to a whole new range of therapeutic treatments," Koller says.

Future ideas

Not all of Koller's research focuses on biology. Her group is currently investigating how to get computers not only to correctly identify objects in pictures but also to outline these objects and determine how they are configured. For example, her group has created software that can find a giraffe in a picture, outline it precisely and then use the shape of the outline to determine with high accuracy whether the giraffe is leaning down to drink or standing upright. This may seem obvious to people, but it is a major feat for a computer.

Such sophisticated "machine vision" has many applications, including enhancing image search and endowing mobile robots with a better understanding of their surroundings. But Koller also has a medical idea in mind. She plans to launch a new project in which she will train the software to analyze magnetic resonance scans of the brains of psychiatric patients. The software might prove useful in diagnosing autism and Alzheimer's disease by spotting characteristic abnormalities in the size and shape of the hippocampus.

Koller, together with neonatologist Dr. Anna Penn and student Suchi Saria, is also embarking on an effort at Stanford Hospital to apply artificial intelligence to data collected from the vital sign monitors of premature infants while they are in neonatal intensive care units. The question will be whether sophisticated analysis can identify the precursors of adverse outcomes and give doctors and nurses the opportunity for early intervention.

Koller's milieu, mission and methods are each a world apart from those of Muir, but they are rooted in the same appreciation of life's complexity and interdependence, something Koller likes to call the "web of influence." By exploring that web with artificial intelligence, she is adding her own richness to humanity's understanding of nature.

David Orenstein is the communications and public relations manager at the Stanford School of Engineering.