Stanford Report, May 12, 2003
Pattern-recognition method zeroes in on genes that regulate cell's genetic machinery

BY DAVID HART

Using a new technique for recognizing patterns in biological databases, a team of American and Israeli computer scientists and geneticists has developed a practical computational method that zeroes in on the genes responsible for controlling the genetic machinery of a cell.

In a paper published online May 12 by Nature Genetics, the researchers from Stanford University and Hebrew University and the Weizmann Institute in Israel report that their method revealed several previously unknown control, or regulatory, genes from Saccharomyces cerevisiae, better known as baker's yeast. The work was supported in part by an Information Technology Research grant from the National Science Foundation, the independent agency that supports basic research in all fields of science and engineering.

Daphne Koller, an associate professor of computer science at Stanford, is leading an effort to develop general models for recognizing meaningful patterns that span many related databases. This unique ability to "mix and match" biological data sources gives the new method its power.

"All of the cells in our body contain exactly the same DNA, but the behavior of different cells can vary radically; the reason is that some genes are activated in some cells and dormant in others," Koller said. "Understanding the regulatory processes that cause genes to activate has important implications on understanding how cells function, and how diseases that involve breakdown in regulatory processes, such as cancer, can develop. This work provides the first high-throughput method that automatically extracts genetic regulatory circuits directly from biological data."

Ordinarily, regulatory genes are identified experimentally, not computationally. The new computational method makes the experimental process much more efficient.
It identifies regulatory candidates for testing in the lab and predicts how each regulator will affect cellular activity. In the demonstration on yeast genome data, the method identified several possible new regulatory genes and the clusters they regulate, and the team has already confirmed three of the predictions in the lab.

The primary data source for the method is gene expression technology, which involves mixing probes for thousands of genes with a biological sample under specific conditions. The probes provide a detailed snapshot, called a microarray, of the genes active in those conditions. A typical experiment would produce microarrays for hundreds of different conditions to see which genes are expressed in each condition.

"Each microarray provides a huge amount of data, and it's very difficult to extract meaningful information from it by eye," Koller said. "Over the past few years, many computational methods have been developed for dealing with this problem."

Such methods identify clusters of related genes, ranging from a handful to several dozen, in the resulting data. The new approach described in Nature Genetics also finds clusters, but it is the first to incorporate data about known and putative regulatory genes and the first to simultaneously predict which gene or genes regulate each cluster.

In response to internal or external signals, regulatory genes tell clusters of genes to turn on or off -- in other words, to start or stop making proteins. The proteins from each gene cluster, in turn, are responsible for a different cell process. These processes include converting sugar to energy, responding to stress, folding proteins and building cellular components such as the nucleus.

Koller's pattern-recognition technique builds on statistical models and the widely used technology of relational databases to look for patterns across many different data sources, such as microarray data, DNA sequence data or protein-protein interaction data.
The generality of the method lets the researchers assemble data sets like Lego blocks -- plug a new database into the relational structure and let the algorithm go to work.

To make the results of this type of analysis more accessible to biologists, Koller's group has developed the GeneXPress visualization and exploration tool, freely available on the web (http://genexpress.stanford.edu/).

"Knowing the control mechanism for gene clusters is crucial for understanding how cells respond to internal and external signals," said team member David Botstein, a professor of genetics at Stanford. "This new computational method efficiently generates targets for testing and proposes hypotheses about their regulatory roles that can be experimentally confirmed."

Authors of the Nature Genetics paper also include Koller's graduate student Eran Segal; Hebrew University Professor Nir Friedman and his graduate student Dana Pe'er; Michael Shapira, a postdoctoral researcher in Botstein's group; and Aviv Regev of the Weizmann Institute, currently a fellow at Harvard University's Bauer Center for Genomics Research.

In addition to NSF support, the team members received support from several institutional awards, the Colton Foundation and the Israeli Ministry of Science.

David Hart is a public information officer at the National Science Foundation. Dawn Levy contributed to the reporting.