New professor sifts through mountains of data for clues about children’s diseases

Atul Butte, MD, PhD, seeks out trends. But he's not interested in wafer-thin models sashaying down a catwalk. He's focused on even tinier stars: the molecules of protein, RNA and DNA that dictate all cellular functions. Hidden patterns and unexpected relationships between these mini-divas might shed light on how genetic missteps lead to such diseases as cancer and diabetes.

The problem that Butte zeroes in on is this: The sheer volume of information generated by attempts to analyze this complex molecular soup stymies most conventionally trained biologists. So Butte designs computer programs to sift through data from many different experiments by many different researchers, homing in on shared and possibly significant results that may provide the first clue to new diagnostic tests or novel therapies.

"We're beyond just looking at the symptoms of a disease," said Butte, who recently joined Lucile Packard Children's Hospital and the School of Medicine as an assistant professor of pediatrics and of medicine. "Now we're getting at the mechanics. We're putting together large data sets in a novel way so we can get new hypotheses and findings from them."

Besides being an expert in bioinformatics, Butte is a pediatric endocrinologist. With a foot in each of these two worlds, he belongs to a rare breed of computationally savvy physicians equipped to straddle a growing divide between biological research and clinical practice.

"Biologists are inventing new high-throughput experiments that create lots of data," said Russ Altman, MD, PhD, professor of genetics, medicine, bioengineering and computer science. "For obvious reasons, we need sophisticated computer tools to analyze the results: We can't do it by hand and there's no off-the-shelf software that will work. We need the kind of innovations that people like Atul can provide." Altman helped recruit Butte from the Children's Hospital Informatics Program in Boston.

Although Butte first became interested in computers and medicine as a high school student in New Jersey, he fine-tuned his skills out of necessity as he was finishing a fellowship in pediatric endocrinology at Children's Hospital Boston. He needed a way to keep track of the stream of lab results and other data from his research into pediatric diabetes.

"I began writing programs to make my life in the hospital easier," he said, "but I wanted to get back to biology." His career path was permanently altered by the imminent completion of the Human Genome Project and a chance discussion with a colleague about the data-management challenge posed by the emerging gene-chip technology. He was captivated by the opportunities presented by the nascent field.

"Bioinformatics allows us to ask questions biologists often can't even think of," said Butte, "and then to develop the methods that allow us to answer that question."

Gene chip, or microarray, experiments allow researchers to compare the expression patterns of individual genes over time, or between diseased and healthy cells. Each experiment generates thousands of pieces of data about many or all of the genes in a cell. Biologists use the technology routinely, focusing only on the few bits of data pertinent to their particular research topic. But what to do with the "leftovers"?

"A gene chip the size of your fingernail can hold information about the expression patterns of 40,000 genes, and there are data from tens of thousands of gene chips in national repositories," said Butte. "Right now researchers are barely scratching the surface of the vast amounts of data they're generating. I'm trying to join together data from many different experiments and find common, unexpected trends."

For example, said Butte, there is currently no effective therapy for amyotrophic lateral sclerosis. But it's possible that grouping diseases into families on the basis of their gene-expression patterns, rather than their symptoms, may reveal an unsuspected relationship between ALS and a treatable disease. Surprise kinships like these may lead researchers down hidden paths to new therapies for the original disease.

"There are probably about 80 human diseases that have gene-chip data available," said Butte. "I want to find out how similar these diseases are to each other, and why that matters."
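To make the idea of grouping diseases by expression pattern concrete, here is a minimal sketch in Python. It is an illustration only, not Butte's actual software: the disease names, gene values and the choice of Pearson correlation as the similarity measure are all assumptions, and the numbers are invented toy data.

```python
from itertools import combinations
from math import sqrt

# Toy, made-up expression profiles (log-ratios, diseased vs. healthy tissue)
# for five hypothetical genes. These are NOT real measurements.
profiles = {
    "disease_A": [2.0, -1.0, 0.5, 1.5, -0.5],
    "disease_B": [1.8, -0.9, 0.4, 1.6, -0.6],
    "disease_C": [-1.0, 2.0, -0.5, -1.5, 0.5],
}


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_pairs(profiles):
    """Return (correlation, disease, disease) tuples, most similar pair first."""
    pairs = [
        (pearson(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs, reverse=True)


ranked = rank_pairs(profiles)
for score, a, b in ranked:
    print(f"{a} vs {b}: r = {score:.2f}")
```

In this toy example the two diseases whose genes rise and fall together land at the top of the list, which is the kind of unexpected kinship the approach is meant to surface. A real analysis would involve thousands of genes and careful normalization across experiments.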

The same techniques can be used to identify the molecular pathways associated with disease development. Butte and his collaborators at the Joslin Diabetes Center in Boston recently published research using microarrays to identify genes important in the development of brown fat—a specialized type of energy-burning fat that may help prevent the obesity that can lead to insulin resistance and diabetes. The researchers designed computer programs to whittle down the many thousands of candidate genes to a handful of culprits. Although these genes are known to be involved in other diseases, it was the first time they'd been implicated in fat cell formation.
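The article does not say how the researchers narrowed their list, so the following Python sketch shows just one common way to whittle down candidates: keeping only genes whose expression changes sharply between two cell types. The gene names, expression values and the cutoff are all hypothetical.

```python
from math import log2

# Toy, invented expression levels in arbitrary units; gene names are hypothetical.
# Each entry is (level in baseline cells, level in the cells of interest).
expression = {
    "gene_1": (100.0, 820.0),
    "gene_2": (250.0, 260.0),
    "gene_3": (400.0, 45.0),
    "gene_4": (90.0, 110.0),
}


def shortlist(expression, min_abs_log2_fc=2.0):
    """Keep genes whose log2 fold change is at least the cutoff in either direction.

    The default cutoff of 2.0 (a fourfold change) is an assumed, illustrative value.
    """
    hits = {}
    for gene, (baseline, condition) in expression.items():
        fold_change = log2(condition / baseline)
        if abs(fold_change) >= min_abs_log2_fc:
            hits[gene] = fold_change
    return hits


candidates = shortlist(expression)
for gene, fold_change in candidates.items():
    print(f"{gene}: log2 fold change = {fold_change:+.2f}")
```

Only the genes that are strongly switched on or off survive the filter. In practice such a cutoff would typically be combined with statistical tests across replicate experiments.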

"Much of kids' health, as well as the health problems they may face in adulthood, is related to their genes," said Butte. "Unfortunately, there are few bioinformatics specialists who focus on pediatric diseases. We hope to change that here."