Stanford University

News Service



Mark Shwartz, News Service (650) 723-9296; e-mail:

New software streamlines the search for disease-causing genes

One of the greatest challenges in molecular medicine is identifying the genes that cause specific diseases, often a painstaking process requiring months of laboratory trial and error.

Now researchers have designed a computer program with the potential for dramatically speeding up the search for disease-causing genes and hastening the discovery of new drugs to treat a wide range of genetic illnesses.

The new software, called Digital Disease, is described in a study in the June 8 issue of the journal Science.

"The beauty of our software is that it runs in under a second on a computer instead of taking months to years in a lab," says Jonathan Usuka, a graduate student in chemistry and co-author of the Science study.

Usuka developed Digital Disease in collaboration with colleagues at the Roche pharmaceutical company, where he has been working as a part-time consultant while completing his doctoral dissertation.

Of mice and men

"The idea behind Digital Disease is shockingly simple," Usuka says.

The software scans databases containing computerized maps of DNA molecules, then instantly locates irregularities in genes that might be responsible for cancer, diabetes and other ailments.

Instead of searching through maps of the human genome, Digital Disease scans the DNA of mice, which are genetically similar to people.

"Human genes and mouse genes are about 80 percent identical," Usuka points out, "so if you can identify a genetic mutation in mice, you can easily locate the same mutation in humans."

Genes are located on chromosomes, which are actually molecules of DNA. Each gene, in turn, is made up of thousands of chemical subunits called nucleotides strung together in a specific sequence.

Every nucleotide contains one of four chemicals known by the abbreviations A, T, C and G. The order in which the four nucleotides occur determines how a gene functions and its ultimate effect on the physical makeup of both people and mice - everything from hair (or fur) color to susceptibility to disease.

It turns out that human and mouse genomes contain roughly the same number of nucleotides, about 3.1 billion. Digital Disease scans the mouse genome for locations in the DNA where a single nucleotide has been altered from the norm. These locations are called single nucleotide polymorphisms or SNPs (pronounced "snips").

Here is an example. If one mouse has a gene with the nucleotide sequence A-T-T-G, and another mouse carries the sequence C-T-T-G in the same location, then a SNP exists at the first position, where A is replaced by C.
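The comparison in this example can be sketched in a few lines of Python. This is only an illustration of the idea of a SNP, not the Digital Disease code; the function name `find_snps` is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length.

```python
def find_snps(seq_a, seq_b):
    """Return (position, nucleotide_a, nucleotide_b) for every mismatch.

    Assumes both sequences are already aligned and the same length.
    Positions are counted from 0.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two mouse sequences from the example above.
snps = find_snps("ATTG", "CTTG")
print(snps)  # [(0, 'A', 'C')]
```

The single mismatch at the first position is the A / C SNP described in the text.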

Scientists estimate that a SNP occurs about once every 1,000 nucleotides, which means that each person - and each mouse - may carry some 3 million SNPs in their DNA. Most are harmless aberrations, but some have been linked to devastating genetic illnesses, such as breast cancer and sickle cell disease.
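The "3 million" figure follows directly from the two numbers quoted in the article. A quick check, using only those figures (the variable names are illustrative):

```python
# Figures quoted in the article.
GENOME_NUCLEOTIDES = 3_100_000_000  # about 3.1 billion nucleotides
NUCLEOTIDES_PER_SNP = 1_000         # roughly one SNP per 1,000 nucleotides

expected_snps = GENOME_NUCLEOTIDES // NUCLEOTIDES_PER_SNP
print(f"{expected_snps:,}")  # 3,100,000
```

That is about 3.1 million, which the article rounds to "some 3 million."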

Harmful SNPs

Usuka says the primary application of Digital Disease is to hunt down SNPs that are potentially harmful to mice and therefore to people.

Statistically, a mouse with an inherited illness is likely to have SNPs that are different from disease-free mice. Digital Disease instantly recognizes those unique SNPs, allowing researchers to zero in on specific chromosome segments where disease-causing genes may be located.
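To make the statistical idea concrete, here is a minimal sketch of how SNPs might be ranked by how well they track a disease across mouse strains. It is not the method published in Science, which the article does not spell out; the scoring rule, the function names, the strain names and the SNP labels are all assumptions made up for illustration.

```python
def concordance(alleles, affected):
    """Score how well one SNP's allele pattern splits strains by phenotype.

    alleles:  dict mapping strain name -> nucleotide at this SNP
    affected: dict mapping strain name -> True if the strain shows the trait
    Returns a value between 0.5 (no association) and 1.0 (perfect split).
    """
    strains = list(alleles)
    reference = alleles[strains[0]]
    # Count strains where "carries the reference allele" agrees with "affected".
    agree = sum((alleles[s] == reference) == affected[s] for s in strains)
    # Which allele counts as the reference is arbitrary, so take the better reading.
    return max(agree, len(strains) - agree) / len(strains)


def rank_snps(snp_table, affected):
    """Return (snp, score) pairs, best-matching SNPs first."""
    scores = {snp: concordance(a, affected) for snp, a in snp_table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: four strains, two of which carry the disease.
affected = {"s1": True, "s2": True, "s3": False, "s4": False}
snp_table = {
    "chr1:snp_a": {"s1": "A", "s2": "A", "s3": "C", "s4": "C"},
    "chr2:snp_b": {"s1": "G", "s2": "T", "s3": "G", "s4": "T"},
    "chr3:snp_c": {"s1": "T", "s2": "T", "s3": "T", "s4": "C"},
}

ranked = rank_snps(snp_table, affected)
print(ranked)
# [('chr1:snp_a', 1.0), ('chr3:snp_c', 0.75), ('chr2:snp_b', 0.5)]
```

In this toy data the SNP on chromosome 1 separates sick from healthy strains perfectly, so the region around it would be the first place to look for a disease-causing gene. A real analysis would use many more strains and SNPs and a proper statistical test.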

For the Science study, Usuka and his colleagues wanted to test their software to see if it could predict where a disease-causing gene was located simply by comparing a SNP database with basic genetic information about a known disease.

The researchers constructed their own SNP database, which contained the location of about 3,400 SNPs spread across all 19 mouse chromosomes. They also extracted DNA from 10 strains of mice that carry genes known to produce 10 different traits and diseases, including lymphoma and asthma. Previous laboratory studies already had determined the chromosomal location of the genes responsible for all 10 traits.

Relying only on behavioral and disease data from the earlier studies, the software predicted the chromosome locations with remarkable accuracy.

For example, in lab mice with lymphoma, Digital Disease correctly predicted the three chromosomal regions known to carry lymphoma-causing genes. The software also pinpointed a fourth gene location, which may open up a new avenue of lymphoma research, according to Usuka.

Digital Disease also predicted the four chromosomal regions known to contain asthma-causing genes, along with two other potential asthma gene sites.

"We showed that we can predict computationally in a second what other studies took months or years to do," notes Usuka. "We believe that, with our software, you can reduce the search to about 10 percent of the mouse genome, which will save a huge amount of time."

Finding the precise location of disease-causing genes will require additional lab work, he points out, but the new software should give medical researchers a tremendous head start.

"Genetic experiments churn through hundreds of inbred lab mice, but our system minimizes that need. With only a handful of mice, you have an answer in a second, then you're off and running," Usuka says.

Frat boy mice

Besides diseases, the software also was successful in locating chromosomal regions containing genes that determine various physical traits, such as bone mineral density, eye weight and even alcohol preference - what Usuka calls "frat boy mice" versus "teetotaler mice."

"With our software and the SNP database we are making public, if you know any physical or behavioral trait, you'll be able to find what part of the mouse genome that trait might come from," Usuka explains.

The new software even works in reverse.

"We can actually predict how mice look, how they'll behave, whether they like alcohol or whether they're susceptible to lymphoma and other diseases," he says. "A lot of geneticists don't believe this can be done, especially with only 3,000 SNPs. Then they're amazed when they see the data."

Usuka maintains that Digital Disease will pave the way for the rapid discovery of new drugs, especially now that scientists have all but finished mapping the human and mouse genomes. He points to the SNP Consortium, a non-profit organization underwritten by major pharmaceutical corporations with the goal of identifying all the major SNPs in the human genome.

"Everyone knows SNPs are important, but no one knows what to do with them," Usuka observes. "So far, the consortium has identified 1.42 million SNPs in humans. In less than a year we have identified 3,000 mouse SNPs and will soon be closer to 10,000. Imagine how accurate the software might be when that number grows to a million."

In addition to Usuka, the June 8 Science study was co-authored by Andrew Grupe, Dee Aud and Gary Peltz of Roche Bioscience; Soren Germer, Mandeep K. Ahluwalia and Russell Higuchi of Roche Molecular Systems; and John K. Belknap and Robert F. Klein of Oregon Health Sciences University and Portland Veterans Affairs Medical Center.

For a Web-based demonstration of the new SNP database, visit


By Mark Shwartz

© Stanford University. All Rights Reserved. Stanford, CA 94305. (650) 723-2300.